
CN111367926A - Data processing method and device for distributed system - Google Patents

Data processing method and device for distributed system

Info

Publication number
CN111367926A
CN111367926A (application CN202010125232.5A)
Authority
CN
China
Prior art keywords
data
check
read
stored
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010125232.5A
Other languages
Chinese (zh)
Inventor
肖永玲
赵岩
刘名欣
张旭明
王豪迈
胥昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xsky Beijing Data Technology Corp ltd
Original Assignee
Xsky Beijing Data Technology Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xsky Beijing Data Technology Corp ltd filed Critical Xsky Beijing Data Technology Corp ltd
Priority to CN202010125232.5A priority Critical patent/CN111367926A/en
Publication of CN111367926A publication Critical patent/CN111367926A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and device for a distributed system. The method comprises the following steps: receiving a data storage instruction; acquiring the data to be stored and dividing it into segments of a specified size; and calculating the check data of the data to be stored in each segment, storing the check data of each segment as a separate check record. The invention solves the technical problem in the prior art that a distributed storage system consumes too much system performance when checking data.

Description

Data processing method and device for distributed system
Technical Field
The invention relates to the field of data storage, in particular to a data processing method and device of a distributed system.
Background
In a distributed storage system, data consistency is critical: silent disk errors, hardware exceptions, or software bugs can cause user data loss or inconsistency and bring huge losses to users. Therefore, distributed storage systems perform data checking with MD5 or CRC: the checksum of each data block is recorded in the storage system, and when data is read, the consistency of replica data is judged by comparing the checksum calculated from the read data with the stored checksum.
Data verification apparatuses in the prior art must verify data every time it is read from the distributed storage system, and this process has a large impact on system performance; this is a defect of current distributed storage systems.
For the problem in the prior art that a distributed storage system consumes too much system performance when checking data, no effective solution has yet been proposed.
Disclosure of Invention
The embodiment of the invention provides a data processing method and a data processing device for a distributed system, so as to at least solve the technical problem in the prior art that a distributed storage system consumes too much system performance when checking data.
According to an aspect of an embodiment of the present invention, there is provided a data processing method for a distributed system, including: receiving a data storage instruction; acquiring data to be stored, and segmenting the data to be stored according to a specified size; and respectively calculating the check data of the data to be stored in each segment, and storing the check data of the data to be stored in each segment as a check record.
Further, the method further comprises: receiving a data reading instruction; reading data according to the data reading instruction and calculating the current check data of the read data; comparing the current check data with the original check data recorded when the read data was written; and, if the comparison succeeds, determining that the read data is the data to be read indicated by the data reading instruction.
Further, the method further comprises: receiving a data zero-write instruction or a data clear instruction; clearing the check data corresponding to the data to be zero-written or cleared; and updating the check data of the segment in which that data lies.
Further, a preset check data cache space stores check data, and before comparing the current check data with the original check data recorded when the read data was written, the method further includes: extracting the original check data from the check data cache space.
Further, extracting the original check data from the check data cache space includes: if the original check data exists in the check data cache space, extracting it from the check data cache space; and if the original check data does not exist in the check data cache space, extracting it from the disk and writing the extracted original check data into the check data cache space.
Further, comparing the current check data with the original check data recorded when the read data was written includes: calling an asynchronous read-back module, wherein the asynchronous read-back module finishes the current thread after submitting a read request, so as to support asynchronous reading; and comparing, by the asynchronous read-back module, the current check data with the original check data recorded when the read data was written.
According to an aspect of an embodiment of the present invention, there is provided a data processing apparatus for a distributed system, including: a receiving module configured to receive a data storage instruction; an acquisition module configured to acquire the data to be stored and divide it into segments of a specified size; and a first calculation module configured to calculate the check data of the data to be stored in each segment and store the check data of each segment as a check record.
Further, the above apparatus further comprises: a receiving module configured to receive a data reading instruction; a second calculation module configured to read data according to the data reading instruction and calculate the current check data of the read data; a comparison module configured to compare the current check data with the original check data recorded when the read data was written; and a determining module configured to, if the comparison succeeds, determine that the read data is the data to be read indicated by the data reading instruction.
According to an aspect of the embodiments of the present invention, there is provided a storage medium including a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the data processing method of the distributed system described above.
According to an aspect of the embodiments of the present invention, there is provided a processor configured to execute a program, where the program executes a data processing method of the distributed system described above.
In the embodiment of the invention, a data storage instruction is received; the data to be stored is acquired and divided into segments of a specified size; the check data of the data to be stored in each segment is calculated; and the check data of each segment is stored as a check record. In a ceph-based distributed storage system, the minimum IO size of each operation is 4 KiB, and a 4-byte data check value is obtained by encoding each 4 KiB of data with the crc32 algorithm. If the data check value of a whole object were kept as a single check data record, every 4 KiB IO would require updating the record for the entire object. With the segmented approach, each operation only loads, updates, or clears the check data record of the segment in which the operated data lies, and then persists or deletes the corresponding RocksDB record. This reduces the performance impact of enabling data checking and solves the technical problem in the prior art that a distributed storage system consumes too much system performance when checking data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method of data processing for a distributed system according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a segmented storage of parity data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of storing parity data based on a crc cache according to an embodiment of the present invention;
FIG. 4 is a flow chart of reading data according to an embodiment of the present invention; and
FIG. 5 is a schematic diagram of a data processing apparatus of a distributed system according to an embodiment of the present invention.
In the following, in order to facilitate understanding of the present embodiment, terms appearing in the present embodiment are explained:
distributed storage system: in short, data is stored across multiple storage servers in a distributed manner, and these distributed storage resources form a virtual storage device that provides data storage services.
PG: Placement Group. A placement group aggregates a series of objects into a group and maps this group to a series of OSDs.
OSD: Object Storage Device, the process responsible for persisting data to disk.
Silent error (Silent Data Corruption): the core mission of a storage system is to store data correctly, read it back correctly, and raise a timely alarm when an error occurs. Disk exceptions may include hardware errors, firmware or software bugs, power supply problems, media corruption, and so on. Conventionally these are all captured and an exception is thrown normally; but in the worst case everything appears normal until the data is actually used, at which point it is found to be erroneous or corrupted. This is a silent error.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, there is provided an embodiment of a data processing method for a distributed system, it should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
Fig. 1 is a flowchart of a data processing method of a distributed system according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
step S102, receiving a data storage instruction.
Specifically, the data storage instruction is used for storing data into the distributed storage system.
And step S104, acquiring data to be stored, and segmenting the data to be stored according to the specified size.
In ceph's distributed storage system, each object is 4 MiB in size. In the above scheme, each object is divided into segments, yielding multiple segments per object. In an alternative embodiment, a 4 MiB object may be segmented; for example, if segmented at 256 KiB, the object has 16 segments.
Step S106, respectively calculating the check data of the data to be stored in each segment, and storing the check data of the data to be stored in each segment as a check record.
Specifically, the check data is a check value of the data, and the check data of the data to be stored in each segment may be determined with MD5 (Message Digest Algorithm 5) or a Cyclic Redundancy Check (CRC). In the above embodiment, if segmentation is performed at 256 KiB, one object has 16 check data records.
Fig. 2 is a schematic diagram of storing check data in segments according to an embodiment of the present invention. As shown in fig. 2, a 4 MiB object is divided into 16 segments, where each segment size data_chunk_size is 256 KiB. Each 4 KiB of data is encoded with the crc32 algorithm to obtain a 4-byte data check value, and the data check values of each 256 KiB segment are recorded as one <key, value> crc_chunk (a 4 KiB IO has one 4-byte check value, so 256 KiB of data has 64 check values of 4 bytes each, i.e., 256 bytes, which are stored as one record in the database), where key identifies the segment and value holds the data check values. The records are stored in the backend RocksDB, resulting in 16 check data records, each of size crc_chunk_size = 256 B, so the data check records of one object total 4 KiB of crc values.
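The segmented check-record scheme described above can be sketched as follows (a minimal illustration: a plain dict stands in for RocksDB, and the key format and little-endian byte order are choices made here, not taken from the patent):

```python
import zlib

IO_SIZE = 4 * 1024          # 4 KiB minimum IO size
SEGMENT_SIZE = 256 * 1024   # data_chunk_size: 256 KiB per segment

def crc_chunk(segment: bytes) -> bytes:
    """Concatenate the 4-byte crc32 value of each 4 KiB block in a segment."""
    out = bytearray()
    for i in range(0, len(segment), IO_SIZE):
        out += zlib.crc32(segment[i:i + IO_SIZE]).to_bytes(4, "little")
    return bytes(out)

def build_check_records(object_id: str, data: bytes) -> dict:
    """Build one <key, value> check record per 256 KiB segment of an object."""
    records = {}
    for off in range(0, len(data), SEGMENT_SIZE):
        key = f"{object_id}:{off // SEGMENT_SIZE}"   # key identifies the segment
        records[key] = crc_chunk(data[off:off + SEGMENT_SIZE])
    return records

records = build_check_records("obj1", bytes(4 * 1024 * 1024))  # a 4 MiB object
print(len(records))            # 16 check records
print(len(records["obj1:0"]))  # 256 bytes each (64 crc values x 4 bytes)
```

A 4 KiB write then only touches the one 256-byte record of its segment instead of the whole object's check data.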
As can be seen from the above, in the embodiment of the present application, a data storage instruction is received; the data to be stored is acquired and divided into segments of a specified size; the check data of the data to be stored in each segment is calculated; and the check data of each segment is stored as a check record. In a ceph-based distributed storage system, the minimum IO size of each operation is 4 KiB, and a 4-byte data check value is obtained by encoding each 4 KiB of data with the crc32 algorithm. If the data check value of a whole object were kept as a single check data record, every 4 KiB IO would require updating the record for the entire object. With the segmented approach, each operation only loads, updates, or clears the check data record of the segment in which the operated data lies, and then persists or deletes the corresponding RocksDB record. This reduces the performance impact of enabling data checking and solves the technical problem in the prior art that a distributed storage system consumes too much system performance when checking data.
After data checking is enabled, it must be ensured that the data remains consistent with the existing data check values after each of four operations: write, read, zero, and truncate. A write operation must compute the check data of the written data, update the check data record of the segment to which the data belongs, and persist the check data (store it in RocksDB) to keep the data consistent with its check values. The other three operations are described separately below.
As an alternative embodiment, the method further includes: receiving a data reading instruction; reading data according to the data reading instruction and calculating the current check data of the read data; comparing the current check data with the original check data recorded when the read data was written; and, if the comparison succeeds, determining that the read data is the data to be read indicated by the data reading instruction.
In the above scheme, data checking is required for a read operation (read): the current check data of the data read according to the data reading instruction is calculated, the original check data saved when the data was stored is extracted, and the two are compared. If they are the same, the read data has not been corrupted; if they differ, the read data may have been corrupted. This ensures consistency between the data written and the data read.
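This read-time check can be sketched minimally (the function name is hypothetical, and a single whole-buffer crc32 is used here instead of per-4 KiB values for brevity):

```python
import zlib

def verify_read(data: bytes, original_crc: int) -> bool:
    """Recompute the checksum of the read data and compare with the stored one."""
    return zlib.crc32(data) == original_crc

payload = b"user data block"
stored_crc = zlib.crc32(payload)              # computed and persisted at write time
print(verify_read(payload, stored_crc))       # True: write and read are consistent
print(verify_read(b"corrupted", stored_crc))  # False: the data changed on disk
```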
As an alternative embodiment, the method further includes: receiving a data zero-write instruction or a data clear instruction; clearing the check data corresponding to the data to be zero-written or cleared; and updating the check data of the segment in which that data lies.
In the above scheme, when a data zero-write operation (zero) or a data clear operation (truncate) is performed, the corresponding data check values need to be cleared, and the check record of the segment where the data is located must also be updated.
As an optional embodiment, check data is stored in a preset check data cache space, and before comparing the current check data with the original check data recorded when the read data was written, the method further includes: extracting the original check data from the check data cache space.
The check data cache space (crc cache) adopts a caching mechanism to hold check data from RocksDB in memory, so that the original check data can be read directly from the cache without being fetched from the backend RocksDB.
In an optional embodiment, the check data cache space contains multiple levels, and a lookup first searches the highest level and, on a miss, searches the next level down until the last level has been searched. Fig. 3 is a schematic diagram of storing check data based on a crc cache according to an embodiment of the present invention. As shown in fig. 3, the specified storage space XStore includes multiple cache segments, each with 8 levels of crc cache (level #1 of crc cache through level #8 of crc cache), covering all data in pg.a to pg.z. A lookup starts at level #1 of the crc cache; if the sought check data is not there, level #2 is searched, and so on until the data is found or level #8 has been searched. In this example each level is 128 MiB of memory, and 128 MiB of memory can cache the data check values for 128 GiB of data.
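The level-by-level lookup can be sketched as follows (a toy three-level cache; the level count, key format, and values are invented for illustration):

```python
def lookup_crc(levels: list, key: str):
    """Search the crc cache from level #1 downward; None means a full miss."""
    for level in levels:           # levels[0] is level #1, the highest level
        if key in level:
            return level[key]
    return None                    # missed at every level: fall back to RocksDB

levels = [{}, {"pg.a/obj1:3": b"\x01\x02\x03\x04"}, {}]  # 3 levels for brevity
print(lookup_crc(levels, "pg.a/obj1:3"))  # found at level #2
print(lookup_crc(levels, "pg.z/obj9:0"))  # None -> read from RocksDB instead
```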
As an alternative embodiment, the extracting the original check data from the check data cache space includes: if the original check data exists in the check data cache space, extracting the original check data from the check data cache space; and if the original check data does not exist in the check data cache space, extracting the original check data from the disk, and writing the original check data extracted from the disk into the check data cache space.
In this scheme, after the crc cache is enabled, it is used to cache check values. When data is read, the checksum of the data is calculated and compared with the check value of the data in the cache; comparing against the cached check value responds faster than reading it from RocksDB. On a crc cache miss, the value is read from RocksDB and then cached in the crc cache.
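A rough sketch of this cache-in-front-of-RocksDB pattern (a plain dict stands in for RocksDB, and eviction, which a real bounded cache needs, is omitted):

```python
class CrcCache:
    """Check-value cache in front of RocksDB: a hit is served from memory;
    a miss reads the backend and writes the value back into the cache."""

    def __init__(self, backend: dict):
        self.backend = backend    # dict standing in for the RocksDB store
        self.cache = {}

    def get(self, key: str):
        if key in self.cache:             # hit: no backend read needed
            return self.cache[key]
        value = self.backend.get(key)     # miss: read from RocksDB
        if value is not None:
            self.cache[key] = value       # cache it for subsequent reads
        return value

rocksdb = {"obj1:0": b"\xaa" * 256}
cache = CrcCache(rocksdb)
print(cache.get("obj1:0") == b"\xaa" * 256)  # True (miss, fetched from backend)
print("obj1:0" in cache.cache)               # True (now cached in memory)
```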
Fig. 4 is a flow chart of reading data according to an embodiment of the present invention. A conditional check is performed first: judge whether l2_cache is open. l2_cache refers to the second cache layer in XStore (a ceph local storage engine, similar to a file system, and the component that interacts directly with the block device); this layer caches two important data structures, the onode and the crc. The onode cache records which objects have an onode cached; the onode contains all metadata of an object, such as the physical offsets of the data blocks it uses, extended attributes, and object size, similar to an inode in a file system. The crc cache records which objects have a crc cache, which segments of each object contain data check values, and the check values themselves. Opening l2_cache means that both the onode and crc caches are enabled; once enabled, they must be updated whenever object read-write operations are involved. If the judgment result is that l2_cache is open, proceed to the next step; otherwise, exit directly.
If l2_cache is open, a validity check is performed: if the object is a meta_pg, a temporary object, a non-block object, or the like, no caching is needed and the flow exits; otherwise, proceed to the next step.
Judge whether the only-crc condition is satisfied (that is, only the crc cache is to be updated). If so, the onode is not cached (updating only the crc cache avoids repeated updates and improves efficiency); otherwise, the onode is cached. Then proceed to the next step.
A condition check is performed again: if the onode does not record that data checking is enabled, exit; if the crc cache switch is not turned on, exit; otherwise, proceed to the next step. Specifically, l2_cache includes the crc cache, but l2_cache being open does not mean the crc cache is open, whereas the crc cache being open implies that l2_cache is open. It is therefore necessary to judge whether the crc cache is open; moreover, if the object itself has not enabled the data check function (an attribute recorded in the onode), the crc cache does not need to be updated.
Search whether the object is cached in the crc cache. If the object is not cached, write all records in the crc_chunk_map (which records all data check values held by the object at the current runtime) into the crc cache; otherwise, write into the crc cache only the segment data check values that were updated, marked as deleted, or read from RocksDB. Specifically, the object is the basic operation unit of ceph; like a file in a file system, it has metadata (the onode) and data (the content actually written by the user). The capacity of the crc cache is bounded and cannot hold the crc caches of all objects in the ceph cluster, so before updating the crc cache it is necessary to judge whether the object is already in it: if so, only the parts changed by this operation are written into the cache; if not, all data check values held by the object at the current runtime (recorded in the crc_chunk_map) are written into the crc cache. The segments that were updated, marked as deleted, or read from RocksDB are the changed parts: an updated value means the crc cache holds an old value that must be overwritten with the new value of this operation, for example when part of an object is overwritten and its crc changes; marked as deleted means the data segment was deleted, so its data check value in the crc cache must be cleared; and a segment read from RocksDB means the crc cache did not yet contain that data check value and it must be added.
As an optional embodiment, comparing the current check data with original check data of the read data when the read data is written includes: calling an asynchronous read-back module, wherein the asynchronous read-back module finishes the current thread after submitting a read request so as to support asynchronous reading; and comparing the current check data with original check data of the read data when the read data is written in through the asynchronous read back module.
Reads in a ceph-based distributed storage system are divided into synchronous reads and asynchronous reads, i.e., the semantics of the read request issued by the PG. For a synchronous read, an osd worker thread blocks until the data returns from the disk or the cache; while the disk performs the IO, the thread cannot schedule other IO, so performance is poor. Asynchronous reads are therefore used far more widely, and only a small amount of traffic, such as the data read requests of scrub, uses synchronous reads.
However, if data checking is enabled, the data crc needs to be verified during the read, and because the thread waits internally for the read to return while still holding the pg lock, other io inside the pg cannot be processed, and the asynchronous read internally degrades into a synchronous read.
To solve this problem, data checking is added to an asynchronous read callback module, which performs the data check after the asynchronous read operation returns from the lower layer and records the object if the data is inconsistent. The data check is placed in and executed by the asynchronous read callback module; under asynchronous-read semantics the submitting thread finishes once the request has been submitted to the cache, releasing the pg lock to support asynchronous reading, so the impact on performance is greatly reduced.
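The callback-based design can be sketched with Python futures (a loose analogy only: `ThreadPoolExecutor` stands in for the osd IO path and no pg lock is modeled; the sketch shows just the shape of "verify in the completion callback, record inconsistent objects, never block the submitting thread"):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

inconsistent_objects = []   # objects whose read data failed the crc check

def on_read_complete(object_id: str, data: bytes, stored_crc: int) -> None:
    """Completion callback: the crc check runs here, after the read returns
    from the lower layer, not in the thread that submitted the request."""
    if zlib.crc32(data) != stored_crc:
        inconsistent_objects.append(object_id)   # record the bad object

def submit_read(pool, object_id, read_fn, stored_crc):
    """Submit the read and return immediately; the submitting thread is free
    (in the real system it could release the pg lock here)."""
    fut = pool.submit(read_fn)
    fut.add_done_callback(lambda f: on_read_complete(object_id, f.result(), stored_crc))
    return fut

with ThreadPoolExecutor() as pool:
    submit_read(pool, "obj1", lambda: b"payload", zlib.crc32(b"payload"))  # consistent
    submit_read(pool, "obj2", lambda: b"payload", 0)                       # bad crc
print(inconsistent_objects)   # ['obj2']
```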
Example 2
According to an embodiment of the present invention, an embodiment of a data processing apparatus of a distributed system is provided. Fig. 5 is a schematic diagram of the data processing apparatus of the distributed system according to the embodiment of the present invention; as shown in fig. 5, the apparatus includes the following modules:
the receiving module 50 is configured to receive a data storage instruction.
The obtaining module 52 is configured to obtain data to be stored, and segment the data to be stored according to a specified size.
The first calculating module 54 is configured to calculate the check data of the data to be stored in each segment, and store the check data of the data to be stored in each segment as a check record.
As an alternative embodiment, the apparatus further comprises: the receiving module is used for receiving a data reading instruction; the second calculation module is used for reading data according to the data reading instruction and calculating the current verification data of the read data; the comparison module is used for comparing the current check data with original check data of the read data when the read data is written in; and the determining module is used for determining the read data as the data to be read indicated by the data reading instruction under the condition of successful comparison.
As an alternative embodiment, the apparatus further comprises: a second receiving module configured to receive a data zero-write instruction or a data clear instruction; and a clearing module configured to clear the check data corresponding to the data to be zero-written or cleared, and to update the check data of the segment in which that data lies.
As an optional embodiment, check data is stored in a preset check data cache space, and the apparatus further includes: an extraction module configured to extract the original check data from the check data cache space before the current check data is compared with the original check data recorded when the read data was written.
As an alternative embodiment, the extraction module comprises: a first extraction submodule, configured to extract the original check data from the check data cache space if the original check data exists in the check data cache space; and the second extraction submodule is used for extracting the original check data from the disk and writing the original check data extracted from the disk into the check data cache space if the original check data does not exist in the check data cache space.
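The two extraction submodules amount to a cache-then-disk lookup with write-back into the cache space. A sketch under the same dict-based assumptions as above:

```python
def load_original_check(cache: dict, disk: dict, segment_no: int) -> int:
    """Fetch the original check data: try the cache space first, otherwise
    read it from disk and write it back into the cache space."""
    if segment_no in cache:       # first submodule: hit in the cache space
        return cache[segment_no]
    value = disk[segment_no]      # second submodule: extract from the disk
    cache[segment_no] = value     # ...and populate the cache space
    return value
```

After the first miss, subsequent lookups for the same segment are served from the cache space without touching the disk.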
As an optional embodiment, the comparison module includes: the calling submodule, which is used for calling an asynchronous read-back module, where the asynchronous read-back module terminates the current thread after submitting a read request, so as to support asynchronous reading; and the comparison submodule, which is used for comparing, through the asynchronous read-back module, the current check data with the original check data recorded when the read data was written.
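A loose sketch of the asynchronous read-back idea, using a thread-pool future in place of the patent's asynchronous read-back module (the pool, its size, and all names here are assumptions; the point is only that the submitting thread is freed immediately and the check comparison completes elsewhere):

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

_pool = ThreadPoolExecutor(max_workers=4)

def submit_read_back(read_fn, segment_no: int, original_crc: int):
    """Submit the read request and return at once; the comparison of current
    vs. original check data runs on a pool thread."""
    def _job() -> bool:
        data = read_fn(segment_no)
        return zlib.crc32(data) == original_crc
    return _pool.submit(_job)  # the caller's thread is free after submission
```

The caller can later collect the comparison result via `future.result()`.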
Example 3
According to an embodiment of the present invention, a storage medium is provided. The storage medium includes a stored program, and when the program runs, the device on which the storage medium is located is controlled to execute the data processing method of the distributed system according to embodiment 1.
Example 4
According to an embodiment of the present invention, a processor is provided. The processor is configured to run a program, and when the program runs, the data processing method of the distributed system according to embodiment 1 is executed.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the relative merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that, for those skilled in the art, various modifications and refinements can be made without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A data processing method for a distributed system, comprising:
receiving a data storage instruction;
acquiring data to be stored, and segmenting the data to be stored according to a specified size;
and respectively calculating the check data of the data to be stored in each segment, and storing the check data of the data to be stored in each segment as a check record.
2. The method of claim 1, further comprising:
receiving a data reading instruction;
reading data according to the data reading instruction, and calculating current check data of the read data;
comparing the current check data with original check data of the read data when the read data is written;
and determining, when the comparison succeeds, that the read data is the data to be read indicated by the data reading instruction.
3. The method of claim 1, further comprising:
receiving a data zero writing instruction or a data clearing instruction;
and clearing the check data corresponding to the data to be zero-written or the data to be cleared, and updating the check data of the segment in which the data to be zero-written or the data to be cleared is located.
4. The method according to claim 2, wherein check data is stored in a predetermined check data cache space, and before comparing the current check data with original check data of the read data when the read data is written, the method further comprises: extracting the original check data from the check data cache space.
5. The method of claim 4, wherein extracting the original parity data from the parity data cache space comprises:
if the original check data exists in the check data cache space, extracting the original check data from the check data cache space;
and if the original check data does not exist in the check data cache space, extracting the original check data from the disk, and writing the original check data extracted from the disk into the check data cache space.
6. The method of claim 2, wherein comparing the current check data with original check data of the read data when the read data was written comprises:
calling an asynchronous read-back module, wherein the asynchronous read-back module terminates the current thread after submitting a read request, so as to support asynchronous reading;
and comparing, through the asynchronous read-back module, the current check data with the original check data of the read data when the read data is written.
7. A data processing apparatus for a distributed system, comprising:
the receiving module is used for receiving a data storage instruction;
the device comprises an acquisition module, a storage module and a storage module, wherein the acquisition module is used for acquiring data to be stored and segmenting the data to be stored according to a specified size;
the first calculation module is used for calculating the check data of the data to be stored in each segment respectively and storing the check data of the data to be stored in each segment as a check record.
8. The apparatus of claim 7, further comprising:
the receiving module is used for receiving a data reading instruction;
the second calculation module is used for reading data according to the data reading instruction and calculating the current verification data of the read data;
the comparison module is used for comparing the current check data with original check data of the read data when the read data is written in;
and the determining module is used for determining the read data as the data to be read indicated by the data reading instruction under the condition of successful comparison.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, a device in which the storage medium is located is controlled to execute the data processing method of the distributed system according to any one of claims 1 to 6.
10. A processor, characterized in that the processor is configured to execute a program, wherein the program executes a data processing method of a distributed system according to any one of claims 1 to 6.
CN202010125232.5A 2020-02-27 2020-02-27 Data processing method and device for distributed system Pending CN111367926A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010125232.5A CN111367926A (en) 2020-02-27 2020-02-27 Data processing method and device for distributed system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010125232.5A CN111367926A (en) 2020-02-27 2020-02-27 Data processing method and device for distributed system

Publications (1)

Publication Number Publication Date
CN111367926A true CN111367926A (en) 2020-07-03

Family

ID=71212190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010125232.5A Pending CN111367926A (en) 2020-02-27 2020-02-27 Data processing method and device for distributed system

Country Status (1)

Country Link
CN (1) CN111367926A (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458638A (en) * 2007-12-13 2009-06-17 安凯(广州)软件技术有限公司 Large scale data verification method for embedded system
US20110055277A1 (en) * 2009-08-27 2011-03-03 Cleversafe, Inc. Updating dispersed storage network access control information
CN102419766A (en) * 2011-11-01 2012-04-18 西安电子科技大学 Data redundancy and file operation method based on HDFS distributed file system
CN103731451A (en) * 2012-10-12 2014-04-16 腾讯科技(深圳)有限公司 Method and system for uploading file
CN103646082A (en) * 2013-12-12 2014-03-19 北京奇虎科技有限公司 Method and device for checking files
CN104239234A (en) * 2014-10-13 2014-12-24 合一网络技术(北京)有限公司 High-efficiency local cache management and reading method
CN105187551A (en) * 2015-09-29 2015-12-23 成都四象联创科技有限公司 Distributed computing method based on cloud platform
CN106201338A (en) * 2016-06-28 2016-12-07 华为技术有限公司 Date storage method and device
CN109558752A (en) * 2018-11-06 2019-04-02 北京威努特技术有限公司 Method for quickly realizing file identification under host white list mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shi Hongqin et al.: "Operating Systems", Jilin University Press, 30 September 2017 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112000487A (en) * 2020-08-14 2020-11-27 浪潮电子信息产业股份有限公司 Scrub pressure adjusting method, device and medium
CN112000487B (en) * 2020-08-14 2022-07-08 浪潮电子信息产业股份有限公司 Scrub pressure adjusting method, device and medium
CN114138191A (en) * 2021-11-21 2022-03-04 苏州浪潮智能科技有限公司 Storage pool data verification method, system, device and medium
CN114138191B (en) * 2021-11-21 2023-09-01 苏州浪潮智能科技有限公司 Storage pool data verification method, system, equipment and medium
CN115905114A (en) * 2023-03-09 2023-04-04 浪潮电子信息产业股份有限公司 Batch updating method and system of metadata, electronic equipment and readable storage medium
CN116737457A (en) * 2023-06-16 2023-09-12 深圳市青葡萄科技有限公司 Data verification method based on distributed storage
CN116737457B (en) * 2023-06-16 2024-03-08 深圳市青葡萄科技有限公司 Data verification method based on distributed storage

Similar Documents

Publication Publication Date Title
CN111367926A (en) Data processing method and device for distributed system
CN110531940B (en) Video file processing method and device
CN108319602B (en) Database management method and database system
US7882386B1 (en) System and method for recovering a logical volume during failover or reboot of a file server in a data storage environment
US10248556B2 (en) Forward-only paged data storage management where virtual cursor moves in only one direction from header of a session to data field of the session
US8600939B2 (en) Writable snapshots
CN103019890B (en) Block-level disk data protection system and method thereof
US7487385B2 (en) Apparatus and method for recovering destroyed data volumes
CN108021717B (en) Method for implementing lightweight embedded file system
US11983438B2 (en) Technique for improving operations log indexing
CN109947730B (en) Metadata recovery method, device, distributed file system and readable storage medium
CN112307022A (en) Metadata repairing method and related device
WO2022105442A1 (en) Erasure code-based data reconstruction method and appratus, device, and storage medium
CN113885809A (en) Data management system and method
CN111610936B (en) Object storage platform, object aggregation method and device and server
CN116483284B (en) Method, device, medium and electronic equipment for reading and writing virtual hard disk
CN105653385B (en) A kind of vehicle-mounted kinescope method
CN117271221A (en) Database data recovery method, storage medium and device
CN117076191A (en) File metadata backup method on HDD disk and metadata backup server
CN113625952B (en) Object storage method, device, equipment and storage medium
CN113553215B (en) Erasure code data recovery optimization method and device based on environment information
CN111399774B (en) Data processing method and device based on snapshot under distributed storage system
JPS62245348A (en) Method and device for updating data base
CN112131194A (en) File storage control method and device of read-only file system and storage medium
CN107506156B (en) Io optimization method of block device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200703
