CN112632621B

CN112632621B - Data access method, device, equipment and computer storage medium

Info

Publication number: CN112632621B
Application number: CN202011629123.3A
Authority: CN
Inventors: 李睿; 田苗; 陈劼; 王娟; 苏士伟
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Jiangsu Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Jiangsu Co Ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2024-08-27
Anticipated expiration: 2040-12-30
Also published as: CN112632621A

Abstract

Embodiments of the invention provide a data access method, apparatus, device and computer storage medium. A first rack is determined from at least one rack of a first data center according to a specified algorithm; a first computing node is determined from at least one computing node of the first rack according to the specified algorithm; a data block to be stored is stored on the first computing node; at least one second rack is determined from the at least one rack of the first data center according to the specified algorithm; and second computing nodes are respectively determined from the at least one second rack according to the specified algorithm, the second computing nodes being used to store copy data of the data block. The security of data access can thereby be improved.

Description

Data access method, device, equipment and computer storage medium

Technical Field

The present invention relates to the field of system implementation, and in particular, to a data access method, apparatus, device, and computer storage medium.

Background

To improve storage speed and reduce network overhead, existing multi-copy storage in distributed data environments places the copies of a piece of data on different nodes of the same rack. With this approach, however, if the rack becomes abnormal, none of the stored data can be accessed, and the data may even be damaged or lost, which poses a potential security hazard for data access.

Disclosure of Invention

The embodiment of the invention provides a data access method, a data access device, data access equipment and a computer storage medium, which can improve the security of data access.

In a first aspect, an embodiment of the present invention provides a data access method, including:

determining a first rack from at least one rack of the first data center according to a specified algorithm;

determining a first computing node from at least one computing node of the first rack according to the specified algorithm;

storing the data block to be stored to the first computing node;

determining at least one second rack from the at least one rack of the first data center according to the specified algorithm;

and determining second computing nodes from the at least one second rack according to the specified algorithm, respectively, the second computing nodes being used to store duplicate data of the data block.

In an alternative embodiment, determining a first computing node from at least one computing node of a first rack according to a specified algorithm includes:

determining a first identifier from the first rack according to a specified algorithm;

the first computing node is randomly determined from at least one computing node corresponding to the first identifier.

In an alternative embodiment, the specified algorithm comprises a dynamic weight load algorithm; determining a first rack from at least one rack of the first data center according to a specified algorithm comprises:

determining the first rack according to the weight of at least one rack in the first data center by using the dynamic weight load algorithm.

In an alternative embodiment, determining a first computing node from at least one computing node of the first rack according to a specified algorithm comprises:

determining the first computing node according to the weight of at least one computing node of the first rack by using the dynamic weight load algorithm.

In an alternative embodiment, the method further comprises:

determining whether the index statement is damaged according to the length of the index statement;

if the index statement is not damaged, inquiring the first data according to the index statement, and determining whether the first data corresponding to the index statement can be inquired;

if the first data corresponding to the index statement is queried, determining whether the first data is damaged according to the index statement;

if the first data is not damaged, determining whether the size of the first data is changed according to the index statement;

and if the size of the first data is not changed, checking the second data.

In a second aspect, an embodiment of the present invention provides a data access apparatus, including:

a determining module for determining a first rack from at least one rack of the first data center according to a specified algorithm;

the determining module is further used for determining a first computing node from at least one computing node of the first rack according to a specified algorithm;

the storage module is used for storing the data block to be stored to the first computing node;

the determining module is further used for determining at least one second rack from the at least one rack of the first data center according to a specified algorithm;

the determining module is further configured to determine second computing nodes from at least one rack of the at least one second rack according to a specified algorithm, respectively, where the second computing nodes are configured to store duplicate data of the data blocks.

In an alternative embodiment, the determining module is specifically configured to:

In a third aspect, there is provided a data access apparatus, the apparatus comprising: a memory for storing a program; a processor for executing a program stored in a memory for performing the data access method of the first aspect or any optional implementation of the first aspect.

In a fourth aspect, there is provided a computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement the data access method provided by the first aspect or any of the alternative embodiments of the first aspect.

Drawings

In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are needed to be used in the embodiments of the present invention will be briefly described, and it is possible for a person skilled in the art to obtain other drawings according to these drawings without inventive effort.

FIG. 1 is a schematic diagram of the overall structure of a storage system according to an embodiment of the present invention;

FIG. 2 is a flow chart of a data access method according to an embodiment of the invention;

FIG. 3 is a flow chart of a data access method according to another embodiment of the present invention;

FIG. 4 is a schematic diagram of an index statement format provided by one embodiment of the invention;

FIG. 5 is a schematic diagram of a data access device according to another embodiment of the present invention;

FIG. 6 is a block diagram of an exemplary hardware architecture of a data access device in an embodiment of the invention.

Detailed Description

Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and the detailed embodiments. It should be understood that the particular embodiments described herein are meant to be illustrative of the invention only and not limiting. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the invention by showing examples of the invention.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.

First, for ease of understanding, the terms and application scenarios involved in the embodiments of the present invention are described below.

Block: the minimum storage and processing unit in the database.

Rack: also referred to as a rack server; used to secure patch panels, enclosures and equipment within telecommunications cabinets.

Data node: responsible for storing data blocks and performing read and write operations on them; one rack comprises a plurality of data nodes.

In existing multi-copy storage, servers are selected at random, and usually only a simple multi-copy scheme is implemented in which the copies are mutually exclusive by IP address. The following technical problems therefore need to be solved:

1. Rack power failure: when multiple copies of the same block are written to the same rack and that rack goes down, all copies of the block may be lost and cannot be recovered.

2. Random storage across multiple racks: when the storage nodes span several racks and no good write algorithm is used, network traffic multiplies. For example, suppose a block has three copies. The first copy, block1, is written to the first rack; block2 is then written to a randomly selected second rack, which generates data transmission traffic between the two racks; block3 may then, by random selection, be written back to the first rack, which generates traffic between the two racks a second time.

3. Data cannot be repaired when a disk fails: a distributed cache system has very strict requirements on data. When a disk is damaged, the copy strategy can keep the service system running smoothly. In a production environment, however, cold data cannot be predicted: for hot data, access retries reveal whether a disk has failed, but for cold data a failure essentially cannot be detected. Data that cannot be repaired leaves a significant production hazard.

4. Data loss when a host fails: the copy storage strategy of the production cluster is rack-aware, and common rack awareness stores primary and standby data on different racks. The production environment involves high concurrency, large data volumes and continuous reads and writes, so it is quite common for nodes to go down because of faults. In the low-probability case where a node failure and disk damage occur at the same time, none of the cached primary and standby data can be accessed, production data is lost, the accuracy of the service is greatly affected, and the production hazard is very large.
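The traffic effect described in problem 2 can be illustrated with a small count. The sketch below assumes the copies are written one after another, so that a transfer between racks occurs whenever two consecutive copies sit on different racks; the function name and the rack labels are illustrative only.

```python
def cross_rack_transfers(rack_sequence):
    """Count transfers between racks when copies are written in sequence.

    Assumption: each copy is forwarded from the location of the previous
    copy, so a transfer crosses racks whenever the rack label changes.
    """
    return sum(1 for a, b in zip(rack_sequence, rack_sequence[1:]) if a != b)


# Random placement from the example: rack 1, then rack 2, then rack 1 again.
random_placement = ["R1", "R2", "R1"]
# Rack-aware placement: the third copy stays on the rack of the second copy.
rack_aware_placement = ["R1", "R2", "R2"]

print(cross_rack_transfers(random_placement))      # 2
print(cross_rack_transfers(rack_aware_placement))  # 1
```

Under this assumption the random placement crosses racks twice, whereas keeping the third copy next to the second crosses only once.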

FIG. 1 is a schematic diagram of the overall structure of a storage system according to an embodiment of the present invention. As shown in FIG. 1, a data center includes a plurality of racks, and each rack includes a plurality of data nodes (DataNodes). For example, data center 1 (D1) includes rack 1 (R1) and rack 2 (R2), and rack 1 includes three data nodes N1, N2 and N3.
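As an illustration only, the hierarchy of FIG. 1 could be modelled with three nested record types. The class names below are assumptions made for this sketch and are not taken from the embodiment; the data nodes of rack 2 are not listed in the text, so that rack is left empty.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str


@dataclass
class Rack:
    name: str
    nodes: list = field(default_factory=list)


@dataclass
class DataCenter:
    name: str
    racks: list = field(default_factory=list)


# Topology of FIG. 1: D1 contains R1 and R2; R1 contains N1, N2 and N3.
d1 = DataCenter("D1", [
    Rack("R1", [DataNode("N1"), DataNode("N2"), DataNode("N3")]),
    Rack("R2"),  # nodes of R2 are not enumerated in the description
])

print([rack.name for rack in d1.racks])
print([node.name for node in d1.racks[0].nodes])
```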

In order to solve the problems in the prior art, the embodiment of the invention provides a data access method, a data access device, data access equipment and a computer storage medium.

The following first describes a data access method provided by an embodiment of the present invention.

FIG. 2 is a flow chart illustrating a data access method according to an embodiment of the invention. As shown in FIG. 2, the method may include the steps of:

S201, determining a first rack from at least one rack of the first data center according to a specified algorithm.

In some embodiments, the specified algorithm comprises a dynamic weight load algorithm.

In some embodiments, the dynamic weight load algorithm is employed to determine the first rack based on the weight of at least one rack in the first data center.

In some embodiments, under the dynamic weight load algorithm, a data block is placed by selecting from a pool of available resources, which comprises a rack pool and a storage node pool. If a node were instead selected by a random algorithm, differences in the processing capacity of the nodes would skew resource utilization: the load of the processing nodes becomes unbalanced, with some nodes already overloaded while others are essentially idle. A load mechanism must therefore be adopted to track the load and processing capacity of each processing node in real time and adjust accordingly: the weight of a lightly loaded node is increased so that it handles more data storage requests, and the weight of a heavily loaded node is reduced so that it handles fewer, thereby making full use of the processing capacity of the whole cluster.

In some embodiments, the calculation formula for the dynamic weights includes:

Load(N_i) = P_cpu * L_cpu(N_i) + P_mem * L_mem(N_i) + P_io * L_io(N_i) + P_res * L_res(N_i)

where each proportional constant coefficient P_x represents the importance of the corresponding load parameter x, with Σ P_x = 1, and L_x(N_i) represents the current load value of parameter x on node N_i. The values of the coefficients P_x can be adjusted continually according to the importance of each index when different applications are deployed.

Each storage node periodically reports the index information of its server to the algorithm, mainly the CPU utilization, memory utilization, disk I/O access rate and average response time. The algorithm calculates a new node weight from the current indexes and then adjusts the number of processing requests the node receives according to that weight.
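The load formula and the weight-based selection can be sketched as follows. This is a minimal illustration under several assumptions that the description does not fix: the coefficient values are hypothetical, every reported index is taken to be normalised to the range 0 to 1, the subscript "res" is read as the average response time, the weight is assumed to be 1 minus the load, and the selection is assumed to be a weighted random draw.

```python
import random

# Hypothetical coefficients P_x; the description only requires that they sum to 1.
COEFFS = {"cpu": 0.4, "mem": 0.3, "io": 0.2, "res": 0.1}


def load(metrics, coeffs=COEFFS):
    """Load(N_i) = sum over x of P_x * L_x(N_i), with each L_x in [0, 1]."""
    if abs(sum(coeffs.values()) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    return sum(coeffs[x] * metrics[x] for x in coeffs)


def weight(metrics, coeffs=COEFFS):
    """Assumed mapping from load to weight: the lower the load, the higher the weight."""
    return 1.0 - load(metrics, coeffs)


def pick(candidates, rng=random):
    """Weighted random pick from {name: metrics}; an assumed selection rule."""
    names = list(candidates)
    # A tiny floor keeps the draw valid even if every candidate is fully loaded.
    weights = [max(weight(candidates[n]), 1e-9) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


reported = {
    "N1": {"cpu": 0.9, "mem": 0.8, "io": 0.7, "res": 0.6},  # busy node
    "N2": {"cpu": 0.2, "mem": 0.3, "io": 0.1, "res": 0.1},  # mostly idle node
}
for name, metrics in reported.items():
    print(name, round(load(metrics), 2), round(weight(metrics), 2))
print(pick(reported))
```

With these sample values the busy node N1 has a load of about 0.80 and the idle node N2 a load of about 0.20, so N2 is drawn roughly four times as often. The same functions could be applied to racks by reporting aggregated indexes per rack, which is again an assumption of this sketch.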

S202, determining a first computing node from at least one computing node of the first rack according to a specified algorithm.

In some embodiments, the computing nodes comprise DataNode nodes.

In some embodiments, a first identifier is determined from the first rack according to the specified algorithm, and the first computing node is randomly determined from at least one computing node corresponding to the first identifier.

In some embodiments, the dynamic weight load algorithm is employed to determine the first computing node based on the weight of at least one computing node of the first rack.

S203, storing the data block to be stored to the first computing node.

S204, determining at least one second rack from at least one rack of the first data center according to a specified algorithm.

In some embodiments, if the local machine is a DataNode, the first Block copy is stored on the local machine; otherwise, a DataNode in the same rack as the local machine is selected by the dynamic weight load algorithm to store it. The second Block copy is stored on a DataNode of another rack.

S205, determining second computing nodes from at least one rack of at least one second rack according to a specified algorithm, wherein the second computing nodes are used for storing copy data of the data blocks.

In some embodiments, if the local machine is a DataNode, the first Block copy is stored on the local machine; otherwise, a DataNode in the same rack as the local machine is selected by the dynamic weight load algorithm to store it. The second Block copy is stored on a DataNode of another rack, and the third Block copy is stored on a different DataNode of the same rack as the second copy.

If the configured number of Block copies exceeds 3, the remaining copies are distributed evenly across the cluster, provided that the number of copies on each rack does not exceed the upper limit ((number of copies - 1) / number of racks + 2).
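The upper limit can be worked through with a one-line function. The description does not say how the division is rounded; the sketch below assumes integer (floor) division, and the function name is illustrative.

```python
def max_copies_per_rack(copy_count, rack_count):
    """Upper limit of copies on one rack: (copies - 1) / racks + 2.

    Assumption: the division is an integer (floor) division.
    """
    return (copy_count - 1) // rack_count + 2


print(max_copies_per_rack(5, 3))   # (5 - 1) // 3 + 2 = 3
print(max_copies_per_rack(10, 4))  # (10 - 1) // 4 + 2 = 4
```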

In some embodiments, storing the second Block copy on a different rack removes the hazard of a power failure or network failure affecting an entire rack: when any one rack is abnormal, a copy is still available. Storing the third copy on a node in the same rack as the second copy avoids the time cost of transferring the third copy across racks again and saves a great deal of network resources.
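The three-copy placement rule can be sketched as below. This is an illustration under stated assumptions: the `choose` parameter stands in for the dynamic weight load algorithm (the default simply takes the smallest name so that the result is deterministic), the rack and node names are hypothetical, and at least one other rack with two or more nodes is assumed to exist.

```python
def place_three_copies(racks, local_rack, local_host, choose=min):
    """Return [(rack, node), ...] for the first, second and third copy.

    racks: {rack_name: [node_name, ...]}
    choose: stand-in for the dynamic weight load algorithm (assumption).
    """
    # First copy: the local machine if it is a DataNode, else a node of its rack.
    if local_host in racks[local_rack]:
        first = local_host
    else:
        first = choose(racks[local_rack])

    # Second copy: a node on another rack (one that can also hold the third copy).
    other_racks = [r for r in racks if r != local_rack and len(racks[r]) >= 2]
    second_rack = choose(other_racks)
    second = choose(racks[second_rack])

    # Third copy: a different node on the same rack as the second copy.
    third = choose([n for n in racks[second_rack] if n != second])

    return [(local_rack, first), (second_rack, second), (second_rack, third)]


racks = {"R1": ["N1", "N2", "N3"], "R2": ["N4", "N5", "N6"]}
print(place_three_copies(racks, "R1", "N2"))
# [('R1', 'N2'), ('R2', 'N4'), ('R2', 'N5')]
print(place_three_copies(racks, "R1", "client"))
# [('R1', 'N1'), ('R2', 'N4'), ('R2', 'N5')]
```

In the second call the writer is not a DataNode, so the first copy falls back to a node chosen from the writer's rack.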

In the data access method of the embodiment of the invention, a first rack is determined from at least one rack of the first data center according to the specified algorithm; a first computing node is determined from at least one computing node of the first rack according to the specified algorithm; the data block to be stored is stored to the first computing node; at least one second rack is determined from the at least one rack of the first data center according to the specified algorithm; and second computing nodes are respectively determined from the at least one second rack according to the specified algorithm, the second computing nodes being used for storing the copy data of the data block. The security of data access can thereby be improved.

FIG. 3 is a flow chart illustrating a data access method according to another embodiment of the invention. As shown in FIG. 3, the method may include the steps of:

S301, determining a first rack from at least one rack of the first data center according to a specified algorithm.

S302, determining a first computing node from at least one computing node of the first rack according to a specified algorithm.

S303, storing the data block to be stored to the first computing node.

S304, determining at least one second rack from at least one rack of the first data center according to a specified algorithm.

S305, determining second computing nodes from at least one rack of at least one second rack according to a specified algorithm, wherein the second computing nodes are used for storing copy data of the data blocks.

S306, determining whether the index statement is damaged according to the length of the index statement; if not, go to step S307; if yes, the process proceeds to step S312.

FIG. 4 is a schematic diagram of an index statement format according to an embodiment of the present invention. As shown in FIG. 4, the index statement comprises the following fields:

Start: the start position of the current data block index, which provides the offset position for looking up the data; fixed length of 4 bytes.

Commit Offset: the disk offset address of the data block on the current storage node. The data block is located through this value; fixed length of 8 bytes.

Size: the size of the data block. If a data modification operation occurs, this value is modified at the same time; fixed length of 4 bytes.

CRC32Code: the data block integrity check code, used to detect whether the data is complete. If a data modification operation occurs, this value is modified at the same time; fixed length of 4 bytes.

End: the end position of the current data block index; fixed length of 2 bytes.

In some embodiments, each index in the partition index file has a fixed length, so an index can be checked by means of its leading and trailing identification bits and its index size.

Starting from the first bit, 2 × 8 bits of data are read as a, identifying the start identifier; the read position is then moved forward by 16 × 8 bits, and 2 × 8 bits of data are read as b, identifying the end identifier.

If the read data b is the end identifier, the size and identifiers of the current index are correct.

If the read data b is not the end identifier, the current index is damaged, and repair of that single index is started.
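The identifier check can be sketched with Python's `struct` module. Several points are assumptions of this sketch: the field list gives the start field a length of 4 bytes, whereas the checking procedure reads 2 bytes, skips 16 bytes and reads 2 more, and the layout below follows the checking procedure (2 + 8 + 4 + 4 + 2 = 20 bytes); the marker values and the big-endian byte order are hypothetical, since the description does not specify them.

```python
import struct

# start (2 bytes), Commit Offset (8), Size (4), CRC32Code (4), end (2); big-endian assumed.
RECORD = struct.Struct(">2sQII2s")
START_MARK = b"\xAA\x55"  # hypothetical start identifier
END_MARK = b"\x55\xAA"    # hypothetical end identifier


def pack_index(commit_offset, size, crc32):
    """Build one fixed-length index record."""
    return RECORD.pack(START_MARK, commit_offset, size, crc32, END_MARK)


def index_is_intact(raw):
    """Check length plus leading and trailing identifiers."""
    if len(raw) != RECORD.size:
        return False
    a = raw[:2]          # read 2 x 8 bits from the first bit
    b = raw[18:20]       # skip 16 x 8 bits, then read 2 x 8 bits
    return a == START_MARK and b == END_MARK


def parse_index(raw):
    """Return (commit_offset, size, crc32) from an intact record."""
    _, commit_offset, size, crc32, _ = RECORD.unpack(raw)
    return commit_offset, size, crc32


record = pack_index(commit_offset=4096, size=11, crc32=123)
print(len(record), index_is_intact(record), parse_index(record))
damaged = record[:-2] + b"\x00\x00"
print(index_is_intact(damaged))
```

A record whose trailing identifier has been overwritten, or whose length differs from the fixed record size, fails the check and would be handed to the single-index repair.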

S307, inquiring the first data according to the index statement, and determining whether the first data corresponding to the index statement can be inquired; if yes, go to step S308; if not, the process advances to step S311.

In some embodiments, a node disk detection thread checks all index files on each node in turn. The Commit Offset is an offset pointing to the position of a data block on the disk, and the position of the corresponding data on the disk is found through the Commit Offset of each data block.

S308, determining whether the first data is damaged according to the index statement; if yes, go to step S311; if not, the process advances to step S309.

In some embodiments, if disk damage makes the data unreadable, or the corresponding data block cannot be found at the position given by the Commit Offset, the copy is directly determined to be lost and the data is quickly repaired without performing the subsequent data integrity checks. If the data can be read normally, the next step is performed.

S309, determining whether the size of the first data is changed according to the index statement; if yes, go to step S311; if not, the process proceeds to step S310.

In some embodiments, the size of the current data block is calculated as dataSize = size(data), and the returned dataSize is compared with the data block size recorded in the index. If the two are inconsistent, the data block is damaged (when a data block is deliberately modified, the size recorded in the index is updated at the same time), and the data is quickly repaired. If the two are the same, the current data has no problem, and checking continues with the next data block.
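Steps S307 to S309 can be sketched as a single check that returns either "ok" or "repair". The sketch makes several assumptions: the disk is modelled as a dictionary from Commit Offset to the bytes stored there, the function and parameter names are hypothetical, and the final CRC32 comparison is an optional extra based on the CRC32Code field rather than a step named in S307 to S309.

```python
import zlib


def verify_block(store, commit_offset, recorded_size, recorded_crc32=None):
    """Return "ok" if the block passes the checks, otherwise "repair".

    store: {commit_offset: bytes}, a stand-in for reading the disk (assumption).
    """
    # S307 / S308: the block must be found and readable at its Commit Offset.
    data = store.get(commit_offset)
    if data is None:
        return "repair"

    # S309: dataSize = size(data) must equal the size recorded in the index.
    if len(data) != recorded_size:
        return "repair"

    # Optional extra check (assumption): compare against the CRC32Code field.
    if recorded_crc32 is not None and zlib.crc32(data) != recorded_crc32:
        return "repair"

    return "ok"


store = {0: b"hello world"}
print(verify_block(store, 0, 11))    # block found, size matches
print(verify_block(store, 64, 11))   # nothing stored at this offset
print(verify_block(store, 0, 10))    # recorded size differs
```

A result of "repair" corresponds to step S311, and "ok" to moving on to the next data block in step S310.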

S310, selecting second data and checking the second data.

In some embodiments, the method for verifying the second data is the same as the method for verifying the first data, and will not be described here again.

S311, repairing the first data.

S312, repairing the index statement according to the copy of the index statement.

In some embodiments, the node where the copy of the current partition index is located is found, and the position where the copy index is located is found by the start identification.

In some embodiments, each partition index has 2 backup copies of its own on other nodes, and when the primary index is corrupted, a repair at the level of a single index can be made from a backup index.
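The repair of a single index from its backup copies can be sketched as follows. The 20-byte record length and the marker values are hypothetical and follow the 2-byte, 16-byte, 2-byte reading of the checking procedure; how the backup copies are fetched from other nodes is outside the sketch.

```python
START_MARK = b"\xAA\x55"  # hypothetical start identifier
END_MARK = b"\x55\xAA"    # hypothetical end identifier
RECORD_SIZE = 20          # assumed fixed index length in bytes


def is_intact(raw):
    """A record is intact if its length and both identifiers are correct."""
    return (len(raw) == RECORD_SIZE
            and raw[:2] == START_MARK
            and raw[-2:] == END_MARK)


def repair_index(primary, backups):
    """Return (record, repaired); record is None if no intact copy exists."""
    if is_intact(primary):
        return primary, False
    for copy in backups:          # backup copies held on other nodes
        if is_intact(copy):
            return copy, True     # overwrite the primary with this copy
    return None, False


good = START_MARK + bytes(16) + END_MARK
bad = bytes(20)
print(repair_index(good, [bad, bad]) == (good, False))
print(repair_index(bad, [bad, good]) == (good, True))
print(repair_index(bad, [bad, bad]))
```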

Based on the same inventive concept, the embodiment of the invention provides a data access device. FIG. 5 is a schematic structural diagram of a data access device according to another embodiment of the present invention. As shown in FIG. 5, the apparatus includes:

The determining module 501 is configured to determine a first rack from at least one rack of a first data center according to a specified algorithm;

The determining module 501 is further configured to determine a first computing node from at least one computing node of the first rack according to the specified algorithm;

The storage module 502 is configured to store a data block to be stored to the first computing node;

The determining module 501 is further configured to determine at least one second rack from the at least one rack of the first data center according to the specified algorithm;

The determining module 501 is further configured to determine second computing nodes from the at least one second rack according to the specified algorithm, respectively, the second computing nodes being configured to store duplicate data of the data block.

The determining module 501 is specifically configured to: determine a first identifier from the first rack according to the specified algorithm, and randomly determine the first computing node from at least one computing node corresponding to the first identifier.

The determining module 501 is specifically configured to: determine the first rack according to the weight of at least one rack in the first data center by using a dynamic weight load algorithm.

The determining module 501 is specifically configured to: determine the first computing node according to the weight of at least one computing node of the first rack by using the dynamic weight load algorithm.

The determining module 501 is further configured to determine whether the index statement is damaged according to the length of the index statement. If the index statement is not damaged, the first data is queried according to the index statement, and it is determined whether the first data corresponding to the index statement can be queried. If the first data corresponding to the index statement is queried, it is determined whether the first data is damaged according to the index statement. If the first data is not damaged, it is determined whether the size of the first data is changed according to the index statement. If the size of the first data is not changed, the second data is checked.

The data access device of the embodiment of the invention can determine a first rack from at least one rack of the first data center according to the specified algorithm; determine a first computing node from at least one computing node of the first rack according to the specified algorithm; store the data block to be stored to the first computing node; determine at least one second rack from the at least one rack of the first data center according to the specified algorithm; and respectively determine second computing nodes from the at least one second rack according to the specified algorithm, the second computing nodes being used for storing the copy data of the data block. The security of data access can thereby be improved.

As shown in FIG. 6, the data access device 600 includes an input device 601, an input interface 602, a central processor 603, a memory 604, an output interface 605, and an output device 604. The input interface 602, the central processor 603, the memory 604, and the output interface 605 are connected to each other through a bus 610, and the input device 601 and the output device 604 are connected to the bus 610 through the input interface 602 and the output interface 605, respectively, and further connected to other components of the data access device 600.

Specifically, the input device 601 receives input information from the outside and transmits the input information to the central processor 603 through the input interface 602; the central processor 603 processes the input information based on computer executable instructions stored in the memory 604 to generate output information, temporarily or permanently stores the output information in the memory 604, and then transmits the output information to the output device 604 through the output interface 605; the output device 604 outputs the output information to the outside of the data access device 600 for use by a user.

That is, the data access device shown in FIG. 6 may also be implemented to include: a memory storing computer-executable instructions; and a processor that, when executing the computer-executable instructions, can implement the data access method described in connection with FIG. 2 and FIG. 3.

In one embodiment, the data access device 600 shown in FIG. 6 may be implemented as a device that may include: a memory for storing a program; and a processor for running the program stored in the memory to execute the data access method of the embodiment of the invention.

The embodiment of the invention also provides a computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement the data access method of the embodiment of the invention.

It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. The method processes of the present invention are not limited to the specific steps described and shown, but various changes, modifications and additions, or the order between steps may be made by those skilled in the art after appreciating the spirit of the present invention.

The functional blocks shown in the above block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, a functional block may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.

It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. The present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.

In the foregoing, only the specific embodiments of the present invention are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present invention is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and they should be included in the scope of the present invention.

Claims

1. A method of data access, comprising:

determining a first computing node from at least one computing node of the first rack according to the specified algorithm;

storing a data block to be stored to the first computing node;

determining at least one second rack from the at least one rack of the first data center according to the specified algorithm;

determining a second computing node from the at least one rack of the at least one second rack according to the specified algorithm, respectively, the second computing node being for storing duplicate data of the data block;

determining whether an index statement is damaged according to the length of the index statement and in a mode of checking front and back identification bits and index size of the index statement, wherein the index statement consists of a starting position of a data block index, a disk offset address of a data block at a current storage node, the size of the data block, a data block integrity check code and an ending position of the data block index;

if the first data corresponding to the index statement is queried, determining whether the first data is damaged or not according to the index statement and a mode of whether the data can be read or whether the corresponding data block can be queried according to the corresponding position of the disk offset address of the current storage node;

if the first data is not damaged, determining whether the size of the first data is changed according to the index statement and in a mode that whether dataSize returned by the data block is consistent with the size of the data block recorded by the index;

and if the size of the first data is not changed, characterizing that the first data is not damaged.

2. The method of claim 1, wherein the determining a first computing node from at least one computing node of the first rack according to the specified algorithm comprises:

determining a first identification from the first rack according to the specified algorithm;

and randomly determining a first computing node from at least one computing node corresponding to the first identifier.

3. The method of claim 1 or 2, wherein the specified algorithm comprises a dynamic weight load algorithm; the determining a first rack from at least one rack of the first data center according to a specified algorithm includes:

4. The method of claim 1, wherein the determining a first computing node from at least one computing node of the first rack according to the specified algorithm comprises:

determining a first computing node according to the weight of at least one computing node of the first rack by adopting a dynamic weight load algorithm.

5. A data access device, the device comprising:

a determining module, configured to determine a first computing node from at least one computing node of the first rack according to the specified algorithm;

the determining module being further configured to determine at least one second rack from the at least one rack of the first data center according to the specified algorithm;

the determining module being further configured to determine a second computing node from the at least one rack of the at least one second rack according to the specified algorithm, where the second computing node is configured to store duplicate data of the data block; determining whether an index statement is damaged according to the length of the index statement and in a mode of checking front and back identification bits and index size of the index statement, wherein the index statement consists of a starting position of a data block index, a disk offset address of a data block at a current storage node, the size of the data block, a data block integrity check code and an ending position of the data block index;

6. The apparatus of claim 5, wherein the determining module is specifically configured to:

7. The apparatus according to claim 5 or 6, wherein the determining module is specifically configured to:

8. A data access device, the device comprising: a processor and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement the method of data access as claimed in any one of claims 1 to 4.

9. A computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method of data access as claimed in any of claims 1 to 4.