CN113806355A - Method, database, server and medium for realizing redistribution of distributed database - Google Patents
- Publication number
- CN113806355A (application number CN202010547494.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- hash value
- hash
- redistribution
- data node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
- G06F16/21—Design, administration or maintenance of databases
- G06F16/214—Database migration support
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
One or more embodiments of the present specification disclose a method for implementing redistribution of a distributed database, a server and a storage medium. The method comprises the following steps: when the number of data nodes changes, acquiring a new mapping relation between the Hash values that Hash distribution assigns to the data nodes and the data nodes; and reallocating the corresponding data node to each Hash value according to the new mapping relation. The redistribution performance of the distributed database can thereby be improved and resource consumption reduced.
Description
Technical Field
The invention relates to the technical field of databases, in particular to a method, a database, a server and a medium for realizing redistribution of a distributed database.
Background
A distributed database combines database technology with distributed technology: database data nodes that are geographically dispersed are assembled into a complete logical whole through computer systems and a network. Its advantages include good scalability, support for both horizontal and vertical expansion, and flexible addition and removal of data nodes.
Most distributed databases currently on the market use a consistent hash algorithm for distribution. Specifically, the whole hash value space is mapped onto a virtual ring organized clockwise; each point on the ring corresponds to a hash value, and all hash values on one continuous arc of the ring correspond to one data node, so that distributing data among the data nodes amounts to cutting the virtual ring. When data nodes are added or removed, the arcs corresponding to the changed data nodes must be shifted as a whole in order to keep the ring continuous. This overall shift causes a large change in data distribution: the amount of migrated data grows far beyond the amount of data that the changed data nodes actually need to store, resulting in poor redistribution performance and serious resource consumption. How to improve the redistribution performance of a distributed database and reduce resource consumption has therefore become an urgent problem.
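The cost of shifting whole arcs can be illustrated with a small model. The sketch below is not taken from any particular database; it assumes a hypothetical hash space of 100 values cut into equal contiguous arcs, with data nodes numbered from 0, and counts how many hash values change owner when a fifth node is added.

```python
N_HASHES = 100  # assumed size of the hash value space (illustrative only)


def contiguous_owner(hash_value: int, n_nodes: int) -> int:
    """Owner of a hash value when the ring is cut into n_nodes equal arcs."""
    return hash_value * n_nodes // N_HASHES


# Ownership before (4 nodes) and after (5 nodes) the expansion.
before = [contiguous_owner(h, 4) for h in range(N_HASHES)]
after = [contiguous_owner(h, 5) for h in range(N_HASHES)]

# Hash values whose owner changed, versus what the new node must store.
moved = sum(1 for b, a in zip(before, after) if b != a)
needed = after.count(4)

print(f"hash values moved: {moved}, hash values the new node needs: {needed}")
```

In this model half of the hash space changes owner although the new node only needs one fifth of it, which is the gap the background paragraph describes.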
Disclosure of Invention
One or more embodiments of the present disclosure provide a method, a database, a server, and a medium for implementing redistribution of a distributed database, which can improve performance of redistribution of the distributed database and reduce resource consumption.
To solve the above technical problem, one or more embodiments of the present specification are implemented as follows:
In a first aspect, a method for implementing redistribution of a distributed database is provided, the method including the following steps: when the number of data nodes changes, acquiring a new mapping relation between the Hash values that Hash distribution assigns to the data nodes and the data nodes; and reallocating the corresponding data nodes to the Hash values according to the new mapping relation.
In a second aspect, an apparatus for implementing Hash table-based distributed database redistribution is provided, the apparatus including: the obtaining module is used for obtaining a new mapping relation between the Hash value and the data nodes when the number of the data nodes changes; and the distribution module is used for redistributing the corresponding data nodes for the Hash value according to the new mapping relation.
In a third aspect, a database is provided, the database comprising the apparatus for implementing redistribution of a distributed database described above.
In a fourth aspect, a server is provided, which comprises a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for implementing connection communication between the processor and the memory, wherein the program, when executed by the processor, implements the steps of the implementation method of the distributed database redistribution as described above.
In a fifth aspect, a storage medium for computer readable storage is provided, the storage medium storing one or more programs which, when executed by one or more processors, perform the steps of a method for implementing distributed database redistribution as described above.
As can be seen from the technical solutions provided in one or more embodiments of the present specification, in the implementation method provided herein, when the number of data nodes changes and the stored data needs to be redistributed, the mapping relation between the Hash values and the data nodes is modified so that it refers to the changed data nodes, and the corresponding data nodes are then reallocated to the Hash values according to the new mapping relation. After this reallocation, the stored data corresponding to each Hash value is migrated to its corresponding data node. For some Hash values the corresponding data node does not change, so only the stored data of Hash values whose data node has changed needs to be moved. The amount of stored data migrated is thereby reduced, and the redistribution performance of the distributed database is improved.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, reference will now be made briefly to the attached drawings, which are needed in the description of one or more embodiments or prior art, and it should be apparent that the drawings in the description below are only some of the embodiments described in the specification, and that other drawings may be obtained by those skilled in the art without inventive exercise.
Fig. 1 is a schematic step diagram of an implementation method for redistribution of a distributed database according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram illustrating steps of another implementation method for redistribution of a distributed database according to an embodiment of the present disclosure.
Fig. 3 is a schematic step diagram of an implementation method for redistribution of a distributed database provided by an embodiment of the present specification.
FIG. 4 is a schematic diagram of interaction between database modules in an implementation method for redistribution of a distributed database provided by an embodiment of the present specification.
Fig. 5 is a schematic step diagram of an implementation method for redistribution of a distributed database provided by an embodiment of the present specification.
Fig. 6 is a schematic step diagram of an implementation method for redistribution of a distributed database provided by an embodiment of the present specification.
Fig. 7 is a schematic step diagram of an implementation method for redistribution of a distributed database provided by an embodiment of the present specification.
Fig. 8 is a schematic step diagram of an implementation method for redistribution of a distributed database provided by an embodiment of the present specification.
Fig. 9 is a schematic step diagram of an implementation method for redistribution of a distributed database provided by an embodiment of the present specification.
FIG. 10 is a schematic diagram illustrating steps of a method for implementing redistribution of a distributed database according to an embodiment of the present specification.
Fig. 11 is a schematic step diagram of an implementation method for redistribution of a distributed database provided by an embodiment of the present specification.
Fig. 12 is a schematic structural diagram of an apparatus for implementing redistribution of a distributed database according to an embodiment of the present specification.
Fig. 13 is a schematic structural diagram of an apparatus for implementing redistribution of a distributed database according to an embodiment of the present disclosure.
Fig. 14 is a schematic structural diagram of a database provided in an embodiment of the present specification.
Fig. 15 is a schematic structural diagram of another database provided in an embodiment of the present specification.
Fig. 16 is a schematic structural diagram of a server provided in an embodiment of the present specification.
Detailed Description
In order to make the technical solutions in the present specification better understood, the technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the accompanying drawings in one or more embodiments of the present specification, and it is obvious that the one or more embodiments described are only a part of the embodiments of the present specification, and not all embodiments. All other embodiments that can be derived by a person skilled in the art from one or more of the embodiments described herein without making any inventive step shall fall within the scope of protection of this document.
With the implementation method provided by this specification, when the number of data nodes changes and redistribution is needed, the mapping relation between the Hash values that Hash distribution assigns to the data nodes and the data nodes can be adjusted at any time, and the Hash values are then redistributed to the changed data nodes based on the adjusted mapping relation. The data nodes corresponding to some Hash values do not change, so the stored data corresponding to those Hash values does not need to be migrated; only the data corresponding to the remaining Hash values is migrated. The redistribution performance of the distributed database can thereby be improved and resource consumption reduced. The implementation method and its steps are described in detail below.
The implementation method for redistribution of the distributed database provided by the embodiment of the present specification is suitable for a distributed database using hash distribution, and data nodes of the distributed database are increased or decreased according to business requirements.
Example one
Referring to fig. 1, a schematic step diagram of an implementation method of redistribution of a distributed database provided in an embodiment of the present specification is shown. It should be understood that, when the data nodes of the distributed database change, the overall Hash value space and the number of Hash values do not change; what usually changes is the set of data nodes to which the Hash values correspond. The implementation method for redistribution of the distributed database provided by the embodiment of the specification comprises the following steps:
Step 100: when the number of data nodes changes, acquiring a new mapping relation between the Hash values that Hash distribution assigns to the data nodes and the data nodes;
When the distributed database needs to be expanded or reduced, the number of data nodes changes, and the mapping relation between the Hash values and the data nodes must be modified into a mapping relation between the Hash values and the changed data nodes. The redistribution method provided by the embodiment of the specification takes this mapping relation, rather than the size of the data stored in each data node, as the basis for redistribution. Because the data node corresponding to each Hash value is recorded explicitly, it can be controlled flexibly: only the stored data corresponding to individual Hash values needs to be migrated, and the stored data for all Hash values on the whole arc corresponding to a data node does not need to be shifted as a whole. No overall shift occurs, so the amount of stored data migrated is reduced and the redistribution performance of the distributed database is improved.
When a Hash table is used for data distribution, a Hash value is calculated for each row of stored data from the distribution key (one or more fields of the Hash table), and the data node corresponding to that Hash value, namely the data node to which the row belongs, is then calculated by a fixed distribution algorithm. The Hash value serves only to determine the data node to which the row belongs; it is not part of the original stored data and can be understood as auxiliary information of the Hash table. The new mapping relation may be obtained by updating an existing mapping relation immediately after the number of data nodes changes, or the mapping relation between the Hash values and the data nodes may be obtained at the moment the number of data nodes changes.
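As a rough illustration of how a row is placed before any mapping relation exists, the sketch below derives a Hash value from a distribution key and then applies a fixed algorithm. The hash function (CRC32), the size of the Hash value space, the column names, and the contiguous fixed algorithm are all assumptions made for the example; the source does not specify them.

```python
import zlib

N_HASHES = 100  # assumed size of the fixed Hash value space


def hash_value(row: dict, distribution_key: tuple[str, ...]) -> int:
    """Hash value of a row, computed only from its distribution-key fields."""
    key_bytes = "|".join(str(row[col]) for col in distribution_key).encode("utf-8")
    return zlib.crc32(key_bytes) % N_HASHES


def fixed_node(hv: int, n_nodes: int) -> int:
    """Assumed fixed algorithm: contiguous ranges of Hash values per node."""
    return hv * n_nodes // N_HASHES


# Hypothetical row and distribution key.
row = {"order_id": 1001, "customer": "alice", "amount": 25.0}
hv = hash_value(row, ("order_id", "customer"))
print(hv, fixed_node(hv, 4))
```

The Hash value is derived from the row but not stored in it, which matches its description as auxiliary information of the Hash table.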
Step 110: reallocating the corresponding data nodes to the Hash values according to the new mapping relation;
The number of data nodes may change by increasing or by decreasing. In either case the Hash values are redistributed over all of the changed data nodes. Reallocating each Hash value to its new data node prepares for the subsequent migration of the stored data.
The implementation method for redistribution of the distributed database provided in the embodiment of the present specification is based on two mapping relations: a first mapping relation between the Hash values and the data nodes, and a second mapping relation between the Hash values and the stored data. Based on these, the stored data corresponding to each Hash value is migrated to the data node corresponding to that Hash value.
Because the corresponding data nodes are reallocated to the Hash values according to the new mapping relation, the amount of stored data migrated can be limited to the change in the amount of data stored by the changed data nodes, and that change is kept as small as possible. The amount of stored data migrated is reduced and the redistribution performance of the distributed database is improved.
The method can migrate the stored data corresponding to a single Hash value, based on the mapping relation between the Hash values and the data nodes of the distributed database, which reduces the amount of stored data migrated. Each migration can move the stored data of one Hash value or of several Hash values, so a single migration completes quickly and the redistribution performance of the distributed database is improved. Migration in multiple batches can also be suspended and resumed at any time without affecting the normal operation of the distributed database, so redistribution can be carried out during the idle time of the distributed database, which helps to reduce the load on it. In addition, because the data can be subdivided into multiple batches for migration, disk usage stays essentially at its original level and no additional disk space is needed, which saves disk space and reduces resource consumption.
Referring to fig. 2, the overall flow of batch migration of stored data, in which the single tasks in a task table are executed in sequence, is shown for the implementation method of redistribution of a distributed database provided in an embodiment of the present specification.
Step S10: the metadata server (MDS) receives a Hash redistribution request;
Step S200: the MDS generates a redistribution task table;
Step S210: the MDS obtains a single task from the task table;
Step S220: the MDS executes the single task;
Step S222: if the single task succeeds, the table structure information is updated;
Step S221: if the single task fails, the error information is recorded;
Step S230: after the single task is finished, it is judged whether the task table is empty;
Step S250: if no task remains, the redistribution task is finished;
Step S240: if tasks remain, the single-task operation continues in a loop by jumping to step S220, until step S250 is reached.
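The task loop of fig. 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the task table is modelled as a queue, `execute` and `update_table_structure` are hypothetical callbacks standing in for the MDS operations, and a failed task is assumed to be skipped after its error is recorded, which the source does not spell out.

```python
from collections import deque
from typing import Callable


def run_redistribution(
    tasks: deque,
    execute: Callable[[object], None],
    update_table_structure: Callable[[object], None],
) -> list:
    """Run every task in the task table once and return the recorded errors."""
    errors = []
    while tasks:                      # S230/S240: loop while tasks remain
        task = tasks.popleft()        # S210: obtain a single task
        try:
            execute(task)             # S220: execute the single task
        except Exception as exc:      # S221: record error information
            errors.append((task, str(exc)))
        else:
            update_table_structure(task)  # S222: update table structure info
    return errors                     # S250: redistribution task finished


# Usage with hypothetical tasks, one of which fails.
updated = []


def demo_execute(task):
    if task == "move-hash-7":
        raise RuntimeError("target node unreachable")


task_table = deque(["move-hash-3", "move-hash-7", "move-hash-9"])
print(run_redistribution(task_table, demo_execute, updated.append))
```

Because each task is independent, the loop can stop after any task and be resumed later with the remaining queue, which is one way to realize the pausable batch migration described earlier.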
Referring to fig. 3, in some embodiments, before step 100 (acquiring a new mapping relation between the Hash values and the data nodes when the number of data nodes changes), the implementation method for redistribution of the distributed database provided in the embodiment of the present specification further includes:
Step 120: establishing a mapping relation between the Hash values and the data nodes for a Hash table that is distributed over the data nodes by Hash distribution;
The mapping relation needs to be established only when no mapping relation between the Hash values and the data nodes exists yet. It is established for a Hash table that is distributed over the data nodes by Hash distribution, and its form of expression is not limited; it may be a table or the like.
When the distributed database is redistributed, the mapping relation between the current Hash values and the data nodes is first initialized from the Hash table. It can be initialized as something like a key-value table, in which the keys are Hash values and the values are data node numbers, and it is stored in the table metadata of the Hash table. From then on, operations on the Hash table no longer use the fixed algorithm; the data node is obtained by looking it up in the mapping relation. The data node corresponding to each Hash value is thus stored in the table metadata and can point to any data node, so that redistribution of the distributed database becomes a redistribution of the Hash values.
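The key-value form of the mapping relation can be sketched as a plain dictionary. The example assumes 100 Hash values, 4 data nodes numbered from 0, and an initial contiguous assignment; all three are illustrative choices rather than details given in the source.

```python
def init_mapping(n_hashes: int, n_nodes: int) -> dict[int, int]:
    """Initial mapping: key = Hash value, value = data node number."""
    return {hv: hv * n_nodes // n_hashes for hv in range(n_hashes)}


def node_for(mapping: dict[int, int], hv: int) -> int:
    """Look the data node up instead of computing it with a fixed algorithm."""
    return mapping[hv]


mapping = init_mapping(100, 4)
original = dict(mapping)

# Any single Hash value can now be pointed at any data node.
mapping[7] = 3
changed = [hv for hv in mapping if mapping[hv] != original[hv]]
print(node_for(mapping, 7), changed)
```

Reassigning one key leaves every other Hash value on its original data node, which is what allows migration to proceed one Hash value at a time.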
Step 130: distributing the Hash values to the corresponding data nodes according to the mapping relation.
The implementation method of redistribution of the distributed database provided by the embodiment of the invention adds a mapping relation between the Hash values obtained by the consistent Hash algorithm and the data nodes; this mapping relation maps each consistent Hash value to a data node number.
In step 120, a mapping relation is created if none exists, and each Hash value is then allocated to the corresponding data node according to the created mapping relation. If the number of data nodes changes later, only the data node information in the mapping relation needs to be changed to obtain a new mapping relation between the Hash values and the data nodes, and the corresponding data nodes are then reallocated to the Hash values according to the new mapping relation. Steps 120 and 130 are therefore executed only once in the redistribution process of the distributed database; afterwards it is only necessary to modify the data node information in the mapping relation and then reallocate the corresponding data nodes to the Hash values according to the new mapping relation.
Referring to fig. 4, a timing diagram of the redistribution interaction between the components of a distributed database in the implementation method of redistribution of the distributed database provided by the embodiment of the present specification is shown:
Step 101: the management data node OMM sends a redistribution request;
Step 101': the MDS returns a verification success response;
Step 201: the MDS sends a disable request to PROXY, which prohibits reading and writing of the data corresponding to the Hash value. Only the stored data concerned is disabled, so the range of affected access services is smaller than in the prior art;
Step 201': PROXY returns a disable response to the MDS;
Step 202: the MDS sends a migrate-data request to PROXY;
Step 202': PROXY returns a migrate-data response to the MDS;
Step 203: the MDS sends a request to PROXY to update the table structure, that is, the table structure stored in the metadata, in which the mapping relation is stored;
Step 203': PROXY returns an update-table-structure response to the MDS;
Step 204: the MDS sends a disable request to PROXY for the data corresponding to the Hash value;
Step 204': PROXY returns a disable response to the MDS;
Referring to fig. 5, a flowchart of an implementation method of redistribution in a distributed database provided in an embodiment of the present specification is shown. The details are as follows:
Step 101: the MDS receives a Hash redistribution request;
Step 102: the MDS queries whether a mapping relation exists;
Step 103: a mapping relation exists (not the first redistribution);
Step 103: calculating the Hash values to be migrated;
Step 240: returning after the task table has been executed;
Step 103: no mapping relation exists (first redistribution);
Step 120: establishing a mapping relation between the Hash values and the data nodes;
Step 130: calculating the Hash values of the data that needs to be migrated;
Step 240: returning after the task table has been executed.
As shown in fig. 6, in some embodiments, before step 120 (establishing a mapping relation between the Hash values and the data nodes based on the Hash table), the implementation method for redistribution provided in an embodiment of the present specification further includes: Step 140: making the number of Hash values corresponding to each data node the same.
Before the first redistribution, that is, before the mapping relation between the Hash values and the data nodes is established based on the Hash table, the number of Hash values of each data node is kept the same, so that each data node has the same number of Hash values in the mapping relation established from the Hash table. If the Hash values are evenly distributed among all the data nodes at the first redistribution, the redistribution operation is simplified, redistribution is faster, and the redistribution performance of the distributed database is improved. For example, suppose there are 4 original data nodes, each corresponding to 25 Hash values, and one data node is to be added. Only 5 Hash values from each of the original 4 data nodes need to be reallocated to the new fifth data node; the other 20 Hash values on each of the original 4 data nodes remain unchanged, so the amount of stored data migrated is small and essentially no extra disk space is needed.
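The arithmetic of this example can be checked with a short sketch. It assumes Hash values 0 to 99, data nodes numbered 0 to 3 with node 4 as the new node, and that the 5 smallest Hash values of each original node are the ones handed over; the source fixes only the counts, not which Hash values are chosen.

```python
from collections import Counter

# 4 original data nodes, 25 Hash values each (contiguous ranges assumed).
mapping = {hv: hv // 25 for hv in range(100)}
NEW_NODE = 4

moved = []
for node in range(4):
    owned = sorted(hv for hv, owner in mapping.items() if owner == node)
    for hv in owned[:5]:          # assumption: hand over the 5 smallest
        mapping[hv] = NEW_NODE
        moved.append(hv)

counts = Counter(mapping.values())
print(len(moved), dict(counts))
```

Only 20 of the 100 Hash values change owner, exactly the number the new data node has to hold.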
Referring to fig. 7, in some embodiments, before step 120 (establishing a mapping relation between the Hash values and the data nodes based on the Hash table), the implementation method for redistribution of a distributed database provided in the embodiment of the present specification further includes:
Step 150: keeping the stored data evenly distributed across all data nodes.
In the implementation method for redistribution of the distributed database used in the embodiment of the present specification, the redistribution operation is based on the number of Hash values corresponding to each data node rather than on the amount of stored data. A suitable Hash algorithm is therefore required to keep the stored data evenly distributed among the data nodes, that is, to keep the size of the stored data corresponding to each Hash value essentially the same.
Therefore, when the number of Hash values corresponding to each data node is the same, a suitable Hash algorithm is needed to keep the stored data evenly distributed over all data nodes. Before redistribution establishes the new mapping relation, the mapping between the Hash values and the data nodes is calculated by a fixed algorithm, and the Hash values of each data node form a continuous range.
Referring to fig. 8, in some embodiments, in the implementation method of the distributed database redistribution provided by the embodiments of the present specification, step 100 (acquiring a new mapping relation between the Hash values and the data nodes when the number of data nodes changes) specifically comprises:
Step 103: setting the same number of Hash values for each data node in the mapping relation.
When a new mapping relation is created, the same number of Hash values is set for each data node. At the first redistribution, the mapping relation created from the Hash table likewise has the same number of Hash values for each data node. This ensures that the number of Hash values migrated out equals the number migrated in during redistribution, and keeps the amount of stored data migrated as small as possible.
When determining the Hash values to be migrated, the difference between the number of Hash values of each data node before redistribution and after redistribution is calculated first. A positive difference is the number of Hash values that the data node must migrate out; a negative difference is the number of Hash values that the data node must migrate in; 0 means that no Hash value of that data node needs to be migrated. The Hash values to be migrated out (positive difference) are then taken according to a minimum-value principle: each time the smallest Hash value is taken and allocated to a data node that needs to migrate Hash values in, and the receiving data nodes are filled with suitable Hash values in ascending order. The number of Hash values migrated out equals the number migrated in, so the number of Hash values migrated equals the change in the number of Hash values of the changed data nodes, and the amount of data migrated is reduced.
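One possible reading of this difference calculation is sketched below. The function name `plan_moves`, the choice of the smallest Hash values on each over-full node, and the requirement that the Hash values divide evenly among the data nodes are assumptions of the example, not details confirmed by the source.

```python
from collections import Counter


def plan_moves(mapping: dict[int, int], new_nodes: list[int]) -> list[tuple[int, int, int]]:
    """Return (hash_value, source_node, target_node) moves that equalize counts."""
    if len(mapping) % len(new_nodes):
        raise ValueError("sketch assumes the Hash values divide evenly among nodes")
    target = len(mapping) // len(new_nodes)

    # Hash values currently owned by each node, in ascending order.
    owned: dict[int, list[int]] = {node: [] for node in new_nodes}
    for hv in sorted(mapping):
        owned.setdefault(mapping[hv], []).append(hv)

    # Difference = count before redistribution minus count after it.
    # A node that is being removed keeps 0 Hash values afterwards.
    delta = {node: len(hvs) - (target if node in new_nodes else 0)
             for node, hvs in owned.items()}

    # Positive difference: migrate out the smallest Hash values of that node.
    outgoing = sorted((hv, node) for node, d in delta.items() if d > 0
                      for hv in owned[node][:d])

    # Negative difference: receiving nodes are filled in ascending node order,
    # each time taking the smallest Hash value still waiting to move.
    moves = []
    waiting = iter(outgoing)
    for node in sorted(n for n, d in delta.items() if d < 0):
        for _ in range(-delta[node]):
            hv, source = next(waiting)
            moves.append((hv, source, node))
    return moves


# Usage: nodes 0..3 hold 20, 25, 30 and 25 Hash values; node 4 is added.
mapping = {}
next_hv = 0
for node, size in enumerate([20, 25, 30, 25]):
    for _ in range(size):
        mapping[next_hv] = node
        next_hv += 1

moves = plan_moves(mapping, [0, 1, 2, 3, 4])
for hv, _source, target in moves:
    mapping[hv] = target
print(len(moves), Counter(mapping.values()))
```

Because the differences sum to zero, every Hash value that leaves a node has exactly one receiving node, and no Hash value moves unless its owner's count must change.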
The migration of the actual storage data is completed after the reallocation, as described in the following embodiments.
Referring to fig. 9, in some embodiments of the method for implementing redistribution of a distributed database provided by this specification, step 103 (setting the same number of Hash values for each data node in the mapping relationship) specifically comprises:
step 104: when the number of data nodes changes, determining the migrate-out data nodes, which need to migrate Hash values out, and the migrate-in data nodes, which need to migrate Hash values in, wherein the number of Hash values migrated out is the same as the number migrated in;
if the original data nodes correspond to different numbers of Hash values, for example 20 for the first data node, 25 for the second, 30 for the third and 25 for the fourth, and one data node is added, the number of Hash values corresponding to each data node is set to be the same, namely 20. The first data node does not need to migrate any Hash values, the second data node needs to migrate out 5, the third needs to migrate out 10 and the fourth needs to migrate out 5, so the fifth data node needs to migrate in 20 Hash values. During migration, the 5 Hash values to be migrated out of the second data node are migrated into the fifth data node first (the sequence number of the second data node is smaller than that of the third data node); the 10 Hash values of the third data node follow, and finally the 5 Hash values of the fourth data node. The number of Hash values migrated out is the same as the number migrated in, which reduces the amount of data migration.
For example, if there are 4 original data nodes, each corresponding to 25 Hash values, and one data node needs to be added, each of the original 4 data nodes migrates 5 of its Hash values to the new 5th data node, and the remaining 20 Hash values on each of the original 4 data nodes stay unchanged. The number of Hash values to be migrated therefore equals the number of Hash values of the changed data node (the 5th data node), the amount of migrated storage is small, and basically no additional disk space is needed.
Step 105: reallocating the Hash values to be migrated out of the migrate-out data nodes to the migrate-in data nodes.
The Hash values to be migrated are reallocated to the migrate-in data nodes, which completes the reallocation between Hash values and data nodes for the redistribution of the distributed database; the storage data corresponding to those Hash values is then migrated to the corresponding data nodes.
When the data nodes correspond to the same number of Hash values before the first redistribution, the number of Hash values migrated out and the number migrated in remain the same in the new mapping relationship, the number of Hash values to be migrated equals the number of Hash values corresponding to the changed data nodes, and the migration amount of stored data is reduced.
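Steps 104 and 105 can be sketched in a few lines, using the 20/25/30/25 example above. This is only an illustration under stated assumptions: the function name `rebalance`, the dictionary representation of the mapping, and the choice to take the smallest Hash values of each surplus node first are not specified by the text.

```python
def rebalance(mapping, nodes):
    """Reassign Hash values so every node in `nodes` holds the same number.

    mapping: {hash_value: node}; nodes: node numbers after the change.
    Returns (new_mapping, moves), each move being (hash_value, source, target).
    Assumes the Hash value count divides evenly by the node count.
    """
    if len(mapping) % len(nodes) != 0:
        raise ValueError("Hash value count must be divisible by node count")
    target = len(mapping) // len(nodes)
    owned = {}
    for h, n in mapping.items():
        owned.setdefault(n, []).append(h)
    # Step 104: collect surplus Hash values, lowest node number first,
    # smallest Hash values first (an assumed tie-breaking rule).
    pool = []
    for n in sorted(owned):
        keep = target if n in nodes else 0
        surplus = len(owned[n]) - keep
        if surplus > 0:
            pool.extend((h, n) for h in sorted(owned[n])[:surplus])
    # Step 105: hand surplus Hash values to nodes that are short, in node order.
    new_mapping, moves, supply = dict(mapping), [], iter(pool)
    for n in sorted(nodes):
        for _ in range(max(target - len(owned.get(n, [])), 0)):
            h, src = next(supply)
            new_mapping[h] = n
            moves.append((h, src, n))
    return new_mapping, moves

# Nodes 1..4 hold 20, 25, 30 and 25 Hash values; node 5 is added.
sizes = {1: 20, 2: 25, 3: 30, 4: 25}
mapping, h = {}, 0
for node, size in sizes.items():
    for _ in range(size):
        mapping[h] = node
        h += 1
new_mapping, moves = rebalance(mapping, [1, 2, 3, 4, 5])
```

Only 20 of the 100 Hash values change owner here; the Hash values of node 1 are untouched, and the first five moves come from node 2.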
Referring to FIG. 10, in some embodiments, after step 110 (reallocating the corresponding data nodes for the Hash values according to the new mapping relationship), the method for implementing redistribution of a distributed database provided in the embodiments of this specification further includes:
step 200: generating a corresponding task table, wherein the task table comprises at least a Hash value, a migrate-out data node and a migrate-in data node;
the redistribution task table holds three pieces of information: the Hash value, the migrate-out data node and the migrate-in data node, which are used to migrate the actual storage data after the reallocation is completed.
Step 220: migrating the storage data corresponding to the Hash value from the migrate-out data node to the migrate-in data node according to the task table;
after the stored data corresponding to the Hash value has been migrated to the migrate-in data node, the original stored data corresponding to that Hash value is deleted from the migrate-out data node, which completes the migration for that Hash value. The redistribution of the distributed database is complete once the stored data corresponding to every Hash value in the task table has been migrated in turn.
The stored data corresponding to each Hash value can be determined from the Hash value in the last column of the Hash table.
Step 222: deleting the stored data corresponding to the Hash value from the migrate-out data node.
The data migration can be split into multiple runs, and each run can migrate the storage data corresponding to one or more Hash values. The runs are independent of one another, a single failure does not affect the overall redistribution, and the redistribution task can be initiated repeatedly until the storage data is uniformly distributed across the data nodes.
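The task table and the batch-wise migration can be illustrated with a small in-memory sketch. The `Task` class, the `storage` dictionary standing in for the data nodes, and the functions `migrate_one` and `run_batch` are hypothetical names introduced for this example; a real system would move rows between databases rather than Python lists.

```python
from dataclasses import dataclass

@dataclass
class Task:
    hash_value: int
    source: int   # migrate-out data node
    target: int   # migrate-in data node
    done: bool = False

def migrate_one(task, storage):
    """Copy the rows of one Hash value to the target, then delete them at the source."""
    if task.done:
        return
    rows = storage[task.source].get(task.hash_value, [])
    storage[task.target][task.hash_value] = list(rows)   # step 220: copy
    storage[task.source].pop(task.hash_value, None)      # step 222: delete
    task.done = True

def run_batch(tasks, storage, batch_size=1):
    """Migrate up to batch_size pending tasks; a failed task stays pending."""
    migrated = 0
    for task in tasks:
        if migrated == batch_size:
            break
        if task.done:
            continue
        try:
            migrate_one(task, storage)
            migrated += 1
        except KeyError:
            continue  # e.g. unknown node; retried by a later run
    return migrated

# storage[node][hash_value] -> rows stored for that Hash value (toy data)
storage = {1: {7: ["row-a", "row-b"], 8: ["row-c"]}, 2: {}}
tasks = [Task(7, 1, 2), Task(8, 1, 2)]
first_run = run_batch(tasks, storage, batch_size=1)
second_run = run_batch(tasks, storage, batch_size=1)
```

Because each task records its own completion, a later run simply picks up whatever is still pending, which mirrors the independence of the runs described above.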
Referring to FIG. 11, in some embodiments, before step 220 (migrating the storage data corresponding to the Hash value from the migrate-out data node to the migrate-in data node according to the task table), the method for implementing redistribution of a distributed database provided in the embodiments of this specification further includes:
step 111: implicitly inserting the Hash value into the Hash table;
in order to determine the corresponding data from the Hash value during redistribution, a column for storing the Hash value is implicitly added at the end of the Hash table, and the Hash value is implicitly inserted whenever data is inserted.
Step 241: acquiring the storage data corresponding to the Hash value based on the Hash value in the Hash table.
When the storage data is migrated, the storage data corresponding to a Hash value is acquired based on the Hash value implicitly inserted in the Hash table and is then migrated.
Step 111, implicitly inserting the Hash value into the Hash table, involves two procedures. First, the table creation procedure for the Hash table is as follows:
1. The PROXY receives a table creation statement.
2. Judge the type of table in the table creation statement.
2.1. It is a Hash table:
2.1.1. Implicitly add a column at the end of the table creation statement; this column stores the Hash value.
2.1.2. Issue the SQL with the newly added column, which informs the single data node database of the position of the Hash value.
2.2. It is not a Hash table:
2.2.1. The original flow is kept unchanged.
Second, the procedure for inserting a Hash value into the Hash table is as follows:
1. The PROXY receives an insert statement.
2. Judge the type of table targeted by the insert statement.
2.1. It is a Hash table:
2.1.1. Calculate the Hash value from the distribution key.
2.1.2. Modify the insert statement so that the Hash value is inserted into the column added last; the Hash value is later used to query the stored data corresponding to it and to determine the data during migration.
2.1.3. Issue the SQL, that is, the SQL with the Hash value added, to the single data node database.
2.2. It is not a Hash table:
2.2.1. The original flow is kept unchanged.
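The two proxy procedures above can be sketched as simple statement rewrites. Everything specific here is an assumption made for illustration: the column name `__hash_value`, the Hash space of 100, the use of CRC32 as the Hash function, and the restriction to single-row `INSERT ... VALUES (...)` statements without an explicit column list. A real proxy would work on a parsed statement rather than on strings.

```python
import zlib

HASH_COLUMN = "__hash_value"  # hypothetical name of the implicit column
HASH_SPACE = 100              # hypothetical number of distinct Hash values

def hash_value_of(distribution_key):
    """Stand-in Hash function; the source does not name the algorithm."""
    return zlib.crc32(str(distribution_key).encode("utf-8")) % HASH_SPACE

def _strip_closing_paren(sql):
    body = sql.rstrip().rstrip(";").rstrip()
    if not body.endswith(")"):
        raise ValueError("statement shape not supported by this sketch")
    return body[:-1].rstrip()

def rewrite_create(sql, is_hash_table):
    """Table creation: append the implicit Hash column (steps 2.1.1 and 2.1.2)."""
    if not is_hash_table:
        return sql  # step 2.2.1: original flow unchanged
    return f"{_strip_closing_paren(sql)}, {HASH_COLUMN} INT);"

def rewrite_insert(sql, distribution_key, is_hash_table):
    """Insert: append the computed Hash value as the last value (steps 2.1.1 to 2.1.3)."""
    if not is_hash_table:
        return sql  # step 2.2.1: original flow unchanged
    return f"{_strip_closing_paren(sql)}, {hash_value_of(distribution_key)});"

create_sql = rewrite_create("CREATE TABLE t (id INT, name VARCHAR(20));", True)
insert_sql = rewrite_insert("INSERT INTO t VALUES (1, 'a');", 1, True)
plain_sql = rewrite_insert("INSERT INTO u VALUES (1);", 1, False)
```

Storing the Hash value in the row is what later lets the migration select all rows of one Hash value with an ordinary filter on that column.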
As can be seen from the above analysis, in the method for implementing redistribution of a distributed database provided in the embodiments of this specification, when the number of data nodes changes and the stored data needs to be redistributed, the mapping relationship between the Hash values and the data nodes is modified to reflect the changed data nodes, and the corresponding data nodes are then reallocated for the Hash values according to the new mapping relationship. After the reallocation, the stored data corresponding to each Hash value is migrated to its corresponding data node. Because the data nodes corresponding to some Hash values do not change, only the stored data whose Hash value now maps to a different data node needs to be moved, which reduces the migration amount of stored data and improves the redistribution performance of the distributed database. Moreover, since stored data is only moved from the migrate-out data nodes to the migrate-in data nodes, disk usage basically stays at its original level, which saves disk space and reduces resource consumption.
Example two
Fig. 12 is a schematic structural diagram of an apparatus 100 for implementing redistribution of a distributed database according to an embodiment of the present disclosure. The device for realizing the redistribution of the distributed database comprises:
the obtaining module 101, configured to obtain a new mapping relationship between the Hash values used for Hash distribution and the data nodes when the number of data nodes changes;
When the distributed database needs to be expanded or reduced in capacity, the number of data nodes changes, and the mapping relationship between the Hash values and the data nodes must be modified into a mapping relationship between the Hash values and the changed data nodes. The redistribution method provided by the embodiments of this specification takes this mapping relationship, rather than the size of the data stored on each data node, as the basis for redistribution. Because the data node corresponding to each Hash value is determined by the mapping, it can be controlled flexibly: only the storage data corresponding to individual Hash values needs to be migrated, and the storage data corresponding to all Hash values on the arc belonging to a data node does not need to be shifted as a whole. A wholesale shift is therefore avoided, the migration amount of stored data is reduced, and the redistribution performance of the distributed database is improved.
When a Hash table is used for data distribution, a Hash value is calculated for each row of storage data from the distribution key (one or more fields of the Hash table), and the data node corresponding to that Hash value, that is, the data node to which the row belongs, is then calculated by a fixed distribution algorithm. The Hash value serves only to identify the data node to which the row belongs; it is not part of the original storage data and can be regarded as auxiliary information of the Hash table. The mapping relationship between the Hash values and the data nodes may be an existing mapping relationship that is updated into a new one as soon as the number of data nodes changes, or it may be obtained at the moment the number of data nodes changes.
The allocating module 102, configured to reallocate the corresponding data nodes for the Hash values according to the new mapping relationship.
The number of data nodes may change by increasing or by decreasing. In either case the Hash values are reallocated across all data nodes present after the change; the purpose of reallocating each Hash value to a new data node is to prepare for the subsequent migration of the storage data.
The method for implementing redistribution of a distributed database provided in the embodiments of this specification relies on two mapping relationships: the first is the mapping relationship between the Hash values and the data nodes, and the second is the mapping relationship between the Hash values and the storage data. Based on both, the storage data corresponding to each Hash value is migrated to the data node corresponding to that Hash value.
By reallocating the corresponding data nodes for the Hash values according to the new mapping relationship, the migration amount of stored data can be limited to the change in stored data on the changed data nodes, and that change is itself kept as small as possible. This reduces the amount of migrated storage data and improves the redistribution performance of the distributed database.
Because migration is driven by the mapping relationship between the Hash values and the data nodes, the stored data corresponding to a single Hash value can be migrated on its own, which reduces the migration amount. Each run can migrate the stored data of one or several Hash values, which speeds up each individual migration and improves redistribution performance. Migration in multiple batches can also be suspended and resumed at any time without affecting the normal operation of the distributed database, so redistribution can be carried out during idle time, which helps reduce the load on the database. In addition, since the data can be split into multiple batches, disk usage basically stays at its original level and no additional disk space is needed, which saves disk space and reduces resource consumption.
Referring to fig. 13, in some embodiments, the apparatus for implementing redistribution of a distributed database provided by the embodiments of this specification further includes:
the initialization module 103, configured to establish, based on a Hash operation, a mapping relationship between the Hash values and the data nodes for the Hash table of the data nodes;
The mapping relationship needs to be established when no mapping relationship between the Hash values and the data nodes exists. It is established based on the Hash table, and its form of expression is not limited; it may be a table or the like.
When the distributed database is redistributed, the mapping relationship between the current Hash values and the data nodes is first initialized based on the Hash table. It can be initialized as something like a key-value table, in which the keys are Hash values and the values are data node numbers, and stored in the table metadata of the Hash table. Operations on the Hash table then abandon the fixed algorithm and obtain the data node by looking it up in the mapping relationship. Since the data node corresponding to each Hash value is stored in the table metadata and can point to any data node, redistributing the distributed database amounts to reallocating the Hash values.
The allocating module 102 is configured to allocate each Hash value to the corresponding data node according to the mapping relationship;
The redistribution method provided by the embodiments adds a mapping relationship between the data nodes and the Hash values obtained with the consistent Hash algorithm used for Hash distribution; the mapping relationship maps each consistent Hash value to a data node number.
For step 120, a mapping relationship is created when none exists, and each Hash value is then allocated to the corresponding data node according to the created mapping relationship. If the number of data nodes changes later, only the data node information in the mapping relationship needs to be changed to obtain the new mapping relationship between the Hash values and the data nodes, after which the corresponding data nodes are reallocated for the Hash values according to the new mapping relationship. Steps 120 and 130 of the method are therefore executed only once in the redistribution process of the distributed database; afterwards it is only necessary to modify the data node information in the mapping relationship and reallocate the corresponding data nodes for the Hash values accordingly.
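The key-value mapping kept in the table metadata, and the lookup that replaces the fixed algorithm, can be sketched as follows. The names `init_mapping`, `hash_value_of` and `node_for`, the `table_metadata` dictionary, the CRC32 Hash function, the Hash space of 100 and the initial assignment by contiguous ranges are assumptions made for this illustration only.

```python
import zlib

def init_mapping(nodes, hash_space=100):
    """Build {hash_value: node_number}, giving each node a contiguous share."""
    per_node = hash_space // len(nodes)
    if per_node == 0:
        raise ValueError("more nodes than Hash values")
    return {h: nodes[min(h // per_node, len(nodes) - 1)]
            for h in range(hash_space)}

def hash_value_of(distribution_key, hash_space):
    """Stand-in Hash function over the distribution key (not from the source)."""
    return zlib.crc32(str(distribution_key).encode("utf-8")) % hash_space

def node_for(distribution_key, mapping):
    """Find the data node by lookup in the mapping, not by a fixed algorithm."""
    return mapping[hash_value_of(distribution_key, len(mapping))]

# The mapping lives in the table metadata (a plain dict in this sketch).
table_metadata = {"hash_to_node": init_mapping([1, 2, 3, 4])}
mapping = table_metadata["hash_to_node"]
key = "order-42"
node_before = node_for(key, mapping)

# Redistribution only edits entries: point this key's Hash value at node 5.
mapping[hash_value_of(key, len(mapping))] = 5
node_after = node_for(key, mapping)
```

Since any entry can point to any node, changing where a Hash value lives is a metadata edit followed by moving only the rows of that Hash value.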
As can be seen from the above analysis, in the method for implementing redistribution of a distributed database provided in the embodiments of this specification, when the number of data nodes changes and the stored data needs to be redistributed, the mapping relationship between the Hash values used for Hash distribution and the data nodes is modified to reflect the changed data nodes, and the corresponding data nodes are then reallocated for the Hash values according to the new mapping relationship. After the reallocation, the stored data corresponding to each Hash value is migrated to its corresponding data node. Because the data nodes corresponding to some Hash values do not change, only the stored data whose Hash value now maps to a different data node needs to be moved, which reduces the migration amount of stored data and improves the redistribution performance of the distributed database.
Example three
Referring to fig. 14, a schematic structural diagram of the database 10 provided in the embodiment of the present disclosure is shown. The implementation apparatus 100 for redistribution of distributed database includes an obtaining module 101 and an allocating module 102, where:
the obtaining module 101 is configured to obtain a new mapping relationship between the Hash values used for Hash distribution and the data nodes when the number of data nodes changes;
When the distributed database needs to be expanded or reduced in capacity, the number of data nodes changes, and the mapping relationship between the Hash values and the data nodes must be modified into a new mapping relationship between the Hash values and the changed data nodes. The redistribution method provided by the embodiments of this specification takes this mapping relationship, rather than the size of the data stored on each data node, as the basis for redistribution. Because the data node corresponding to each Hash value is determined by the mapping, it can be controlled flexibly: only the storage data corresponding to individual Hash values needs to be migrated, and the storage data corresponding to all Hash values on the arc belonging to a data node does not need to be shifted as a whole. A wholesale shift is therefore avoided, the migration amount of stored data is reduced, and the redistribution performance of the distributed database is improved.
When a Hash table is used for data distribution, a Hash value is calculated for each row of storage data from the distribution key (one or more fields of the Hash table), and the data node corresponding to that Hash value, that is, the data node to which the row belongs, is then calculated by a fixed distribution algorithm. The Hash value serves only to identify the data node to which the row belongs; it is not part of the original storage data and can be regarded as auxiliary information of the Hash table. The mapping relationship between the Hash values and the data nodes may be an existing mapping relationship that is updated into a new one as soon as the number of data nodes changes, or it may be obtained at the moment the number of data nodes changes.
The allocating module 102 is configured to reallocate the corresponding data nodes for the Hash values according to the new mapping relationship.
The number of data nodes may change by increasing or by decreasing. In either case the Hash values are reallocated across all data nodes present after the change; the purpose of reallocating each Hash value to a new data node is to prepare for the subsequent migration of the storage data.
The apparatus for implementing redistribution of a distributed database provided in the embodiments of this specification relies on two mapping relationships: the first is the mapping relationship between the Hash values and the data nodes, and the second is the mapping relationship between the Hash values and the storage data. Based on both, the storage data corresponding to each Hash value is migrated to the data node corresponding to that Hash value.
By reallocating the corresponding data nodes for the Hash values according to the new mapping relationship, the migration amount of stored data can be limited to the change in stored data on the changed data nodes, and that change is itself kept as small as possible. This reduces the amount of migrated storage data and improves the redistribution performance of the distributed database.
Because migration is driven by the mapping relationship between the Hash values and the data nodes, the stored data corresponding to a single Hash value can be migrated on its own, which reduces the migration amount. Each run can migrate the stored data of one or several Hash values, which speeds up each individual migration and improves redistribution performance. Migration in multiple batches can also be suspended and resumed at any time without affecting the normal operation of the distributed database, so redistribution can be carried out during idle time, which helps reduce the load on the database. In addition, since the data can be split into multiple batches, disk usage basically stays at its original level and no additional disk space is needed, which saves disk space and reduces resource consumption.
Referring to fig. 15, a schematic structural diagram of the database 10 provided in the embodiment of the present disclosure is shown. The implementation apparatus 100 for redistribution of distributed database further includes an initialization module 103, wherein:
the initialization module 103 is configured to establish a mapping relationship between the Hash values and the data nodes based on a Hash table;
The mapping relationship needs to be established when no mapping relationship between the Hash values and the data nodes exists. It is established based on the Hash table, and its form of expression is not limited; it may be a table or the like.
When the distributed database is redistributed, the mapping relationship between the current Hash values and the data nodes is first initialized based on the Hash table. It can be initialized as something like a key-value table, in which the keys are Hash values and the values are data node numbers, and stored in the table metadata of the Hash table. Operations on the Hash table then abandon the fixed algorithm and obtain the data node by looking it up in the mapping relationship. Since the data node corresponding to each Hash value is stored in the table metadata and can point to any data node, redistributing the distributed database amounts to reallocating the Hash values.
The allocating module 102 is configured to allocate each Hash value to the corresponding data node according to the mapping relationship;
The redistribution method provided by the embodiments adds a mapping relationship between the data nodes and the Hash values obtained with the consistent Hash algorithm used for Hash distribution; the mapping relationship maps each consistent Hash value to a data node number.
For step 120, a mapping relationship is created when none exists, and each Hash value is then allocated to the corresponding data node according to the created mapping relationship. If the number of data nodes changes later, only the data node information in the mapping relationship needs to be changed to obtain the new mapping relationship between the Hash values and the data nodes, after which the corresponding data nodes are reallocated for the Hash values according to the new mapping relationship. Steps 120 and 130 of the method are therefore executed only once in the redistribution process of the distributed database; afterwards it is only necessary to modify the data node information in the mapping relationship and reallocate the corresponding data nodes for the Hash values accordingly.
As can be seen from the above analysis, in the method for implementing redistribution of a distributed database provided in the embodiments of this specification, when the number of data nodes changes and the stored data needs to be redistributed, the mapping relationship between the Hash values and the data nodes is modified to reflect the changed data nodes, and the corresponding data nodes are then reallocated for the Hash values according to the new mapping relationship. After the reallocation, the stored data corresponding to each Hash value is migrated to its corresponding data node. Because the data nodes corresponding to some Hash values do not change, only the stored data whose Hash value now maps to a different data node needs to be moved, which reduces the migration amount of stored data and improves the redistribution performance of the distributed database. Moreover, since stored data is only moved from the migrate-out data nodes to the migrate-in data nodes, disk usage basically stays at its original level, which saves disk space and reduces resource consumption.
Example four
Referring to fig. 16, in the server 1600 provided in the embodiments of the present specification, the server 1600 includes a memory 1620, a processor 1610, a program stored in the memory 1620 and operable on the processor 1610, and a data bus 1640 for implementing connection communication between the processor 1610 and the memory 1620, where the program, when executed by the processor 1610, implements the steps of the implementation method of redistribution shown in fig. 1 to 11, and specifically implements the following steps:
step 100: when the number of data nodes changes, acquiring a new mapping relationship between the Hash values used for Hash distribution and the data nodes;
When the distributed database needs to be expanded or reduced in capacity, the number of data nodes changes, and the mapping relationship between the Hash values and the data nodes must be modified into a mapping relationship between the Hash values and the changed data nodes. The redistribution method provided by the embodiments of this specification takes this mapping relationship, rather than the size of the data stored on each data node, as the basis for redistribution. Because the data node corresponding to each Hash value is determined by the mapping, it can be controlled flexibly: only the storage data corresponding to individual Hash values needs to be migrated, and the storage data corresponding to all Hash values on the arc belonging to a data node does not need to be shifted as a whole. A wholesale shift is therefore avoided, the migration amount of stored data is reduced, and the redistribution performance of the distributed database is improved.
When a Hash table is used for data distribution, a Hash value is calculated for each row of storage data from the distribution key (one or more fields of the Hash table), and the data node corresponding to that Hash value, that is, the data node to which the row belongs, is then calculated by a fixed distribution algorithm. The Hash value serves only to identify the data node to which the row belongs; it is not part of the original storage data and can be regarded as auxiliary information of the Hash table. The mapping relationship between the Hash values and the data nodes may be an existing mapping relationship that is updated into a new one as soon as the number of data nodes changes, or it may be obtained at the moment the number of data nodes changes.
Step 110: reallocating the corresponding data nodes for the Hash values according to the new mapping relationship;
The number of data nodes may change by increasing or by decreasing. In either case the Hash values are reallocated across all data nodes present after the change; the purpose of reallocating each Hash value to a new data node is to prepare for the subsequent migration of the storage data.
The method for implementing redistribution of a distributed database provided in the embodiments of this specification relies on two mapping relationships: the first is the mapping relationship between the Hash values and the data nodes, and the second is the mapping relationship between the Hash values and the storage data. Based on both, the storage data corresponding to each Hash value is migrated to the data node corresponding to that Hash value.
By reallocating the corresponding data nodes for the Hash values according to the new mapping relationship, the migration amount of stored data can be limited to the change in stored data on the changed data nodes, and that change is itself kept as small as possible. This reduces the amount of migrated storage data and improves the redistribution performance of the distributed database.
Because migration is driven by the mapping relationship between the Hash values and the data nodes, the stored data corresponding to a single Hash value can be migrated on its own, which reduces the migration amount. Each run can migrate the stored data of one or several Hash values, which speeds up each individual migration and improves redistribution performance. Migration in multiple batches can also be suspended and resumed at any time without affecting the normal operation of the distributed database, so redistribution can be carried out during idle time, which helps reduce the load on the database. In addition, since the data can be split into multiple batches, disk usage basically stays at its original level and no additional disk space is needed, which saves disk space and reduces resource consumption.
As can be seen from the above analysis, in the implementation method for redistribution of the distributed database provided in the embodiment of the present specification, when the number of data nodes changes and redistribution of stored data is required, a new mapping relationship between a Hash value and the data nodes is modified, where the data nodes are changed data nodes, and then corresponding data nodes are redistributed for the Hash value according to the new mapping relationship. After the corresponding data nodes are redistributed to the Hash value based on the mapping relation between the Hash value and the data nodes, the stored data corresponding to the Hash value are migrated to the corresponding data nodes, in this situation, some data nodes corresponding to the Hash value do not change, only the stored data of which the data nodes corresponding to the Hash value change need to be moved, the migration amount of the stored data is reduced, the redistribution performance of the distributed database is improved, and the stored data are migrated from the migrated data nodes to the migrated data nodes only, so that the consumption of a disk basically keeps the original usage amount, the disk space is saved, and the resource consumption is reduced.
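The observation that only Hash values whose data node has changed need to be moved can be illustrated by comparing the old and new mapping relationships. The following is a minimal sketch, not the implementation of this specification; the names `MigrationTask` and `build_task_table` and the node labels are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class MigrationTask(NamedTuple):
    """One row of a hypothetical task table."""
    hash_value: int
    source_node: str   # data node the stored data is migrated out of
    target_node: str   # data node the stored data is migrated into


def build_task_table(old_mapping: dict[int, str],
                     new_mapping: dict[int, str]) -> list[MigrationTask]:
    """List only the Hash values whose data node differs between the mappings."""
    return [
        MigrationTask(h, old_mapping[h], new_mapping[h])
        for h in sorted(old_mapping)
        if old_mapping[h] != new_mapping[h]
    ]


# Hypothetical example: Hash value 2 moves to a newly added node "dn3".
old = {0: "dn1", 1: "dn2", 2: "dn1", 3: "dn2"}
new = {0: "dn1", 1: "dn2", 2: "dn3", 3: "dn2"}
tasks = build_task_table(old, new)
```

Hash values whose node is unchanged never appear in the table, so their stored data is never touched.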
EXAMPLE five
This specification provides a computer-readable storage medium. The storage medium stores one or more programs, which are executable by one or more processors to implement the steps of the redistribution implementation method shown in fig. 1 to 11, and specifically to perform the following steps:
Step 100: when the number of data nodes changes, acquiring a new mapping relationship between the Hash values of the Hash distribution and the data nodes;
When the distributed database needs to be expanded or reduced, the number of data nodes changes, and the mapping relationship between Hash values and data nodes must be modified into a mapping relationship between the Hash values and the changed set of data nodes. The redistribution method provided in the embodiments of this specification takes this mapping relationship, rather than the size of the data stored on each data node, as the basis for redistribution. Because the data node corresponding to each Hash value is determined explicitly, it can be controlled flexibly: only the stored data corresponding to individual Hash values needs to be migrated, and the stored data for all Hash values on the whole arc corresponding to a data node does not need to be shifted as a whole. Avoiding this wholesale shift reduces the amount of stored data migrated and improves the redistribution performance of the distributed database.
When a Hash table is used for data distribution, a Hash value is calculated for each row of stored data from its distribution key (one or more fields of the Hash table), and the data node corresponding to that Hash value, that is, the data node to which the row belongs, is then determined by a fixed distribution algorithm. The Hash value serves only to identify the data node to which the row belongs; it is not part of the original stored data and can be regarded as auxiliary information of the Hash table. The mapping relationship between Hash values and data nodes may be an existing mapping relationship that is updated into a new one immediately after the number of data nodes changes, or it may be obtained at the moment the number of data nodes changes.
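The two-stage lookup described above (distribution key to Hash value, then Hash value to data node) can be sketched as follows. This is an illustrative sketch under stated assumptions: the specification does not name a hash function or the size of the Hash value space, so the use of CRC32, the constant `NUM_HASH_VALUES`, the round-robin initial assignment, and all function names are hypothetical.

```python
import zlib

NUM_HASH_VALUES = 12  # assumed size of the Hash value space (hypothetical)


def hash_value(distribution_key: tuple) -> int:
    """Map a row's distribution key to a Hash value in [0, NUM_HASH_VALUES)."""
    encoded = "|".join(str(field) for field in distribution_key).encode("utf-8")
    return zlib.crc32(encoded) % NUM_HASH_VALUES


def build_mapping(nodes: list[str]) -> dict[int, str]:
    """Assign Hash values to data nodes round-robin (an assumed initial layout)."""
    return {h: nodes[h % len(nodes)] for h in range(NUM_HASH_VALUES)}


def node_for_row(distribution_key: tuple, mapping: dict[int, str]) -> str:
    """Find the data node a row belongs to via its Hash value."""
    return mapping[hash_value(distribution_key)]


mapping = build_mapping(["dn1", "dn2", "dn3"])
owner = node_for_row(("order", 42), mapping)
```

Because rows reach a node only through the explicit `mapping` table, changing the node of one Hash value affects only the rows that carry that Hash value.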
Step 110: reallocating corresponding data nodes for the Hash value according to the new mapping relation;
The change in the number of data nodes may be an increase or a decrease. In either case, the Hash values are reallocated across the changed set of data nodes. The purpose of reallocating Hash values to new data nodes is to prepare for the subsequent migration of the stored data.
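One way to picture Step 110 is a reallocation that leaves each Hash value on its current data node whenever that node's share allows it, and reassigns only the surplus. The sketch below is an assumption made for illustration, not the algorithm of this specification; the function name `rebalance`, the quota rule, and the node labels are hypothetical.

```python
def rebalance(old_mapping: dict[int, str], new_nodes: list[str]) -> dict[int, str]:
    """Build a new Hash value -> data node mapping for the changed node set."""
    total = len(old_mapping)
    base, extra = divmod(total, len(new_nodes))
    # Each node's quota of Hash values, as equal as possible.
    quota = {n: base + (1 if i < extra else 0) for i, n in enumerate(new_nodes)}

    new_mapping: dict[int, str] = {}
    kept = {n: 0 for n in new_nodes}
    to_move: list[int] = []
    for h in sorted(old_mapping):
        node = old_mapping[h]
        if node in quota and kept[node] < quota[node]:
            new_mapping[h] = node      # Hash value stays where it is
            kept[node] += 1
        else:
            to_move.append(h)          # node removed, or node over its quota

    # Give the surplus Hash values to nodes that are still below quota.
    for node in new_nodes:
        while kept[node] < quota[node]:
            new_mapping[to_move.pop()] = node
            kept[node] += 1
    return new_mapping


old = {h: ["dn1", "dn2", "dn3"][h % 3] for h in range(12)}
expanded = rebalance(old, ["dn1", "dn2", "dn3", "dn4"])   # scale out
reduced = rebalance(old, ["dn1", "dn2"])                  # scale in
```

In the scale-out case only the Hash values handed to the new node change owner; in the scale-in case only those of the removed node do.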
The redistribution method provided in the embodiments of this specification relies on two mapping relationships: a first mapping relationship, between Hash values and data nodes, and a second mapping relationship, between Hash values and stored data. Based on these, the stored data corresponding to each Hash value is migrated to the data node that corresponds to that Hash value.
Because the corresponding data nodes are reallocated to the Hash values according to the new mapping relationship, the amount of stored data migrated can be limited to the change in the stored data of the data nodes whose assignment changed. That change is kept as small as possible, which reduces the amount of stored data migrated and improves the redistribution performance of the distributed database.
Because the method is based on the mapping relationship between Hash values and data nodes of the distributed database, the stored data corresponding to a single Hash value can be migrated on its own, which reduces the amount of stored data migrated. Each migration can move the stored data corresponding to one Hash value or to several Hash values, which shortens each individual migration and improves the redistribution performance of the distributed database. Moreover, a multi-batch migration can be suspended and resumed at any time without affecting the normal operation of the distributed database, so redistribution can be carried out during the database's idle time, which helps reduce the load on the distributed database. In addition, because the data can be divided into multiple batches for migration, disk usage stays close to its original level and no additional disk space is needed, which saves disk space and reduces resource consumption.
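The batch-wise migration described above, which can be suspended and resumed, can be pictured with a small in-memory simulation. This is only a sketch under stated assumptions: real data nodes are modeled as nested dictionaries, each task is a hypothetical `(hash_value, source_node, target_node)` tuple, and the function name `migrate_batch` is invented for illustration.

```python
def migrate_batch(pending: list[tuple[int, str, str]],
                  storage: dict[str, dict[int, list[str]]],
                  batch_size: int = 1) -> list[tuple[int, str, str]]:
    """Migrate up to batch_size Hash values and return the tasks still pending."""
    batch, remaining = pending[:batch_size], pending[batch_size:]
    for hash_value, source, target in batch:
        rows = storage[source].get(hash_value, [])
        # Write the rows to the target node first ...
        storage.setdefault(target, {}).setdefault(hash_value, []).extend(rows)
        # ... then delete them from the source node.
        storage[source].pop(hash_value, None)
    return remaining


# Hypothetical cluster state: node -> Hash value -> rows.
storage = {
    "dn1": {0: ["a"], 2: ["b", "c"]},
    "dn2": {1: ["d"]},
    "dn3": {},
}
pending = [(2, "dn1", "dn3"), (1, "dn2", "dn3")]

after_first = migrate_batch(pending, storage, batch_size=1)       # first batch
snapshot = {node: dict(buckets) for node, buckets in storage.items()}
after_second = migrate_batch(after_first, storage, batch_size=1)  # resumed later
```

Between the two calls the list `after_first` is the only state needed to resume, and at any moment only one batch of rows exists on both source and target, which is why extra disk usage stays small in this model.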
As the above analysis shows, in the redistribution method provided in the embodiments of this specification, when the number of data nodes changes and the stored data needs to be redistributed, the mapping relationship between Hash values and data nodes is modified to obtain a new mapping relationship, in which the data nodes are the changed set of data nodes, and corresponding data nodes are then reallocated to the Hash values according to the new mapping relationship. After the reallocation, the stored data corresponding to each Hash value is migrated to its corresponding data node. Because the data nodes corresponding to some Hash values do not change, only the stored data of Hash values whose data node has changed needs to be moved. This reduces the amount of stored data migrated and improves the redistribution performance of the distributed database. Since stored data is moved only from the migrated-out (source) data nodes to the migrated-in (target) data nodes, disk usage stays close to its original level, which saves disk space and reduces resource consumption.
In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present specification shall be included in the protection scope of the present specification.
The system, apparatus, module or unit illustrated in one or more of the above embodiments may be implemented by a computer chip or an entity, or by an article of manufacture with a certain functionality. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable storage media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Claims (13)
1. A method for implementing redistribution of a distributed database, the method comprising the steps of:
when the number of data nodes changes, acquiring a new mapping relationship between the Hash values of the Hash distribution and the data nodes;
and reallocating corresponding data nodes for the Hash value according to the new mapping relation.
2. The method as claimed in claim 1, wherein before obtaining the new mapping relationship between the Hash value and the data node when the number of data nodes changes, the method further comprises:
establishing a mapping relation between the Hash value and the data node based on the Hash table of the Hash distribution;
and distributing the Hash values to corresponding data nodes according to the mapping relation.
3. The method according to claim 1 or 2, wherein before the establishing the mapping relationship between the Hash value and the data node based on the Hash table, the method further comprises:
causing the number of Hash values corresponding to each data node to be the same.
4. The method as claimed in claim 3, wherein before establishing the mapping relationship between the Hash value and the data node based on the Hash table, the method further comprises:
keeping the stored data evenly distributed across all data nodes.
5. The implementation method of claim 3, wherein when the number of data nodes changes, obtaining a new mapping relationship between the Hash value and the data nodes comprises:
setting, in the new mapping relationship, the same number of Hash values for each data node.
6. The implementation method of claim 5, wherein the setting of the same number of Hash values corresponding to each data node in the new mapping relationship specifically comprises:
when the number of data nodes changes, determining a migrated-out data node from which Hash values need to be migrated and a migrated-in data node to which Hash values need to be migrated, wherein the number of Hash values migrated out is the same as the number of Hash values migrated in;
and reallocating the Hash values to be migrated from the migrated-out data node to the migrated-in data node.
7. The method as claimed in claim 1, wherein after reallocating the corresponding data nodes to the Hash values according to the new mapping relationship, the method further comprises:
generating a corresponding task table, wherein the task table at least comprises a Hash value, a migrated-out data node and a migrated-in data node;
migrating the stored data corresponding to the Hash value from the migrated-out data node to the migrated-in data node according to the task table;
and deleting the stored data corresponding to the Hash value on the migrated-out data node.
8. The implementation method as claimed in claim 7, wherein before migrating the stored data corresponding to the Hash value from the migrated-out data node to the migrated-in data node according to the task table, the method further comprises:
implicitly inserting the hash value into the hash table;
and acquiring the storage data corresponding to the hash value based on the hash value in the hash table.
9. An apparatus for implementing redistribution of a distributed database, the apparatus comprising:
the obtaining module is used for obtaining, when the number of data nodes changes, a new mapping relationship between the Hash values of the Hash distribution and the data nodes;
and the distribution module is used for redistributing the corresponding data nodes to the Hash value according to the new mapping relation.
10. The apparatus of claim 9, the apparatus further comprising:
the initialization module is used for establishing a mapping relation between a Hash value and a data node based on a Hash table; and
and the distribution module is used for distributing the Hash value to the corresponding data node according to the mapping relation.
11. A database comprising an implementation device according to claim 9 or 10.
12. A server comprising a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for implementing a connection communication between the processor and the memory, the program, when executed by the processor, implementing the steps of the method for implementing a distributed database redistribution as claimed in any one of claims 1 to 8.
13. A storage medium for computer readable storage, the storage medium storing one or more programs which, when executed by one or more processors, perform the steps of a method for implementing a distributed database redistribution as claimed in any of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010547494.0A CN113806355A (en) | 2020-06-16 | 2020-06-16 | Method, database, server and medium for realizing redistribution of distributed database |
PCT/CN2021/093630 WO2021254047A1 (en) | 2020-06-16 | 2021-05-13 | Method for realizing redistribution of distributed database, database, server and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010547494.0A CN113806355A (en) | 2020-06-16 | 2020-06-16 | Method, database, server and medium for realizing redistribution of distributed database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113806355A true CN113806355A (en) | 2021-12-17 |
Family
ID=78944265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010547494.0A Pending CN113806355A (en) | 2020-06-16 | 2020-06-16 | Method, database, server and medium for realizing redistribution of distributed database |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113806355A (en) |
WO (1) | WO2021254047A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100146122A1 (en) * | 2007-12-26 | 2010-06-10 | Symantec Corporation | Balanced Consistent Hashing for Distributed Resource Management |
CN104572809A (en) * | 2014-11-17 | 2015-04-29 | 杭州斯凯网络科技有限公司 | Distributive relational database free expansion method |
US20160210340A1 (en) * | 2015-01-21 | 2016-07-21 | Futurewei Technologies, Inc. | System and Method for Massively Parallel Processor Database |
CN108932256A (en) * | 2017-05-25 | 2018-12-04 | 中兴通讯股份有限公司 | Distributed data redistribution control method, device and data management server |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9417903B2 (en) * | 2013-06-21 | 2016-08-16 | International Business Machines Corporation | Storage management for a cluster of integrated computing systems comprising integrated resource infrastructure using storage resource agents and synchronized inter-system storage priority map |
CN106034144B (en) * | 2015-03-12 | 2019-10-15 | 中国人民解放军国防科学技术大学 | A kind of fictitious assets date storage method based on load balancing |
CN106407308A (en) * | 2016-08-31 | 2017-02-15 | 天津南大通用数据技术股份有限公司 | Method and device for expanding capacity of distributed database |
CN106250566A (en) * | 2016-08-31 | 2016-12-21 | 天津南大通用数据技术股份有限公司 | A kind of distributed data base and the management method of data operation thereof |
- 2020-06-16: CN application CN202010547494.0A, patent/CN113806355A/en, active, Pending
- 2021-05-13: WO application PCT/CN2021/093630, patent/WO2021254047A1/en, active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2021254047A1 (en) | 2021-12-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107622091B (en) | Database query method and device | |
US9576019B2 (en) | Increasing distributed database capacity | |
US10356150B1 (en) | Automated repartitioning of streaming data | |
CN108959510B (en) | Partition level connection method and device for distributed database | |
CN111813805A (en) | Data processing method and device | |
CN109343793B (en) | Data migration method and device | |
CN105354315A (en) | Region division method in distributed database, Region node and system | |
CN111159140A (en) | Data processing method and device, electronic equipment and storage medium | |
CN116167092B (en) | Secret state data query method and device, storage medium and electronic equipment | |
CN109788013B (en) | Method, device and equipment for distributing operation resources in distributed system | |
CN115599764A (en) | Method, device and medium for migrating table data | |
CN114442952A (en) | Cold data migration method and device, storage medium and electronic device | |
CN114253456A (en) | Cache load balancing method and device | |
CN107102898B (en) | Memory management and data structure construction method and device based on NUMA (non Uniform memory Access) architecture | |
CN113806355A (en) | Method, database, server and medium for realizing redistribution of distributed database | |
US10901972B2 (en) | Table partition configuration method, apparatus and system for database system | |
EP4425892A1 (en) | Resource operating method and apparatus, electronic device, and storage medium | |
CN108536759B (en) | Sample playback data access method and device | |
CN114676132A (en) | Data table association method and device, storage medium and electronic equipment | |
CN116737370A (en) | Multi-resource scheduling method, system, storage medium and terminal | |
CN109582938B (en) | Report generation method and device | |
CN110442605B (en) | Cache management method and device of server | |
CN115328608A (en) | Kubernetes container vertical expansion adjusting method and device | |
CN115827745A (en) | Memory database cluster and implementation method and device thereof | |
CN113126884A (en) | Data migration method and device, electronic equipment and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
TA01 | Transfer of patent application right | Effective date of registration: 20220113; Address after: 100176 602, floor 6, building 6, courtyard 10, KEGU 1st Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing (Yizhuang group, high-end industrial area of Beijing Pilot Free Trade Zone); Applicant after: Jinzhuan Xinke Co.,Ltd.; Address before: 518000 Zhongnan communication tower, South China Road, Nanshan District high tech Industrial Park, Shenzhen, Guangdong; Applicant before: ZTE Corp. |
SE01 | Entry into force of request for substantive examination | |