CN118277456B

CN118277456B - Initiating node output method in MPP distributed system

Info

Publication number: CN118277456B
Application number: CN202410704629.8A
Authority: CN
Inventors: 魏飞; 李春华
Original assignee: Tianjin Nankai University General Data Technologies Co ltd
Current assignee: Tianjin Nankai University General Data Technologies Co ltd
Priority date: 2024-06-03
Filing date: 2024-06-03
Publication date: 2024-09-20
Anticipated expiration: 2044-06-03
Also published as: CN118277456A

Abstract

The invention provides an originating node output method in an MPP distributed system, which relates to the technical field of databases, and comprises the following steps: collecting the number of nodes and the node IP in the database cluster, and sequencing according to the fragmentation information to obtain a first mapping relation of the node fragmentation and the node IP; collecting server information, creating an IP distance matrix of a computing node and a management node, and obtaining the distance matrix; distributing a plurality of hash buckets for each computing node, recording the corresponding relation between the computing node and the hash buckets, and acquiring a second mapping relation between the node fragments and the computing nodes according to nodedatamap table information and the corresponding relation; extracting hash keys in the filtering conditions of the query statement, and obtaining node IP of the computing node according to the hash key query; the distance matrix queries minimum values of the calculation node and the management node according to the node IP query result, and the corresponding node is output as the initiating node.

Description

A method for outputting initiating nodes in an MPP distributed system

技术领域Technical Field

本发明涉及数据库技术领域，尤其涉及一种MPP分布式系统中的发起节点输出方法。The present invention relates to the field of database technology, and in particular to an initiating node output method in an MPP distributed system.

背景技术Background Art

在一个MPP分布式系统中，通常包含管理节点和计算节点，管理节点是任务发起节点，负责任务的计划和调度，并接收从计算节点返回的数据；计算节点是数据所在节点，负责任务的执行，并将需要返回的数据返回给发起节点；所以发起节点相当于总调度师，发挥重要的作用。In an MPP distributed system, it usually includes management nodes and computing nodes. The management node is the task initiating node, which is responsible for planning and scheduling tasks and receiving data returned from the computing nodes. The computing nodes are the nodes where the data is located, which are responsible for executing tasks and returning the data that needs to be returned to the initiating node. Therefore, the initiating node is equivalent to the chief dispatcher and plays an important role.

当在网络负载不重，并发任务数不高时，选择哪个管理节点作为任务的发起节点，似乎不太关键，而对于业务负载量大，高并发场景下，网络带宽被挤压，跨网络的通讯网络延迟高，网络就很容易成为性能瓶颈，从而造成任务积压，不能及时完成场景业务，这时候如何正确的选择任务的发起节点就显得至关重要。When the network load is not heavy and the number of concurrent tasks is not high, it does not seem to be critical to choose which management node to use as the initiating node for the task. However, in scenarios with heavy business load and high concurrency, the network bandwidth is squeezed and the cross-network communication network latency is high. The network can easily become a performance bottleneck, resulting in a backlog of tasks and the inability to complete the scenario business in a timely manner. At this time, it is crucial to correctly select the initiating node for the task.

以往DBA在选择任务发起节点时，通常会采用随机选择或顺序选择的方式，但是这两种方法都没有考虑网络对于性能的影响，使得sql不能运行在最佳的发起节点上，可能会导致由于网络瓶颈带来的性能的异常，影响用户使用。In the past, DBAs usually used random selection or sequential selection to select task initiation nodes. However, neither of these two methods considered the impact of the network on performance, which prevented SQL from running on the best initiation node. This may cause performance anomalies due to network bottlenecks, affecting user usage.

发明内容Summary of the invention

本发明旨在至少解决相关技术中存在的技术问题之一。为此，本发明提供一种MPP分布式系统中的发起节点输出方法。The present invention aims to solve at least one of the technical problems existing in the related art. To this end, the present invention provides an initiating node output method in an MPP distributed system.

本发明提供一种MPP分布式系统中的发起节点输出方法，包括：The present invention provides an initiating node output method in an MPP distributed system, comprising:

S1：采集数据库集群中的节点数量及节点IP，根据分片信息进行排序获得节点分片及节点IP的第一映射关系；S1: Collect the number of nodes and node IPs in the database cluster, sort them according to the sharding information to obtain the first mapping relationship between the node shards and the node IPs;

S2：采集服务器信息，根据所述服务器信息创建计算节点与管理节点的IP距离矩阵，获得距离矩阵；S2: Collect server information, create an IP distance matrix between computing nodes and management nodes according to the server information, and obtain a distance matrix;

S3：为每个计算节点分配多个哈希桶，通过管理节点的nodedatamap表记录计算节点与哈希桶的对应关系，并根据nodedatamap表信息及所述对应关系获取节点分片与计算节点的第二映射关系；S3: Allocate multiple hash buckets to each computing node, record the correspondence between the computing nodes and the hash buckets through the nodedatamap table of the management node, and obtain the second mapping relationship between the node shards and the computing nodes according to the nodedatamap table information and the correspondence;

S4：提取查询语句的过滤条件中的哈希键，并由所述第二映射关系及所述第一映射关系，根据所述哈希键查询获得所述查询语句对应的计算节点的节点IP，获得节点IP查询结果；S4: extracting a hash key in the filter condition of the query statement, and obtaining the node IP of the computing node corresponding to the query statement according to the hash key query based on the second mapping relationship and the first mapping relationship, and obtaining a node IP query result;

S5：由所述距离矩阵，根据所述节点IP查询结果查询获得计算节点和管理节点极小值，将极小值对应的节点作为发起节点输出。S5: Obtain the minimum values of computing nodes and management nodes from the distance matrix according to the node IP query result, and output the node corresponding to the minimum value as the initiating node.

根据本发明提供的一种MPP分布式系统中的发起节点输出方法，步骤S1中的所述第一映射关系为一维数组。According to an initiating node output method in an MPP distributed system provided by the present invention, the first mapping relationship in step S1 is a one-dimensional array.

根据本发明提供的一种MPP分布式系统中的发起节点输出方法，步骤S2中的所述距离矩阵为二维矩阵。According to an initiating node output method in an MPP distributed system provided by the present invention, the distance matrix in step S2 is a two-dimensional matrix.

根据本发明提供的一种MPP分布式系统中的发起节点输出方法，所述距离矩阵中，当计算节点的IP与管理节点的IP相同时，计算节点与管理节点的距离为0，当计算节点的IP与管理节点的IP不同时，计算节点与管理节点的距离为1。According to an initiating node output method in an MPP distributed system provided by the present invention, in the distance matrix, when the IP of the computing node is the same as the IP of the management node, the distance between the computing node and the management node is 0; when the IP of the computing node is different from the IP of the management node, the distance between the computing node and the management node is 1.

根据本发明提供的一种MPP分布式系统中的发起节点输出方法，步骤S3中，为每个计算节点分配多个哈希桶时，分配方式为根据哈希桶总数均匀分配。According to an initiating node output method in an MPP distributed system provided by the present invention, in step S3, when multiple hash buckets are allocated to each computing node, the allocation method is uniform allocation according to the total number of hash buckets.

根据本发明提供的一种MPP分布式系统中的发起节点输出方法，所述哈希桶总数为65536个。According to an initiating node output method in an MPP distributed system provided by the present invention, the total number of hash buckets is 65536.

根据本发明提供的一种MPP分布式系统中的发起节点输出方法，步骤S4进一步包括：According to an initiating node output method in an MPP distributed system provided by the present invention, step S4 further comprises:

S41：提取查询语句的过滤条件中分布列的哈希值；S41: extracting the hash value of the distribution column in the filter condition of the query statement;

S42：根据哈希值计算获得哈希键；S42: Obtain a hash key according to the hash value calculation;

S43：由所述第二映射关系，根据所述哈希键获取查询语句对应的数据所在分片；S43: Obtaining the shard where the data corresponding to the query statement is located according to the hash key based on the second mapping relationship;

S44：由所述第一映射关系根据所述数据所在分片获取计算节点对应的节点IP，获得节点IP查询结果。S44: Obtain the node IP corresponding to the computing node according to the shard where the data is located by using the first mapping relationship to obtain a node IP query result.

根据本发明提供的一种MPP分布式系统中的发起节点输出方法，步骤S4中，所述过滤条件为含有where条件且存在等式分布列的精确查询语句。According to an initiating node output method in an MPP distributed system provided by the present invention, in step S4, the filtering condition is a precise query statement containing a where condition and having an equality distribution column.

本发明提供的一种MPP分布式系统中的发起节点输出方法，用于解决当并发压力大，网络负载重时，由于不正确的选择造成网络负载加剧问题，该方法能够根据MPP特点，自动选择最优发起节点，保证所选择的发起节点与计算节点为同一台机器，节省网络开销，避免了由于不正确的选择发起节点，而造成的性能异常，使用户业务更快速完成，提升用户体验及使用效率。The present invention provides an initiating node output method in an MPP distributed system, which is used to solve the problem of aggravated network load due to incorrect selection when the concurrency pressure is large and the network load is heavy. The method can automatically select the optimal initiating node according to the characteristics of MPP, ensure that the selected initiating node and the computing node are the same machine, save network overhead, avoid performance anomalies caused by incorrect selection of initiating nodes, enable user services to be completed more quickly, and improve user experience and usage efficiency.

本发明的附加方面和优点将在下面的描述中部分给出，部分将从下面的描述中变得明显，或通过本发明的实践了解到。Additional aspects and advantages of the present invention will be given in part in the following description and in part will be obvious from the following description, or will be learned through practice of the present invention.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本发明或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the present invention or the prior art, the drawings required for use in the embodiments or the description of the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For ordinary technicians in this field, other drawings can be obtained based on these drawings without paying creative work.

图1是本发明提供的一种MPP分布式系统中的发起节点输出方法流程图。FIG1 is a flow chart of an initiating node output method in an MPP distributed system provided by the present invention.

具体实施方式DETAILED DESCRIPTION

为使本发明的目的、技术方案和优点更加清楚，下面将结合本发明中的附图，对本发明中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。以下实施例用于说明本发明，但不能用来限制本发明的范围。In order to make the purpose, technical scheme and advantages of the present invention clearer, the technical scheme of the present invention will be clearly and completely described below in conjunction with the drawings in the present invention. Obviously, the described embodiments are part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments in the present invention, all other embodiments obtained by ordinary technicians in the field without creative work are within the scope of protection of the present invention. The following embodiments are used to illustrate the present invention, but cannot be used to limit the scope of the present invention.

在本说明书的描述中，参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本发明实施例的至少一个实施例或示例中。在本说明书中，对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且，描述的具体特征、结构、材料或者特点可以在任一个或多个实施例或示例中以合适的方式结合。此外，在不相互矛盾的情况下，本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。In the description of this specification, the description with reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" etc. means that the specific features, structures, materials or characteristics described in conjunction with the embodiment or example are included in at least one embodiment or example of the embodiments of the present invention. In this specification, the schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any one or more embodiments or examples in a suitable manner. In addition, those skilled in the art may combine and combine the different embodiments or examples described in this specification and the features of the different embodiments or examples, without contradiction.

下面结合图1描述本发明的实施例。An embodiment of the present invention is described below with reference to FIG. 1 .

其中，步骤S1中的所述第一映射关系为一维数组。Wherein, the first mapping relationship in step S1 is a one-dimensional array.

本阶段中，首先获取分片和ip的映射关系，具体来讲获取数据库集群中的节点数量及ip信息，并按分片信息进行排序，定义N列一维数组A，其中的N为计算节点个数，表达式为A[0]=ip1，A[1]=ip2…A[N-1]=ipN。In this stage, we first obtain the mapping relationship between shards and IP addresses. Specifically, we obtain the number of nodes and IP addresses in the database cluster and sort them by shard information. We define an N-column one-dimensional array A, where N is the number of computing nodes. The expression is A[0]=ip1, A[1]=ip2…A[N-1]=ipN.

其中，步骤S2中的所述距离矩阵为二维矩阵。Wherein, the distance matrix in step S2 is a two-dimensional matrix.

其中，所述距离矩阵中，当计算节点的IP与管理节点的IP相同时，计算节点与管理节点的距离为0，当计算节点的IP与管理节点的IP不同时，计算节点与管理节点的距离为1。In the distance matrix, when the IP of the computing node is the same as the IP of the management node, the distance between the computing node and the management node is 0; when the IP of the computing node is different from the IP of the management node, the distance between the computing node and the management node is 1.

本阶段的目的为获得计算节点和发起节点的距离矩阵，具体来讲首先根据服务器管理员提供的服务器位置信息或者ip信息，定义N*N的二维矩阵M，然后创建计算节点和管理节点ip距离矩阵，ip相同时，距离为0，ip不同时，距离为1。The purpose of this stage is to obtain the distance matrix between the computing node and the initiating node. Specifically, first define an N*N two-dimensional matrix M based on the server location information or IP information provided by the server administrator, and then create an IP distance matrix between the computing node and the management node. If the IPs are the same, the distance is 0, and if the IPs are different, the distance is 1.

进一步的，本阶段目的为获得分片与计算节点的关系， MPP集群N个计算节点，id范围为n1-nN，每个节点分配65536/N个hash桶，hash桶id为h0-h65535。Furthermore, the purpose of this stage is to obtain the relationship between shards and computing nodes. The MPP cluster has N computing nodes, with ids ranging from n1 to nN. Each node is allocated 65536/N hash buckets, and the hash bucket ids are h0-h65535.

而MPP发起节点中的nodedatamap（系统表，用于记录节点分片与数据的映射），记录分片和hash桶的关系，每个node上对应多个hash桶，之后我们通过nodedatamap中信息即能获得分片与计算节点ip的关系。The nodedatamap (system table used to record the mapping between node shards and data) in the MPP initiating node records the relationship between shards and hash buckets. Each node corresponds to multiple hash buckets. Then we can obtain the relationship between the shards and the computing node IP through the information in the nodedatamap.

其中，步骤S3中，为每个计算节点分配多个哈希桶时，分配方式为根据哈希桶总数均匀分配。In step S3, when multiple hash buckets are allocated to each computing node, the allocation method is to evenly allocate according to the total number of hash buckets.

其中，所述哈希桶总数为65536个。Among them, the total number of hash buckets is 65536.

在一些实施例中，GBase数据库对应的特征为65536个哈希桶。In some embodiments, the GBase database corresponds to 65536 hash buckets.

其中，步骤S4进一步包括：Wherein, step S4 further comprises:

其中，步骤S4中，所述过滤条件为含有where条件且存在等式分布列的精确查询语句。Wherein, in step S4, the filtering condition is a precise query statement containing a where condition and having an equal distribution column.

本阶段中，首先提取sql中过滤条件中的hash值v，确定该sql计算节点的ip，具体的在获取到sql后，找到精确查询过滤条件中的hash分布列的值v，标志是含有where条件且存在分布列’=’，找到值v之后进行crc32（v）%65536计算出hashkey，其中的crc为基于crc检验原理的加解密算法，之后从步骤S3中的nodedatamap信息中获取分片和hashkey的关系得到数据所在分片，根据步骤S1中得到的分片和ip映射关系，得到计算节点ip。In this stage, first extract the hash value v in the filter condition in SQL and determine the IP of the SQL computing node. Specifically, after obtaining SQL, find the value v of the hash distribution column in the precise query filter condition. The sign is that it contains the where condition and the distribution column '=' exists. After finding the value v, perform crc32(v)%65536 to calculate the hashkey, where crc is an encryption and decryption algorithm based on the crc verification principle. Then, obtain the relationship between the shard and the hashkey from the nodedatamap information in step S3 to obtain the shard where the data is located. According to the shard and IP mapping relationship obtained in step S1, the computing node IP is obtained.

本阶段目的为获得最优发起节点ip，根据步骤S4得到的计算节点ip，查找步骤S2中得到的计算节点和发起节点距离矩阵的最小值，min(M[k,i]),k为计算节点ip的分片值，i为0......N-1,将该M节点作为计算节点输出。The purpose of this stage is to obtain the optimal initiating node IP. According to the computing node IP obtained in step S4, the minimum value of the distance matrix between the computing node and the initiating node obtained in step S2 is found, min(M[k,i]), k is the shard value of the computing node IP, i is 0...N-1, and the M node is output as the computing node.

最后应说明的是：以上实施例仅用以说明本发明的技术方案，而非对其限制；尽管参照前述实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, rather than to limit it. Although the present invention has been described in detail with reference to the aforementioned embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the aforementioned embodiments, or make equivalent replacements for some of the technical features therein. However, these modifications or replacements do not deviate the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An originating node output method in an MPP distributed system, comprising:

s1: collecting the number of nodes and the node IP in the database cluster, and sequencing according to the fragmentation information to obtain a first mapping relation of the node fragmentation and the node IP;

s2: collecting server information, creating an IP distance matrix of a computing node and a management node according to the server information, and obtaining a distance matrix;

s3: a plurality of hash buckets are distributed for each computing node, the corresponding relation between the computing node and the hash buckets is recorded through a nodedatamap table of a management node, and a second mapping relation between the node fragments and the computing node is obtained according to nodedatamap table information and the corresponding relation, specifically comprising:

S31: assigning a plurality of hash buckets to each computing node;

s32: recording the corresponding relation between the computing node and the hash bucket through nodedatamap tables of the management nodes;

S33: recording the corresponding relation between the node fragments and the hash bucket through a nodedatamap table of the MPP initiating node;

S34: obtaining a hash bucket corresponding to a computing node through a nodedatamap table of a management node, obtaining node fragments corresponding to the hash bucket through a nodedatamap table of an MPP initiating node, and obtaining a second mapping relation between the node fragments and the computing node according to the hash bucket corresponding to the computing node and the node fragments corresponding to the hash bucket;

S4: extracting a hash key in a filtering condition of a query statement, and obtaining a node IP of a computing node corresponding to the query statement according to the hash key query by using the second mapping relation and the first mapping relation to obtain a node IP query result, wherein the method specifically comprises the following steps of:

S41: extracting hash values of the distributed columns in the filtering conditions of the query statement;

s42: calculating according to the hash value to obtain a hash key;

s43: obtaining node fragments corresponding to hash keys through a nodedatamap table of an MPP initiating node according to the second mapping relation, and obtaining the fragments corresponding to the query statement data according to the hash keys;

S44: acquiring node IP corresponding to a computing node according to the fragments of the data by the first mapping relation, and acquiring a node IP query result;

s5: and inquiring and obtaining minimum values of the calculation node and the management node according to the node IP inquiry result by the distance matrix, and outputting the node corresponding to the minimum value as an initiating node.

2. The method as claimed in claim 1, wherein the first mapping relation in step S1 is a one-dimensional array.

3. The method according to claim 1, wherein the distance matrix in step S2 is a two-dimensional matrix.

4. The method as set forth in claim 3, wherein in the distance matrix, when the IP of the computing node is the same as the IP of the management node, the distance between the computing node and the management node is 0, and when the IP of the computing node is different from the IP of the management node, the distance between the computing node and the management node is 1.

5. The method as set forth in claim 1, wherein in step S3, when a plurality of hash buckets are allocated to each computing node, the allocation is performed uniformly according to the total number of hash buckets.

6. An originating node output method in an MPP distributed system as defined in claim 5, wherein the hash buckets total 65536.

7. The method as set forth in claim 1, wherein in step S4, the filtering condition is an exact query statement containing a where condition and having an equation distribution column.