CN104899250A

CN104899250A - Graph calculation scaling method based on separation of graph structure information and data information

Info

Publication number: CN104899250A
Application number: CN201510229191.3A
Authority: CN
Inventors: 陈海波; 陈榕; 臧斌宇; 管海兵
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2015-05-07
Filing date: 2015-05-07
Publication date: 2015-09-09
Anticipated expiration: 2035-05-07
Also published as: CN104899250B

Abstract

The present invention provides a graph computing scaling method based on the separation of graph structure and data information, which is characterized in that it includes: Step 1: Separating and migrating graph structure information and data information; Step 2: Optimizing locality sensitivity and load balancing distribute. The present invention adopts a method for separating and migrating graph structure and data information, and an optimal allocation mechanism of locality sensitivity and load balancing to realize the expansion and contraction of the graph computing system. Compared with the elastic scaling support at the existing system level (such as virtual machine migration and other technologies), the present invention can reduce the loss of upper-layer application service performance, and shorten the overall migration time and service interruption time in the elastic migration process of the computing system. Improve the resource utilization of cloud computing clusters and the availability of upper-layer application services, and promote the deployment of key computing services with high-throughput requirements in cloud computing data centers, which can bring considerable social and economic benefits.

Description

Graph Computing Scaling Method Based on Separation of Graph Structure and Data Information

技术领域technical field

本发明涉及分布式计算技术领域。具体的说，涉及基于图结构与数据信息分离的图计算伸缩方法The invention relates to the technical field of distributed computing. Specifically, it involves a graph computing scaling method based on the separation of graph structure and data information

背景技术Background technique

弹性计算是云平台中重要的特质，直接关系到云平台上应用的执行效率。现有的解决方案以系统级的弹性计算为主，应用层的支持主要有Map Reduce模型的系统，如Hadoop。计算应用迁移的性能主要取决于迁移速度、服务中断时间以及恢复计算以后系统的性能。当前主流的弹性伸缩技术中没有考虑图计算应用伸缩的需求，导致图计算的伸缩对系统性能造成的损失较大。Elastic computing is an important characteristic of the cloud platform, which is directly related to the execution efficiency of applications on the cloud platform. Existing solutions are mainly based on system-level elastic computing, and the support of the application layer mainly includes systems of the Map Reduce model, such as Hadoop. The performance of computing application migration mainly depends on the migration speed, service interruption time, and system performance after computing is resumed. The current mainstream elastic scaling technology does not consider the scaling requirements of graph computing applications, resulting in a large loss of system performance caused by graph computing scaling.

因此，如何针对图计算应用伸缩的特点，合理设计迁移策略和方案，减少伸缩过程中的网络传输，提高传输后图计算的本地性，实已成为本领域技术人员亟待解决的技术难题。Therefore, how to properly design migration strategies and solutions according to the scaling characteristics of graph computing applications, reduce network transmission during the scaling process, and improve the locality of graph computing after transmission has become a technical problem to be solved urgently by those skilled in the art.

发明内容Contents of the invention

针对现有技术中的缺陷，本发明的目的在于，利用对上层计算应用的了解，通过图结构与数据信息分离迁移，以及本地性敏感和负载均衡的优化分配机制，缩短迁移时间和中断时间，提升完成伸缩后系统的性能。In view of the defects in the prior art, the purpose of the present invention is to shorten the migration time and interruption time by utilizing the understanding of upper-layer computing applications, separating and migrating the graph structure and data information, and optimizing the allocation mechanism of locality sensitivity and load balancing. Improve the performance of the system after scaling.

根据本发明提供的一种基于图结构与数据信息分离的图计算伸缩方法包括如下步骤：A graph computing scaling method based on the separation of graph structure and data information provided by the present invention includes the following steps:

步骤1：对图结构信息与数据信息进行分离迁移；Step 1: Separate and migrate graph structure information and data information;

步骤2：本地性敏感和负载均衡的优化分配。Step 2: Optimized distribution for locality sensitivity and load balancing.

优选地，所述步骤1，包括如下步骤：Preferably, said step 1 includes the following steps:

将所述图结构信息与计算过程中的所述数据信息分离；separating the graph structure information from the data information in the calculation process;

由目标机器通过分布式文件系统直接访问计算过程中不变的所述图结构信息，而由源机器通过网络传输计算过程中产生的所述数据信息。The target machine directly accesses the invariant graph structure information in the calculation process through the distributed file system, and the source machine transmits the data information generated in the calculation process through the network.

载入图；在载入图的过程中，划分并构建子图，记录图的划分方式，完成子图构建后将划分信息存储至分布式文件系统；Load the graph; in the process of loading the graph, divide and construct the subgraph, record the division method of the graph, and store the partition information in the distributed file system after the subgraph is constructed;

在接收到迁移指令时，作为目标机器的迁入服务器将属于迁出服务器的子图结构直接通过分布式文件系统载入，并与迁入服务器上原有的子图合并；作为源机器的迁出服务器向迁入服务器发送关于迁出子图的计算状态；When receiving the migration instruction, the migration-in server as the target machine directly loads the subgraph structure belonging to the migration-out server through the distributed file system, and merges it with the original subgraph on the migration-in server; the migration-out server as the source machine The server sends the computing status of the outgoing subgraph to the incoming server;

迁入服务器接收计算状态，向迁出服务器发送回执信息，迁出服务器关闭应用进程；迁入服务器载入计算状态，恢复计算。The migrating-in server receives the computing state, sends a receipt message to the migrating-out server, and the migrating-out server closes the application process; the migrating-in server loads the computing state and resumes computing.

优选地，所述步骤2包括如下步骤：Preferably, said step 2 includes the following steps:

在载入图的过程中，根据所述计算数据的本地性和系统的负载均衡对服务器进行划分，并对服务器的划分结果提取划分特征；In the process of loading the graph, the server is divided according to the locality of the calculation data and the load balance of the system, and the division feature is extracted from the division result of the server;

在接收到迁移指令时，根据所述划分特征计算迁出子图结构迁移到不同迁入服务器的本地性得分，在选出的本地性得分靠前的多个服务器中根据服务器的负载情况选择负载最轻的服务器作为迁入服务器，执行迁移；When the migration instruction is received, calculate the locality score of the migration of the outgoing subgraph structure to different incoming servers according to the division characteristics, and select the load according to the load of the server among the selected servers with the highest locality scores The lightest server is used as the migration server to perform the migration;

在迁入服务器完成图的重建之后，重新计算服务器上的划分特征。After the migrating server completes the reconstruction of the graph, the partitioning features on the server are recalculated.

与现有技术相比，本发明具有如下的有益效果：Compared with the prior art, the present invention has the following beneficial effects:

1、本发明主要提出了一种利用优化的伸缩策略和算法实现图计算的弹性伸缩方法。该方法通过利用图结构与数据信息分离迁移以及本地性敏感和负载均衡的优化分配机制，缩短整体迁移时间和服务中断时间，并优化伸缩完成后的系统性能，提升迁移完成后系统的数据本地性和负载均衡。1. The present invention mainly proposes an elastic scaling method that uses optimized scaling strategies and algorithms to realize graph computing. This method shortens the overall migration time and service interruption time by using the graph structure and data information separation and migration, as well as the locality-sensitive and load-balanced optimization allocation mechanism, and optimizes the system performance after the scaling is completed, and improves the data locality of the system after the migration is completed. and load balancing.

2、本发明相对现有系统级(如借助虚拟机迁移等技术)的弹性缩放支持，能够降低上层应用服务性能的损失，缩短计算系统弹性迁移过程中的整体迁移时间和服务中断时间。提升云计算集群资源利用率以及上层应用服务的可用性，促进具有高通量需求的关键计算服务在云计算数据中心的部署，进而可以带来可观的社会效益及经济效益。2. Compared with the elastic scaling support at the existing system level (such as virtual machine migration and other technologies), the present invention can reduce the loss of upper-layer application service performance and shorten the overall migration time and service interruption time in the elastic migration process of the computing system. Improve the resource utilization of cloud computing clusters and the availability of upper-layer application services, and promote the deployment of key computing services with high-throughput requirements in cloud computing data centers, which can bring considerable social and economic benefits.

3、伸缩过程中信息传输量大的问题主要来源于图计算中图结构的信息发送，图计算依赖于图的结构，但是图的结构在计算过程中不发生改变。因此本发明提出将图的结构信息与数据信息分离迁移的方法，减少迁移过程中的网络传输，缩短整体迁移时间。3. The problem of large amount of information transmission during the scaling process mainly comes from the information transmission of the graph structure in graph computing. Graph computing depends on the structure of the graph, but the structure of the graph does not change during the computing process. Therefore, the present invention proposes a method for separating and migrating graph structure information and data information, reducing network transmission during the migration process, and shortening the overall migration time.

4、图计算中对计算性能影响最大的是数据的本地性以及系统的均衡负载。图计算的特点是循环、重复的，良好的数据本地性可以大大减少网络传输，提高计算性能。本发明提出本地性敏感和负载均衡的优化分配机制，在迁移过程中根据迁入与迁出服务器的数据分布，选择合理的迁移方案。4. In graph computing, the biggest impact on computing performance is the locality of data and the balanced load of the system. Graph computing is characterized by cycles and repetitions. Good data locality can greatly reduce network transmission and improve computing performance. The invention proposes an optimal allocation mechanism of locality sensitivity and load balance, and selects a reasonable migration scheme according to the data distribution of the migrating-in and migrating-out servers during the migration process.

附图说明Description of drawings

通过阅读参照以下附图对非限制性实施例所作的详细描述，本发明的其它特征、目的和优点将会变得更明显：Other characteristics, objects and advantages of the present invention will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:

图1是基于图结构与数据信息分离的图计算动弹性迁移流程图。Figure 1 is a flow chart of graph computing dynamic elastic migration based on the separation of graph structure and data information.

图2是基于图结构与数据信息分离的图计算动弹性迁移后的系统状态。Figure 2 is the system state after dynamic elastic migration based on the separation of graph structure and data information.

具体实施方式Detailed ways

下面结合具体实施例对本发明进行详细说明。以下实施例将有助于本领域的技术人员进一步理解本发明，但不以任何形式限制本发明。应当指出的是，对本领域的普通技术人员来说，在不脱离本发明构思的前提下，还可以做出若干变化和改进。这些都属于本发明的保护范围。The present invention will be described in detail below in conjunction with specific embodiments. The following examples will help those skilled in the art to further understand the present invention, but do not limit the present invention in any form. It should be noted that those skilled in the art can make several changes and improvements without departing from the concept of the present invention. These all belong to the protection scope of the present invention.

本发明采用图结构与数据信息分离迁移的方法，以及本地性敏感和负载均衡的优化分配机制，实现图计算系统的伸缩。The present invention adopts a method for separating and migrating graph structure and data information, and an optimal allocation mechanism of locality sensitivity and load balancing to realize the expansion and contraction of the graph computing system.

以下将通过具体实施示例来描述本发明提供的图计算伸缩技术。The graph computing scaling technology provided by the present invention will be described below through specific implementation examples.

图1是基于图结构与数据信息分离的图计算动弹性迁移流程图(以扩张为例)Figure 1 is a graph calculation dynamic elastic migration flow chart based on the separation of graph structure and data information (taking expansion as an example)

步骤0，系统载入原始数据，划分并构建子图，完成构建后将划分信息存储至分布式文件系统。Step 0, the system loads the original data, divides and constructs subgraphs, and stores the division information in the distributed file system after the construction is completed.

步骤1，系统收到迁移指令(由迁出服务器A迁往迁入服务器B、C)，A向B、C同时发送数据信息(如1a所示)，同时B、C通过分布式文件系统读取图信息(如1b所示)。Step 1, the system receives the migration command (from the migration server A to the migration server B, C), A sends data information to B and C at the same time (as shown in 1a), and B and C read the data through the distributed file system Get image information (as shown in 1b).

步骤2，迁入服务器B、C完成接收，迁出服务器A应用关闭，B、C重新构建图结构和数据信息。Step 2: The migrating servers B and C complete the reception, the migrating server A closes the application, and B and C rebuild the graph structure and data information.

最终系统的状态如图2。The state of the final system is shown in Figure 2.

进一步具体地，基于图的本地性和负载均衡的优化分配机制的算法Further specifically, an algorithm based on an optimal distribution mechanism based on graph locality and load balancing

在步骤0中，每台服务器根据其处理的子图A产生本地性得分评价函数，该函数的输入参数为一个子图B，该函数输出对子图B迁入子图A的评价，本地性好的得分高。In step 0, each server generates a locality score evaluation function according to the subgraph A it processes. The input parameter of this function is a subgraph B, and the output of this function is the evaluation of the migration of subgraph B into subgraph A. Locality Good scores high.

在步骤1中，当服务器需要迁出时分别产生每台服务器对子图的评分，选取得分最高的2台，根据其负载，选择负载小的一台服务器为迁移目标。In step 1, when the server needs to be migrated out, the scores of each server on the subgraph are generated respectively, and the two with the highest scores are selected. According to their load, a server with a small load is selected as the migration target.

综上所述，本发明提出的基于图结构与数据信息分离的图计算伸缩技术，采用图结构与数据信息分离迁移的方法，以及本地性敏感和负载均衡的优化分配机制，能够缩短迁移过程中的迁移时间和服务中断时间，提升迁移完成后系统的数据本地性和负载均衡。To sum up, the graph computing scaling technology based on the separation of graph structure and data information proposed by the present invention adopts the method of separating graph structure and data information for migration, as well as the optimized allocation mechanism of locality sensitivity and load balancing, which can shorten the migration process. The migration time and service interruption time are shortened, and the data locality and load balancing of the system after the migration is completed are improved.

其中，将所述图结构信息与计算过程中的所述数据信息分离，由目标机器通过分布式文件系统直接访问计算过程中不变的所述图结构信息，而由源机器通过网络传输计算过程中产生的所述数据信息；避免传输计算过程中不发生改变的数据，减少对迁移过程中需要传输的数据，缩短伸缩时间。Wherein, the graph structure information is separated from the data information in the calculation process, the target machine directly accesses the graph structure information unchanged in the calculation process through the distributed file system, and the source machine transmits the calculation process through the network The data information generated in the above; avoid the transmission of data that does not change during the calculation process, reduce the data that needs to be transmitted during the migration process, and shorten the scaling time.

以上对本发明的具体实施例进行了描述。需要理解的是，本发明并不局限于上述特定实施方式，本领域技术人员可以在权利要求的范围内做出各种变化或修改，这并不影响本发明的实质内容。Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above, and those skilled in the art may make various changes or modifications within the scope of the claims, which do not affect the essence of the present invention.

Claims

1. A graph computing stretching method based on graph structure and data information separation, is characterized in that, comprises the steps:

Step 1: Separate and migrate graph structure information and data information;

Step 2: Optimized distribution for locality sensitivity and load balancing.

2. The graph computing stretching method based on the separation of graph structure and data information according to claim 1, wherein the step 1 comprises the following steps:

separating the graph structure information from the data information in the calculation process;

The target machine directly accesses the invariant graph structure information in the calculation process through the distributed file system, and the source machine transmits the data information generated in the calculation process through the network.

3. The graph computing scaling method based on the separation of graph structure and data information according to claim 1, wherein the step 1 comprises the following steps:

Load the graph; in the process of loading the graph, divide and construct the subgraph, record the division method of the graph, and store the partition information in the distributed file system after the subgraph is constructed;

When receiving the migration instruction, the migration-in server as the target machine directly loads the subgraph structure belonging to the migration-out server through the distributed file system, and merges it with the original subgraph on the migration-in server; the migration-out server as the source machine The server sends the computing status of the outgoing subgraph to the incoming server;

The migrating-in server receives the computing state, sends a receipt message to the migrating-out server, and the migrating-out server closes the application process; the migrating-in server loads the computing state and resumes computing.

4. The graph computing scaling method based on the separation of graph structure and data information according to claim 3, wherein the step 2 comprises the following steps:

In the process of loading the graph, the server is divided according to the locality of the calculation data and the load balance of the system, and the division feature is extracted from the division result of the server;

When the migration instruction is received, calculate the locality score of the migration of the outgoing subgraph structure to different incoming servers according to the division characteristics, and select the load according to the load of the server among the selected servers with the highest locality scores The lightest server is used as the migration server to perform the migration;

After the migrating server completes the reconstruction of the graph, the partitioning features on the server are recalculated.