
CN116627661A - Method and system for scheduling computing power resources - Google Patents

Method and system for scheduling computing power resources

Info

Publication number
CN116627661A
CN116627661A
Authority
CN
China
Prior art keywords
resources
application container
computing power
edge node
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310906489.8A
Other languages
Chinese (zh)
Other versions
CN116627661B (en)
Inventor
王羽中
蒋咪
陈雪儿
才振功
吉梁茜
王翱宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Harmonycloud Technology Co Ltd
Original Assignee
Hangzhou Harmonycloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Harmonycloud Technology Co Ltd filed Critical Hangzhou Harmonycloud Technology Co Ltd
Priority to CN202310906489.8A priority Critical patent/CN116627661B/en
Publication of CN116627661A publication Critical patent/CN116627661A/en
Application granted granted Critical
Publication of CN116627661B publication Critical patent/CN116627661B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)

Abstract

The invention discloses a method and a system for scheduling computing power resources, belonging to the technical field of electric digital data processing. The method comprises the following steps: in response to a computing power resource allocation instruction from the cloud, establishing an application container in a target edge node, wherein the resource allocation instruction comprises the logical resources of the allocated computing power resources and application container deployment information; and obtaining, through the sidecar of the application container, the physical resources corresponding to the allocated logical resources, and mounting the driver files of those physical resources into the application container. Computing power resources at the edge are managed uniformly, which improves resource utilization efficiency and reduces enterprise cost. The type and location of the computing power resources need not be specified: the corresponding computing tasks and application containers are scheduled to a target edge node for execution according to the computing power resource demand of the service request. This realizes management and scheduling of edge-side computing power resources, improves their utilization rate, reduces usage cost, and improves application development efficiency.

Description

Method and system for scheduling computing power resources
Technical Field
The invention relates to the technical field of electric digital data processing, in particular to a method and a system for scheduling computing power resources.
Background
At present, artificial intelligence has become a strategic technology: it is not only a technological innovation but also an important driving force for economic development, social progress, and industry innovation. Computing resources, mainly GPUs, NPUs, and FPGAs, are an important component of the artificial intelligence market and are developing vigorously. AI computing resources of various types and models emerge endlessly, more computing power is distributed at the edge, and the computing power resources distributed across many edges form heterogeneous computing power resources. How to realize unified management and efficient scheduling of heterogeneous computing resources of various types has long been a hot topic of academic and industrial research.
Because efficient and economical heterogeneous computing power management and scheduling solutions are lacking, most enterprises can only use expensive AI computing power resources exclusively, which drives up the cost of AI computing. Meanwhile, the large amount of computing power distributed at the edge cannot be managed and used effectively, causing serious resource waste. In addition, multiple kinds of hardware are deployed at the edge, and when designing an application the user must consider support for each kind of hardware, modifying the AI application to adapt it to the AI computing hardware of different manufacturers. This increases the complexity of AI application development and deployment and raises the investment cost of AI computing.
Disclosure of Invention
To address these technical problems in the prior art, the invention provides a method and a system for scheduling computing power resources that realize management and scheduling of edge-side computing power resources, improve the utilization rate of computing power resources, and reduce usage cost.
The invention discloses a method for scheduling computing power resources, comprising the following steps: in response to a computing power resource allocation instruction from the cloud, establishing an application container in a target edge node, wherein the resource allocation instruction comprises the logical resources of the allocated computing power resources and application container deployment information; and obtaining, through the sidecar of the application container, the physical resources corresponding to the allocated logical resources, and mounting the driver files of those physical resources into the application container.
Further, the method for managing the computing power resources comprises the following steps: obtaining the physical resources of an edge node; establishing logical resources, and a mapping between the physical resources and the logical resources, according to the physical resources; uploading the logical resources to the cloud; and performing unit quantization on the computing power resources, wherein the computing power resources comprise computing power and memory.
Further, the method for obtaining the computing power resource allocation instruction comprises the following steps:
in response to a service request, parsing the service request to obtain one or more computing tasks;
allocating a target edge node and computing power resources according to the computing tasks and the remaining computing power resources of the cluster;
and obtaining the application container deployment information according to the type of the computing power resources, wherein the application container deployment information comprises a container image adapted to the computing power resources.
Further, the method for allocating the edge node comprises the following steps: obtaining the computing power resource requirement of the computing task; obtaining a list of edge nodes that satisfy the computing power resource requirement; and selecting a target edge node from the edge node list based on the remaining computing power resources and/or the network condition.
Further, the container deployment information further includes an image of the daemon of the application container, and the method for pulling up the container or container group comprises:
injecting a sidecar for the application container or container group; starting the daemon according to the image of the daemon; and pulling up the application container or container group through the daemon.
Further, a method for autonomy of the edge node comprises:
establishing a persistent database in the target edge node, wherein the persistent database is used for storing metadata received from the cloud, and the metadata is used for supporting the running of the application container.
Further, the method for parallel scheduling of multiple applications comprises the following steps:
slicing the physical resources of the target edge node into a first logical resource and a second logical resource;
scheduling a first application container and a second application container to the target edge node;
allocating the first logical resource and the second logical resource to the first application container and the second application container respectively, according to the computing power requirements of the two containers;
mounting the driver files of the corresponding physical resources for the first application container and the second application container respectively;
the first application container executing its computing tasks using the physical resource slice corresponding to the first logical resource, and the second application container executing its computing tasks using the physical resource slice corresponding to the second logical resource;
isolating the first application container from the second application container based on any one of the following virtualization schemes: vCUDA, CANN, vGPU, and cGPU;
and monitoring the physical resource usage of the edge node and of the first and second application containers.
The invention also discloses a cloud-side method for scheduling computing power resources, comprising the following steps:
in response to a service request, parsing the service request to obtain one or more computing tasks;
allocating a target edge node and computing power resources according to the computing tasks and the remaining computing power resources of the cluster;
generating deployment information of an application container according to the computing task;
sending a computing power resource allocation instruction to the target edge node according to the logical resources of the allocated computing power resources and the deployment information of the application container, and sending the following mounting instruction to the target edge node: obtain, through the sidecar of the application container, the physical resources corresponding to the allocated logical resources, and mount the driver files of those physical resources into the application container;
and sending the image of the daemon of the application container to the target edge node according to the type of the allocated computing power resources.
The invention also provides a system for implementing the method, comprising a first transmission module and a container management module.
The first transmission module is used for receiving the computing power resource allocation instruction from the cloud, wherein the resource allocation instruction comprises the logical resources of the allocated computing power resources and the application container deployment information.
The container management module is used for establishing the application container and its sidecar in the target edge node, obtaining through the sidecar the physical resources corresponding to the allocated logical resources, and mounting the driver files of those physical resources into the application container.
The first transmission module and the container management module are deployed on the target edge node.
The target edge node is further deployed with a computing power management module, which is used for discovering and registering physical resources, managing the mapping between physical and logical resources, scheduling vCUDA and CANN to realize isolation and limitation of the computing power resources of each application container, and monitoring the usage of each physical resource and the amount of physical resources used by each application container.
The cloud is deployed with a second transmission module, a scheduling module, and a computing power control module.
The second transmission module is used for interacting with the edge nodes;
the scheduling module is used for allocating the target edge node and the computing power resources;
and the computing power control module is used for quantizing the computing power resources, issuing the daemon image address, injecting the sidecar into the container group, and issuing the configuration information of the container group to the target edge node.
Compared with the prior art, the invention has the following beneficial effects: computing power resources at the edge are managed uniformly, which improves resource utilization efficiency and reduces enterprise cost; the type and location of the computing power resources need not be specified, and the corresponding computing tasks and application containers are scheduled to a target edge node for execution according to the computing power resource demand of the service request, realizing management and scheduling of edge-side computing power resources, improving their utilization rate, reducing usage cost, and improving application development efficiency.
Drawings
FIG. 1 is a flow chart of the method for scheduling computing power resources according to the present invention;
FIG. 2 is a logical block diagram of the computing power scheduling system of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention is described in further detail below with reference to the attached drawing figures:
a cloud edge collaboration-based computing power resource scheduling method, as shown in figure 1, comprises the following steps:
step 101: and responding to the service request, analyzing and splitting the service request to obtain one or more computing tasks. Each computing task has a certain computational power resource requirement.
Step 102: and distributing the target edge node and the computing power resources thereof according to the computing power resource requirements of the computing tasks and the remaining computing power resources of the clusters.
Step 103: and generating deployment information of the application container according to the calculation task. The deployment information includes the daemon's mirror address and configuration information for the application container.
Step 104: and sending a computing power resource allocation instruction to the target edge node according to the allocated computing power resources and the deployment information of the application container. And simultaneously, the metadata related to the computing task is issued to the target edge node. The cloud may also issue an application upgrade instruction to the target edge node.
Step 105: and sending the mirror image of the daemon of the application container to the target edge node according to the type of the distributed computing power resource.
Step 111: and responding to a computing power resource allocation instruction of the cloud, and establishing an application container in the target edge node through the daemon, wherein the resource allocation instruction comprises the allocated computing power resource and application container deployment information.
Step 112: and obtaining physical resources corresponding to the allocated computing resources through a side car (Sidecar) of the application container, and mounting driving files of the physical resources into the application container.
The Sidecar is an associated container of the application container, and a user only needs to develop a corresponding application, and the Sidecar can be injected in a cloud or an edge node.
Step 113: the application container performs corresponding computing tasks by utilizing the physical resources. Wherein steps 101-105 are performed in edge nodes on the edge side and steps 111-113 are performed in management nodes in the cloud.
The computing power resources at the edge side are uniformly managed, a large amount of scattered heterogeneous computing power is utilized, the operation of computing tasks is quickened, the utilization efficiency of the resources is improved, and the enterprise cost is reduced; the type, the position and the driving file of the computing power resource do not need to be specified, corresponding computing tasks and application containers are scheduled to the target edge node for execution according to the computing power resource demand of the service request, the management and the scheduling of the computing power resource at the edge side are realized, the utilization rate of the computing power resource is improved, the use cost is reduced, and the development efficiency of the application is improved.
The method for managing the computing power resources comprises the following steps:
Step 201: Obtain the physical resources of the edge node.
Step 202: Establish logical resources, and a mapping between the physical resources and the logical resources, according to the physical resources. One physical resource (one piece of hardware) may be mapped to a single logical resource, or it may be sliced into several logical resources, enabling fine-grained management of computing power resources.
Step 203: Upload the logical resources, the total amount of resources, and the remaining amount of resources to the cloud.
Step 204: Perform unit quantization on the computing power resources, which comprise computing power and memory.
For example, computing power is uniformly quantized in FLOPS, and memory is quantized by its size in MB. The quantized computing power resources are defined through a Custom Resource Definition (CRD); the management node in the cloud periodically receives the node and application states reported by the edge side and updates them into the CRD.
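As a minimal sketch of the unit quantization described above, the following illustrates how a device's raw capacity could be rounded down to whole scheduling units — compute in FLOPS-based units and memory in MB — so heterogeneous devices (GPU, NPU, FPGA) share a common denominator. The unit size, names, and structure are assumptions for illustration, not taken from the patent.

```python
from dataclasses import dataclass

# Assumption: one scheduling unit of compute equals 1 TFLOPS.
COMPUTE_UNIT_FLOPS = 10**12

@dataclass
class QuantizedResource:
    compute_units: int  # multiples of COMPUTE_UNIT_FLOPS
    memory_mb: int      # memory in megabytes

def quantize(flops: float, memory_bytes: int) -> QuantizedResource:
    """Round a device's raw capacity down to whole scheduling units."""
    return QuantizedResource(
        compute_units=int(flops // COMPUTE_UNIT_FLOPS),
        memory_mb=int(memory_bytes // (1024 * 1024)),
    )

# Example: a 19.5 TFLOPS accelerator with 16 GiB of device memory.
gpu = quantize(19.5e12, 16 * 1024**3)
```

Quantized values like these are what would be recorded in the CRD and compared against task requirements during scheduling.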
In step 102, the method for allocating the edge node comprises:
Step 301: Obtain the computing power resource requirements of the computing tasks.
Step 302: Obtain a list of edge nodes that satisfy the computing power resource requirements.
Step 303: Select a target edge node from the edge node list based on the remaining computing power resources and/or the network condition. The target edge node may be selected by the remaining resources alone, or by a weighted sum of the remaining resources and the network condition.
For example: Score = λ1 × Source_s1 + λ2 × Source_s2 − λ3 × Net / Av,
where Score is the node's score; λ1, λ2, and λ3 are weight coefficients; Source_s1 and Source_s2 are the remaining amounts of two different types of resources; Net is the network delay; and Av is the network delay threshold. The optimal edge node is selected by maximizing the score.
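The scoring formula above can be sketched directly in code. The weights, delay threshold, and node records below are illustrative assumptions; only the formula itself comes from the text.

```python
def score_node(s1: float, s2: float, net_delay_ms: float,
               l1: float = 1.0, l2: float = 1.0, l3: float = 1.0,
               av: float = 100.0) -> float:
    """Score = l1*Source_s1 + l2*Source_s2 - l3*Net/Av.

    More remaining resources raise the score; network delay lowers it.
    """
    return l1 * s1 + l2 * s2 - l3 * net_delay_ms / av

def pick_target(nodes):
    """Select the edge node whose score is maximal (step 303)."""
    return max(nodes, key=lambda n: score_node(n["s1"], n["s2"], n["net"]))

# Hypothetical candidate list: edge-a has more free resources and lower delay.
nodes = [
    {"name": "edge-a", "s1": 8, "s2": 4, "net": 20},
    {"name": "edge-b", "s1": 6, "s2": 6, "net": 200},
]
best = pick_target(nodes)
```

Here edge-a scores 8 + 4 − 0.2 = 11.8 against edge-b's 10.0, so it is chosen as the target node.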
In step 104, the container deployment information further includes the image address of the daemon of the application container, and the method for pulling up the container or container group comprises: injecting a sidecar for the application container or container group through the cloud, and issuing the configuration information of the container or container group to the target edge node; the target edge node starting the daemon according to the image address of the daemon; and pulling up the application container or container group through the daemon.
To reduce the impact of network disconnection on computing tasks, a persistent database (SQLite) may be established in the target edge node to hold the metadata received from the cloud that supports the running of the application containers, realizing autonomy of the edge nodes. When the network is disconnected or the edge node fails or restarts, in-memory data is lost; recovering the application container from the persistent edge database ensures that the application keeps running normally.
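A minimal sketch of this edge-autonomy mechanism follows: metadata pushed by the cloud is persisted in a local SQLite database so that, after a network break or node restart, application state can be restored from disk rather than from lost in-memory data. The table schema and key format are assumptions for illustration.

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the edge node's persistent metadata store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn

def save_metadata(conn: sqlite3.Connection, key: str, value: str) -> None:
    # Replace on conflict so the latest cloud push wins.
    conn.execute("INSERT OR REPLACE INTO metadata (key, value) VALUES (?, ?)",
                 (key, value))
    conn.commit()

def load_metadata(conn: sqlite3.Connection, key: str):
    """Read back persisted metadata, e.g. after a restart."""
    row = conn.execute("SELECT value FROM metadata WHERE key = ?",
                       (key,)).fetchone()
    return row[0] if row else None

conn = open_store()
save_metadata(conn, "pod/app-1", '{"image": "registry.example/app:gpu"}')
restored = load_metadata(conn, "pod/app-1")
```

In a real deployment the store would live on the node's disk (not `:memory:`) and hold whatever pod configuration the container runtime needs to restart applications while the cloud is unreachable.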
The method for parallel scheduling of multiple applications comprises the following steps:
Step 401: Slice the physical resources of the target edge node into a first logical resource, a second logical resource, and a third logical resource. The first and second logical resources may be mapped to the same hardware or to different hardware; the third logical resource is mapped to different hardware from the first; and the sizes of the three logical resources can be adjusted as needed.
Step 402: Schedule the first application container and the second application container to the target edge node.
Step 403: Allocate the first logical resource and the second logical resource to the first application container and the second application container respectively, according to their computing power requirements. The first logical resource and the third logical resource may also be allocated to the first application container simultaneously.
Step 404: Mount the driver files of the corresponding physical resources for the first application container and the second application container respectively.
Step 405: The first application container executes its computing tasks using the allocated physical resource slice, and the second application container executes its computing tasks using the physical resource slice corresponding to the second logical resource.
Step 406: Isolate the first application container from the second application container based on any one of the following virtualization schemes: vCUDA, CANN, vGPU, and cGPU. This ensures that there is no serious resource preemption between application containers and improves application stability.
Step 407: Monitor the physical resource usage of the edge node and of the first and second application containers.
Running multiple applications in parallel improves the utilization rate of physical resources: several applications can share one physical resource at the same time, or one application can use several physical resources.
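Steps 401-403 can be sketched as a small slicing-and-assignment routine. The slice sizes, container demands, and the greedy fitting strategy below are illustrative assumptions; the patent only specifies that physical resources are sliced into logical resources and assigned by computing power requirement.

```python
def slice_resource(total_units: int, sizes: dict) -> dict:
    """Step 401: split one device's capacity into named logical slices."""
    assert sum(sizes.values()) <= total_units, "slices exceed device capacity"
    return dict(sizes)

def assign(slices: dict, demands: dict) -> dict:
    """Steps 402-403: map each container to the smallest slice that fits it."""
    allocation, free = {}, dict(slices)
    # Place smaller demands first so large slices stay available.
    for app, need in sorted(demands.items(), key=lambda kv: kv[1]):
        fit = min((s for s, units in free.items() if units >= need),
                  key=lambda s: free[s], default=None)
        if fit is None:
            raise RuntimeError(f"no free slice fits {app}")
        allocation[app] = fit
        del free[fit]
    return allocation

# A 16-unit device sliced into three logical resources (sizes adjustable).
slices = slice_resource(16, {"logical-1": 8, "logical-2": 4, "logical-3": 4})
alloc = assign(slices, {"app-1": 6, "app-2": 3})
```

app-2 (3 units) fits the 4-unit logical-2 slice, while app-1 (6 units) needs the 8-unit logical-1 slice; both containers then run on slices of the same physical device, with isolation enforced by the virtualization layer of step 406.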
The present invention also provides a system implementing the foregoing method for scheduling computing power resources, as shown in FIG. 2, comprising a first transmission module 21 (named EdgeCom) and a container management module 22 (named EdgeKL).
The first transmission module 21 is configured to receive the computing power resource allocation instruction from the cloud 1, where the resource allocation instruction includes the logical resources of the allocated computing power resources and the application container deployment information; it also periodically reports information such as the server state of the edge node, the total amount of computing resources, the amount of computing resources in use, and the states of the applications on the node.
The container management module 22 is configured to establish an application container 23 and its sidecar 24 (named Computility Runtime) in the target edge node 2, obtain through the sidecar 24 the physical resources 26 corresponding to the allocated logical resources, and mount the driver files of the physical resources 26 into the application container 23.
The first transmission module 21 and the container management module 22 are deployed on the edge node 2. The edge node 2 is further deployed with a computing power management module 28 (named Computility Manager), which is used for discovering and registering the physical resources 26, managing the mapping between the physical resources 26 and the logical resources, scheduling vCUDA and CANN to realize isolation and limitation of the computing power resources of each application container, and monitoring the usage of each physical resource and the amount of physical resources used by each application container. The container management module 22 is used to start a daemon (Docker) 27, through which the application container 23 and the sidecar 24 are pulled up.
Specifically, the computing power management module 28 deploys the Computility Manager component in the DaemonSet mode of Kubernetes, and the bottom layer of the component interacts with components such as vCUDA, CUDA, and CANN.
The management node 11 of the cloud 1 is deployed with a second transmission module 12 (named CloudCom), a scheduling module 14 (named Extension Scheduler), and a computing power control module 13 (named Computility Controller). The second transmission module 12 is configured to interact and synchronize with the edge node 2, specifically over WebSocket.
The scheduling module 14 is configured to allocate the target edge node and the computing power resources according to the service request. The scheduling module 14 extends the native Kubernetes scheduler based on the Scheduler Framework mechanism of Kubernetes.
The computing power control module 13 is used for quantizing the computing power resources, issuing the daemon image address, injecting the sidecar into the container group (Pod), and issuing the configuration information of the container group to the target edge node; it performs unit quantization of the computing power resources and defines the quantized resources through a CRD. Specifically, based on the ListWatch mechanism, after observing that an application needs to be deployed to an edge node, it modifies the application's Docker image address to an image address adapted to the allocated computing power resource type; after the sidecar container is injected into the Pod, the configuration information of the whole application Pod is sent to the edge server for startup. The computing power control module 13 is a Kubernetes Operator.
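The image-rewriting step performed by the control module can be illustrated as follows. The variant table, registry paths, and resource-type names are hypothetical; only the behavior — swapping the pod's image for one adapted to the allocated compute type — comes from the text.

```python
# Hypothetical mapping from allocated compute type to a prebuilt image variant.
IMAGE_VARIANTS = {
    "nvidia-gpu": "registry.example/app:cuda",
    "ascend-npu": "registry.example/app:cann",
    "cpu": "registry.example/app:cpu",
}

def adapt_pod_image(pod_spec: dict, resource_type: str) -> dict:
    """Return a copy of the pod spec whose image matches the compute type."""
    try:
        image = IMAGE_VARIANTS[resource_type]
    except KeyError:
        raise ValueError(f"no image variant for resource type {resource_type!r}")
    adapted = dict(pod_spec)   # leave the original spec untouched
    adapted["image"] = image
    return adapted

pod = {"name": "detector", "image": "registry.example/app:latest"}
adapted = adapt_pod_image(pod, "ascend-npu")
```

In the actual system this rewrite would happen inside the Operator's reconcile loop, triggered by the ListWatch event stream, before the pod configuration is sent to the edge node.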
The system is implemented on top of Kubernetes. On the one hand, it realizes unified management of many scattered, heterogeneous computing power resources on the edge side; on the other hand, when creating an application, the user only needs to specify the computing power and memory the application requires, and the system automatically matches the optimal computing power resources and node to run it, without concern for the type and location of the underlying resources, realizing intelligent scheduling.
In a specific test, the utilization rate of edge computing power resources was improved by 20%, and enterprise computing power cost was reduced by more than 15%; through fine-grained sharing and isolation of computing power, the overall utilization rate of computing power resources was improved by more than 3 times, and application stability was improved by more than 30%; at the same time, edge-side computing power became more convenient to use, and application development difficulty was reduced. Through the autonomy mechanism, applications continue to run normally after an edge node's server is disconnected from the cloud network.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for scheduling computing power resources, comprising the following steps:
in response to a computing power resource allocation instruction from the cloud, establishing an application container in a target edge node, wherein the resource allocation instruction comprises logical resources of the allocated computing power resources and application container deployment information;
and obtaining, through the sidecar of the application container, the physical resources corresponding to the allocated logical resources, and mounting the driver files of the physical resources into the application container.
2. The method of claim 1, wherein the management method of the computing power resource comprises:
obtaining physical resources of an edge node;
establishing logical resources and a mapping between the physical resources and the logical resources according to the physical resources;
uploading the logic resource to a cloud;
and carrying out unit quantization on the computing power resources, wherein the computing power resources comprise computing power and memory.
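A sketch of the unit quantization step in claim 2 (the unit sizes and field names are assumptions): a physical accelerator is described as quantized computing power plus memory units, so heterogeneous devices can be compared and scheduled uniformly.

```python
# Illustrative unit quantization: each whole device is treated as 100 compute
# units, and memory is expressed in fixed-size units (256 MiB here, an
# assumed granularity).
def quantize(physical_devices, mem_unit_mib=256):
    logical = []
    for i, dev in enumerate(physical_devices):
        logical.append({
            "logical_id": f"lgpu-{i}",
            "physical_id": dev["id"],
            "compute_units": 100,                           # whole device = 100 units
            "memory_units": dev["mem_mib"] // mem_unit_mib,  # memory in quantized units
        })
    return logical

logical = quantize([{"id": "GPU-0", "mem_mib": 8192}])
```

The resulting logical descriptors are what would be uploaded to the cloud for scheduling.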
3. The method of claim 1, wherein the method of obtaining the computing force resource allocation instruction comprises:
responding to the service request, analyzing the service request to obtain one or more computing tasks;
allocating a target edge node and computing power resources according to the computing tasks and the remaining computing power resources of the cluster;
and obtaining application container deployment information according to the type of the computing power resources, wherein the application container deployment information comprises a container image or a daemon image adapted to the computing power resources.
4. A method according to claim 3, wherein the method of allocating a target edge node comprises:
obtaining the computing power resource requirement of the computing task;
obtaining an edge node list meeting the computing power resource requirement;
and selecting a target edge node from the edge node list based on the remaining computing power resources and/or the network condition.
5. A method according to claim 3, wherein the method of pulling up the container or container group comprises:
injecting a sidecar into the application container or container group;
starting the daemon according to the image of the daemon;
and pulling up the application container or container group through the daemon.
6. The method of claim 1, further comprising a method of edge node autonomy:
establishing a persistent database in the target edge node, wherein the persistent database is used for storing metadata received from the cloud, and the metadata is used for supporting the running of the application container.
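A sketch of the edge autonomy mechanism in claim 6 (the choice of sqlite3 and the schema are assumptions; the patent only requires a persistent database): metadata pushed by the cloud is persisted locally, so the edge node can keep its application containers running after losing the cloud link.

```python
import json
import sqlite3

# Illustrative persistent metadata store for edge autonomy. On a real node a
# file path would be used instead of ":memory:" so data survives restarts.
def save_metadata(conn, key, obj):
    conn.execute("CREATE TABLE IF NOT EXISTS metadata (k TEXT PRIMARY KEY, v TEXT)")
    conn.execute("INSERT OR REPLACE INTO metadata VALUES (?, ?)", (key, json.dumps(obj)))
    conn.commit()

def load_metadata(conn, key):
    row = conn.execute("SELECT v FROM metadata WHERE k = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None

conn = sqlite3.connect(":memory:")
save_metadata(conn, "pod/infer-app", {"image": "registry.local/infer:v1"})
restored = load_metadata(conn, "pod/infer-app")  # what the node reads when offline
```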
7. The method of claim 1, wherein the method of multi-application parallel scheduling comprises:
slicing the physical resources of the target edge node into first logical resources and second logical resources;
scheduling a first application container and a second application container to the target edge node;
allocating the first logical resources and the second logical resources to the first application container and the second application container respectively, according to their computing power requirements;
mounting the driver files of the corresponding physical resources for the first application container and the second application container respectively;
the first application container executes its computing tasks using the physical resource slice corresponding to the first logical resources, and the second application container executes its computing tasks using the physical resource slice corresponding to the second logical resources;
the isolation of the first application container and the second application container is performed based on any one of the following virtualization schemes: vCUDA, CANN, vGPU and cGPU;
and monitoring the physical resource use condition of the edge node and the physical resource use conditions of the first application container and the second application container.
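A sketch of the slicing step in claim 7 (function and field names are assumptions): the node's physical resource is divided into per-application logical slices sized by each application's demand. The isolation itself would be enforced by one of the named virtualization schemes (vCUDA, CANN, vGPU, cGPU), which this sketch does not model.

```python
# Illustrative slicing of one physical resource between two applications.
# Compute is in assumed percent units (whole device = 100), memory in MiB.
def slice_resources(total_compute, total_mem, demands):
    slices, used_c, used_m = [], 0, 0
    for app, (compute, mem) in demands.items():
        if used_c + compute > total_compute or used_m + mem > total_mem:
            raise ValueError(f"insufficient resources for {app}")
        slices.append({"app": app, "compute": compute, "mem": mem})
        used_c, used_m = used_c + compute, used_m + mem
    return slices

slices = slice_resources(100, 8192, {"app-1": (40, 4096), "app-2": (60, 4096)})
```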
8. The cloud-based computing power resource scheduling method is characterized by comprising the following steps of:
responding to the service request, analyzing the service request to obtain one or more computing tasks;
allocating a target edge node and computing power resources according to the computing tasks and the remaining computing power resources of the cluster;
generating deployment information of an application container according to the computing task;
sending a computing power resource allocation instruction to the target edge node according to the logical resources of the allocated computing power resources and the deployment information of the application container;
sending the following mounting instruction to the target edge node: obtain the physical resources corresponding to the allocated logical resources through a sidecar of the application container, and mount the driver files of the physical resources into the application container;
and sending the image of the daemon of the application container to the target edge node according to the type of the allocated computing power resources.
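An end-to-end sketch of the cloud-side flow in claim 8 (all field names and the allocator interface are assumptions): parse the service request into computing tasks, allocate a node and logical resource for each, and build the allocation instruction sent to the edge.

```python
# Illustrative cloud-side instruction builder. The allocator callback stands
# in for the scheduling module; in this sketch it is a fixed stub.
def build_instructions(service_request, allocator):
    instructions = []
    for task in service_request["tasks"]:
        node, logical_id = allocator(task)  # target node + allocated logical resource
        instructions.append({
            "target_node": node,
            "logical_resource": logical_id,
            "deployment": {
                "image": task["image"],
                "daemon_image": task.get("daemon_image"),  # adapted to the resource type
            },
        })
    return instructions

req = {"tasks": [{"image": "registry.local/detect:v2",
                  "daemon_image": "registry.local/gpu-daemon:v1"}]}
instrs = build_instructions(req, allocator=lambda t: ("edge-a", "lgpu-0"))
```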
9. A system for implementing the method of any one of claims 1-7, comprising a first transmission module and a container management module,
the first transmission module is used for receiving a computing power resource allocation instruction from the cloud, wherein the resource allocation instruction comprises the logical resources of the allocated computing power resources and application container deployment information;
the container management module is used for establishing an application container and its sidecar in the target edge node, obtaining the physical resources corresponding to the allocated logical resources through the sidecar, and mounting the driver files of the physical resources into the application container.
10. The system of claim 9, wherein the first transport module and container management module are deployed at a target edge node,
the target edge node is further provided with a computing power management module, wherein the computing power management module is used for discovering and registering physical resources, managing the mapping between physical resources and logical resources, scheduling vCUDA and CANN to realize isolation and limitation of the computing power resources of each application container, and monitoring the usage of each physical resource and the amount of physical resources used by each application container;
the cloud end is provided with a second transmission module, a scheduling module and a calculation control module,
the second transmission module is configured to interact with an edge node,
the scheduling module is used for distributing target edge nodes and computing power resources;
the computing power control module is used for quantifying computing power resources, sending the daemon image address, injecting the sidecar into the container group, and sending the configuration information of the container group to the target edge node.
CN202310906489.8A 2023-07-24 2023-07-24 Method and system for scheduling computing power resources Active CN116627661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310906489.8A CN116627661B (en) 2023-07-24 2023-07-24 Method and system for scheduling computing power resources


Publications (2)

Publication Number Publication Date
CN116627661A true CN116627661A (en) 2023-08-22
CN116627661B CN116627661B (en) 2023-11-03

Family

ID=87602944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310906489.8A Active CN116627661B (en) 2023-07-24 2023-07-24 Method and system for scheduling computing power resources

Country Status (1)

Country Link
CN (1) CN116627661B (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804217A (en) * 2017-04-26 2018-11-13 中兴通讯股份有限公司 A kind of resource scheduling device, resource scheduling system and resource regulating method
WO2020103465A1 (en) * 2018-11-23 2020-05-28 中国银联股份有限公司 Unified resource scheduling coordinator and method thereof for creating a virtual machine and/or container, and unified resource scheduling system
CN112000421A (en) * 2020-07-15 2020-11-27 北京计算机技术及应用研究所 Management scheduling technology based on super-fusion architecture
CN112565415A (en) * 2020-12-03 2021-03-26 杭州谐云科技有限公司 Cross-region resource management system and method based on cloud edge cooperation
CN115454636A (en) * 2022-09-16 2022-12-09 城云科技(中国)有限公司 Container cloud platform GPU resource scheduling method, device and application
CN115658390A (en) * 2022-11-14 2023-01-31 济南浪潮数据技术有限公司 Container disaster tolerance method, system, device, equipment and computer readable storage medium
CN115827192A (en) * 2022-11-18 2023-03-21 甘肃紫光智能交通与控制技术有限公司 Distributed cooperative real-time scheduling method
CN115951969A (en) * 2022-12-28 2023-04-11 山东浪潮科学研究院有限公司 Heterogeneous computing power resource management method and device
US20230222004A1 (en) * 2022-01-10 2023-07-13 International Business Machines Corporation Data locality for big data on kubernetes


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shu An; Peng Xin; Zhao Wenyun: "An Adaptive Management Method for Cloud Computing Resources Based on Container Technology", Computer Science (计算机科学), no. 07 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118034855A (en) * 2024-02-27 2024-05-14 中国通信建设集团设计院有限公司 Management scheduling method and system for computing power resources
CN118132275A (en) * 2024-05-07 2024-06-04 环球数科集团有限公司 Computing power dispatching system based on SJF and HRRN algorithm
CN118132275B (en) * 2024-05-07 2024-07-19 环球数科集团有限公司 Computing power dispatching system based on SJF and HRRN algorithm

Also Published As

Publication number Publication date
CN116627661B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
CN116627661B (en) Method and system for scheduling computing power resources
CN114138486B (en) Method, system and medium for arranging containerized micro-services for cloud edge heterogeneous environment
Kaur et al. Container-as-a-service at the edge: Trade-off between energy efficiency and service availability at fog nano data centers
EP3270289B1 (en) Container-based multi-tenant computing infrastructure
CN107038069B (en) Dynamic label matching DLMS scheduling method under Hadoop platform
Hu et al. Flutter: Scheduling tasks closer to data across geo-distributed datacenters
CN108337109B (en) Resource allocation method and device and resource allocation system
US20200174844A1 (en) System and method for resource partitioning in distributed computing
CN111682973B (en) Method and system for arranging edge cloud
CN103092698A (en) System and method of cloud computing application automatic deployment
CN102521044A (en) Distributed task scheduling method and system based on messaging middleware
CN108737560A (en) Cloud computing task intelligent dispatching method and system, readable storage medium storing program for executing, terminal
CN113946431B (en) Resource scheduling method, system, medium and computing device
CN104735095A (en) Method and device for job scheduling of cloud computing platform
CN109783225B (en) Tenant priority management method and system of multi-tenant big data platform
CN113132456B (en) Edge cloud cooperative task scheduling method and system based on deadline perception
CN111190691A (en) Automatic migration method, system, device and storage medium suitable for virtual machine
CN111552558A (en) Scheduling method and device of heterogeneous cloud resources
CN111666158A (en) Kubernetes-based container scheduling method and device, storage medium and electronic equipment
CN107070965A (en) A kind of Multi-workflow resource provision method virtualized under container resource
CN108304168A (en) A kind of edge calculations machine operating system
CN113535321A (en) Virtualized container management method, system and storage medium
CN115391030A (en) Control method and device, computer equipment and storage medium
CN113301087B (en) Resource scheduling method, device, computing equipment and medium
CN111506407B (en) Resource management and job scheduling method and system combining Pull mode and Push mode

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant