
CN113946431B - Resource scheduling method, system, medium and computing device - Google Patents


Info

Publication number
CN113946431B
CN113946431B
Authority
CN
China
Prior art keywords
service node
executed
task
subtask
message
Prior art date
Legal status
Active
Application number
CN202111576730.2A
Other languages
Chinese (zh)
Other versions
CN113946431A (en)
Inventor
Inventor not disclosed
Current Assignee
Beijing Real AI Technology Co Ltd
Original Assignee
Beijing Real AI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Real AI Technology Co Ltd filed Critical Beijing Real AI Technology Co Ltd
Priority: CN202111576730.2A
Publication of CN113946431A
Application granted
Publication of CN113946431B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the field of artificial intelligence and discloses a resource scheduling method, system, medium, and device. The method comprises: acquiring request information of a first task to be executed, where the first task comprises at least one subtask and the request information comprises the resource information each subtask needs to consume; acquiring the real-time available resources of a plurality of service nodes within a preset time length; determining a target node set based on the real-time available resources of each service node within the preset time length and the resource information each subtask of the first task needs to consume, where the target node set comprises a plurality of target service nodes that will execute subtasks within the preset time length and each target service node corresponds to at least one subtask; and sending a first message to each target service node, instructing the target service node receiving the first message to execute at least one subtask within the preset time length. The resource scheduling method provided by the application can make full use of the resources of each service node.

Description

Resource scheduling method, system, medium and computing device
Technical Field
The embodiments of the present application relate to the field of artificial intelligence resource management, and in particular to a resource scheduling method, system, medium, and computing device.
Background
Some artificial intelligence models, such as face recognition and image recognition models, often need adversarial evaluation during development and testing. The evaluation process usually runs adversarial tests against the deep learning model with thousands of face photos or images, so an evaluation task for an artificial intelligence model, especially a deep learning model, may contain hundreds of subtasks; such tasks are heavy and demand very large resource overhead.
Currently, models are mostly evaluated with open-source evaluation or adversarial systems. The mainstream approach schedules each service node of a distributed system through Kubernetes (K8s) to execute each subtask of an evaluation task. When scheduling, K8s typically decides whether to assign a new evaluation task to a service node based only on whether the node is already occupied by some task. As a result, a service node that is not fully loaded may still execute only one evaluation task, and only after that task finishes is the node released and scheduled for a new one. As shown in fig. 1, the distributed system includes 3 service nodes 1, 2, 3. If a user sends two tasks to be executed to the control module of the distributed system (comprising a transceiving unit and a scheduling unit) at a certain time, where the first task to be executed comprises 7 subtasks and the second comprises 8, then under the K8s scheduling method the three service nodes can only execute three subtasks (subtasks 1-3) at the same time. For service nodes 1-3 this clearly wastes a large amount of resources; for massive numbers of tasks, the resources of each service node cannot be fully utilized, and it is difficult to allocate service nodes reasonably across evaluation tasks, leading to untimely and unreasonable scheduling, wasted resources, and ultimately low evaluation efficiency. On the other hand, when scheduling a distributed system based on K8s, all service nodes in the system are occupied by default; when the number of subtasks is too large, all nodes become occupied, so large numbers of evaluation tasks queue up and processing efficiency is low.
Disclosure of Invention
The main purpose of the embodiments of the present application is to provide a resource scheduling method, system, medium, and computing device, which can perform timely and reasonable scheduling on limited hardware resources, and improve the processing efficiency of each request task of each client.
In order to achieve the above object, an embodiment of the present application provides a resource scheduling method, where the method is applied to a control module in a distributed system, where the distributed system further includes a plurality of service nodes, and the method includes:
receiving request information of a first task to be executed from a client, wherein the first task to be executed comprises at least one subtask, and the request information comprises resource occupation information when each subtask is executed;
acquiring real-time available resources of the plurality of service nodes within a preset time length;
determining a target node set based on real-time available resources of each service node within a preset time length and resource information required to be consumed by each subtask of a first task to be executed, wherein the target node set comprises a plurality of target service nodes of the subtasks to be executed within the preset time length, and each target service node corresponds to at least one subtask;
and respectively sending a first message to each target service node, wherein the first message is used for indicating the target service node receiving the first message to execute at least one subtask within the preset time length.
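The four steps above can be sketched as a tiny scheduling loop. This is an illustrative assumption of how a control module might implement them, not code from the patent; all names (Subtask, ServiceNode, schedule_first_task) are invented, and only GPU resources are modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    task_id: str
    gpu_gb: float  # GPU memory one run of this subtask consumes (the request info)

@dataclass
class ServiceNode:
    name: str
    free_gpu_gb: float  # real-time available GPU within the preset time length
    assigned: list = field(default_factory=list)

def schedule_first_task(subtasks, nodes):
    """Steps S100-S400 in miniature: match each subtask to a service node with
    enough spare resources and record what each 'first message' would carry."""
    messages = {}  # target node name -> subtask ids it is instructed to run
    for st in subtasks:
        for node in nodes:
            if node.free_gpu_gb >= st.gpu_gb:  # node can host this subtask
                node.free_gpu_gb -= st.gpu_gb
                node.assigned.append(st.task_id)
                messages.setdefault(node.name, []).append(st.task_id)
                break
    return messages  # one first message per target service node
```

For example, with a node holding 2G spare and another holding 4G spare, three 2G subtasks land one on the first node and two on the second, which is exactly the load-aware packing the claims describe.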
An embodiment of the present application further provides a resource scheduling system, including:
a transceiving unit, configured to receive request information of a first task to be executed from a client, where the first task to be executed includes at least one subtask and the request information includes the resource information each subtask needs to consume, and further configured to acquire real-time available resources of the plurality of service nodes within a preset time length;
a scheduling unit configured to:
determining a target node set based on real-time available resources of each service node within a preset time length and resource information required to be consumed by each subtask of a first task to be executed, wherein the target node set comprises a plurality of target service nodes of the subtasks to be executed within the preset time length, and each target service node corresponds to at least one subtask;
and respectively sending a first message to each target service node, wherein the first message is used for indicating the target service node receiving the first message to execute at least one subtask within the preset time length.
An embodiment of the present application also provides a medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the resource scheduling method.
An embodiment of the present application further provides a computing device, where the computing device includes a processor, and the processor is configured to implement the resource scheduling method when executing a computer program stored in a memory.
Compared with the prior art, in the technical solution provided by the embodiments of the application, service nodes whose current spare resources can satisfy each subtask are selected for use from the preset service nodes according to the hardware resource information each subtask of the task to be executed requires. The hardware resources of all service nodes in the system can therefore be allocated reasonably, avoiding resource waste.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is an application scenario diagram of a resource scheduling method in a K8s distributed system;
fig. 2 is an application scenario diagram of a resource scheduling method according to an embodiment of the present application;
fig. 3 is a signaling interaction diagram of a resource scheduling method according to an embodiment of the present application;
fig. 4 is an interaction diagram of each end of a resource scheduling method according to an embodiment of the present application;
fig. 5 is a timing diagram of a resource scheduling method according to another embodiment of the present application;
FIG. 6 is a schematic diagram of an architecture of a distributed system in which a resource scheduling system is deployed according to an embodiment of the present application;
fig. 7 is a flowchart of a resource scheduling method according to an embodiment of the present application;
fig. 8 is a block diagram of a resource scheduling system according to an embodiment of the present application;
FIG. 9 is a block diagram of a computer-readable storage medium provided by an embodiment of the present application;
fig. 10 is a block diagram of a computing device according to an embodiment of the present application.
The implementation, functional features, and advantages of the various embodiments of the present application will be further described with reference to the accompanying drawings.
Detailed Description
The principles and spirit of embodiments of the present application will be described with reference to a number of exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and to implement the embodiments of the present application, and are not intended to limit the scope of the embodiments of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one of skill in the art, implementations of the embodiments of the present application may be embodied as an apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
The solutions provided by the embodiments of the present application relate to technologies such as Artificial Intelligence (AI), Natural Language Processing (NLP), and Machine Learning (ML), as described in the following embodiments:
Artificial intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, and acquire and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that reacts in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, giving machines the capabilities of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a broad range of fields, involving both hardware-level and software-level technology. Artificial intelligence infrastructure generally includes sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning.
With continued research and progress, artificial intelligence technology has been developed and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service.
The resource scheduling method of the embodiments of the application suits scenarios where multiple tenants share one system to complete different tasks, or one tenant completes multiple tasks on the same system. The method applies to a distributed system comprising a control module and service nodes; the service nodes may periodically report their own resource status to the control module, or the control module may periodically obtain the resource status of each service node. A user sends a task to be executed to the distributed system through a client. The task to be executed may be an evaluation request for some artificial intelligence model uploaded by the user, or a trial run of a model such as a face recognition, image recognition, or speech recognition model. The user can upload the model parameters of the deep learning model to the system that executes the evaluation task, where the parameters include the hardware resources the uploaded algorithm model needs to consume for each run of a subtask, such as the required Graphics Processing Unit (GPU) memory, system memory, CPU, and so on. In some embodiments, the resource scheduling method of the embodiments of the present application may be implemented based on a distributed system, for example the one shown in fig. 2, which includes at least one client, at least one control module, and a plurality of service nodes.
Each service node is deployed with at least one subtask and provides the hardware resources the subtask needs to execute, such as a graphics card, memory, and CPU. As shown in fig. 2, a user may send a task to be executed through a client, where the task comprises multiple subtasks; for example, an evaluation task for a face recognition model needs recognition tests on many face photos, and the recognition test for each photo is one subtask. The service nodes can be preset, and there can be many of them; all of them process the tasks to be executed of every user, they can provide hardware resources of different types and sizes, and more capacity can be deployed for the resources used most frequently. For example, some service nodes may provide only memory and CPU resources while others also provide graphics card (GPU) resources, and nodes can be deployed with different resource priorities: among the CPU-and-memory-only nodes, some may provide larger memory with a conventional CPU, some a larger CPU with conventional memory, and some larger memory and CPU at the same time. Likewise, nodes that provide memory, CPU, and a graphics card simultaneously can be deployed by priority, and the size of each node's hardware resources can be set as required; for example, some nodes may provide a 6G graphics card (GPU) and others a 12G one, some a 6-core CPU and others an 8-core CPU, and so on.
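The heterogeneous deployment above can be pictured as a small node table. The profiles below are invented for the sketch (not values from the patent); the helper simply filters the pool by one resource type.

```python
# Assumed resource profiles for heterogeneous service nodes; names and numbers
# are illustrative only.
nodes = [
    {"name": "mem-heavy", "cpu_cores": 4, "mem_gb": 64, "gpu_gb": 0},
    {"name": "cpu-heavy", "cpu_cores": 8, "mem_gb": 16, "gpu_gb": 0},
    {"name": "gpu-small", "cpu_cores": 6, "mem_gb": 32, "gpu_gb": 6},
    {"name": "gpu-large", "cpu_cores": 6, "mem_gb": 32, "gpu_gb": 12},
]

def nodes_providing(resource, minimum, pool):
    """Names of the nodes that can offer at least `minimum` of one resource type."""
    return [n["name"] for n in pool if n[resource] >= minimum]
```

For instance, `nodes_providing("gpu_gb", 12, nodes)` keeps only the 12G graphics card node, which is how a model with heavy GPU demands would be steered to the nodes whose resource emphasis matches it.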
Therefore, service nodes with different resource emphases can accurately serve different artificial intelligence models. In addition, each service node may be deployed on a physical host, a virtual host, or a container.
The control module receives the tasks to be executed sent by the client and schedules each service node to execute each subtask of those tasks according to the node's real-time available resources. It performs comprehensive scheduling based on each node's currently available resources, whether for different tasks from different clients or for multiple tasks from the same client.
The client lets a user send a task to be executed to the control module; for example, the user can send the algorithm code of an artificial intelligence model, together with the resource information needed to run it, to the control module of the distributed system through a computer terminal.
The resource scheduling method of the embodiments of the present application is described below using 1 control module and 3 service nodes (for example, service nodes 1, 2, and 3 in fig. 2) as an example, where each service node is deployed with an artificial intelligence model evaluation task. The following examples illustrate the details.
Fig. 2 shows an application of an embodiment of the resource scheduling method: a user uploads tasks to be executed from a client, and after the control module receives each task, it schedules each service node to execute them according to the resource scheduling method of the embodiments of the application. As shown in fig. 2, the client sends each task to be executed, including the artificial intelligence model the task runs and the data for training or testing that model, to the control module through a network. The control module receives the tasks sent by the client and, in response, schedules the service nodes able to execute at least one subtask to execute at least one subtask of the tasks, according to each node's current spare resources. Each service node receives the scheduling information sent by the control module through the network and, in response to the subtasks allocated to it, runs the artificial intelligence model with the resources it provides. Compared with the K8s resource scheduling shown in fig. 1, the resource scheduling method of the embodiments of the present application can make full use of the resources in each service node; the scheduling in fig. 1 clearly wastes substantial resources on nodes that are executing a task but still have spare resources.
In addition, the resource scheduling method of the embodiments of the application can serve a single client or several different clients (i.e., as an open system). Whether it serves many users or only one, it can obtain the spare resources of each service node in the system, so that, facing a task to be executed, it can reasonably allocate each subtask of that task to the service nodes according to the resource overhead the task requires.
According to an embodiment of the application, a resource scheduling method, a resource scheduling system, a resource scheduling medium and a computing device are provided.
Exemplary method
Next, with reference to fig. 3, the resource scheduling method is described in detail. This exemplary embodiment provides a resource scheduling method comprising the following steps:
step S100: request information of a first task to be executed is received from a client.
The first task to be executed comprises at least one subtask, and the request information comprises the resource information each subtask needs to consume.
In step S100, request information of a first task to be executed is received from a client. The request information may be sent by a user through the client; the same user may send request information for multiple tasks to be executed, and after receiving the request information for each task from each user, the system may sort them by arrival time. The request information the user sends to the server or system includes the artificial intelligence model to be run and the resources each subtask running that model needs to consume. For example, a face recognition model that must recognize 1000 face photos has 1000 subtasks, and the request states how much CPU, GPU, memory, and so on each subtask consumes.
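A minimal shape for such request information might look as follows. This is an assumed sketch: the field names (`model`, `num_subtasks`, `per_subtask_cost`) are hypothetical, and the numbers echo the 1000-photo example above.

```python
# Assumed step-S100 request information: the model to run plus the per-subtask
# resource cost (all field names are hypothetical).
request_info = {
    "model": "face_recognition",
    "num_subtasks": 1000,  # e.g. 1000 face photos, one subtask each
    "per_subtask_cost": {"gpu_gb": 2.0, "cpu_cores": 1, "mem_gb": 4.0},
}

def total_cost(req):
    """Aggregate demand if every subtask of the task runs exactly once."""
    return {k: v * req["num_subtasks"] for k, v in req["per_subtask_cost"].items()}
```

A scheduler could use `total_cost(request_info)` to gauge the overall overhead of the task before deciding how to spread its subtasks across nodes.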
After receiving the request messages of each user, the system can sort them according to their request times.
Step S200: acquiring real-time available resources of the plurality of service nodes within a preset time length.
After the current first task to be executed is determined, step S200 determines the real-time available resources of all service nodes within a preset time length. For example, if a service node provides a 4G GPU and only 1G is occupied at the current time, the node can currently still provide a 3G GPU; that is, its currently available GPU resource is 3G.
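The 4G/1G/3G example above is just capacity minus occupancy; a one-line sketch (function name is illustrative):

```python
# Real-time available resources = total capacity - currently occupied amount.
def available_gpu(total_gb, occupied_gb):
    return total_gb - occupied_gb

assert available_gpu(4, 1) == 3  # a 4G GPU with 1G in use leaves 3G available
```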
The resources of a service node may change from moment to moment; for example, multiple tasks on the node progress differently, with some just started and some about to finish. As shown in fig. 4, the service nodes may feed their available resources back to the control module in real time through the network, so that the control module schedules in real time according to those resources. Such scheduling, however, inevitably depends on resource-status updates: if the status is not updated in time, scheduling may lag, and the currently available resources computed from the status at each moment may not be accurate enough. In other embodiments, therefore, the available resources in a future period may be predicted over a preset time length; a task scheduling plan within that preset time length is then drawn up for each service node, and within that time each node executes tasks according to the plan. Compared with recomputing available resources at every moment to schedule the nodes, this is more accurate and efficient.
Step S300: the method comprises the steps of determining a target node set based on real-time available resources of each service node in a preset time length and resource information required to be consumed by each subtask of a first task to be executed, wherein the target node set comprises a plurality of target service nodes of the subtasks to be executed in the preset time length, and each target service node corresponds to at least one subtask.
After steps S100 and S200, the resources each evaluation subtask of the user's uploaded artificial intelligence model needs to consume, and the resources each service node can currently provide, are both known. This step therefore determines which service nodes can satisfy the subtasks of the uploaded model, i.e., the target service nodes that can execute the subtasks of the first task to be executed; these target service nodes constitute the target node set.
In this embodiment, service nodes whose currently available resources can satisfy the demand are selected for use from the preset service nodes according to the resource information the user's artificial intelligence model requires, so the hardware resources of each service node in the system can be allocated reasonably and resource waste is avoided.
For example, a user needs to evaluate a face recognition model and uploads the algorithm model and code through a client, together with the hardware resources each run occupies. Suppose the model mainly occupies graphics processing unit (GPU) resources and each run occupies 2G, i.e., each subtask needs 2G of GPU resources; after receiving the task, the server must then select service nodes that can provide 2G of GPU resources as target service nodes. If the system has 10 service nodes and monitoring shows that all 10 can currently provide 2G or more of graphics card resources, all 10 can be scheduled to each execute at least one subtask of the face recognition model, with the exact number per node determined by its available resources: a node with only 2G of spare graphics card resources can execute only 1 subtask at a time. Note that different artificial intelligence models mainly consume different resources; the face recognition model of this example mainly consumes graphics card resources and demands little memory and CPU, so only graphics card resources are discussed, on the premise that memory and CPU also meet the running requirements of each subtask.
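The capacity rule in this example reduces to integer division: a node's concurrent subtask count is its spare GPU divided by the per-subtask cost. The 2 GB figure is the example's assumption.

```python
SUBTASK_GPU_GB = 2  # each face-recognition subtask is assumed to need 2 GB of GPU

def concurrent_capacity(free_gpu_gb, per_subtask_gb=SUBTASK_GPU_GB):
    """How many subtasks a node can run at once given its spare GPU."""
    return int(free_gpu_gb // per_subtask_gb)

assert concurrent_capacity(2) == 1  # only 2 GB spare: one subtask at a time
assert concurrent_capacity(5) == 2  # 5 GB spare still fits only two subtasks
```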
First messages may then be sent to the target service nodes to instruct them to execute the respective subtasks of the first task to be executed. That is:
Step S400: a first message is sent to each target service node, instructing the target service node that receives it to execute at least one subtask.
In this embodiment, as shown in fig. 5, the control module can schedule, based on the real-time available resources of service nodes 1-n, any service node able to satisfy at least one subtask to execute a corresponding subtask uploaded by the client. However, a service node whose spare resources are insufficient to execute even one subtask represents a certain waste of resources, so in yet another embodiment such a service node may also be scheduled.
For example, in another implementation of this embodiment, after the target node set is determined, the resource scheduling method further includes: if there remains a subtask whose target service node is undetermined, there exist multiple unsaturated service nodes that were not determined as target service nodes, and the sum of the available resources of these unsaturated service nodes is sufficient to execute at least one such subtask, then sending a second message to the multiple unsaturated service nodes, where the second message is used to instruct the multiple unsaturated service nodes receiving it to cooperatively execute at least one subtask whose target service node is undetermined.
Assume the system in this embodiment is preset with 10 service nodes A, B, C, D, E, F, G, H, I and J, whose memory and CPU all satisfy the requirements for running the face recognition model. Table 1 below lists the graphics card resources each service node provides and each node's current spare graphics card resources. Each run of the face recognition model needs 2G of graphics card resources, that is, each subtask needs 2G. After step S300, the target node set can be determined to comprise service nodes E, F, G, H, I and J, so in step S400 a first message is sent to service nodes E, F, G, H, I and J to instruct them to execute the subtasks of face recognition. Given each node's current spare graphics card resources, service nodes E and F can each run one subtask at a time, and service nodes G, H, I and J can each run 2. Table 2 shows the real-time spare graphics card resources after target service nodes E, F, G, H, I and J have been scheduled to execute subtasks of the face recognition model. It can be seen that service nodes A, B, C and D are 4 unsaturated service nodes not determined as target service nodes: each has spare graphics card resources, but not enough to execute one subtask of the face recognition model alone. If subtasks remain with no determined target service node, it can then be determined whether the sum of the available resources of unsaturated service nodes A, B, C and D satisfies one subtask. One subtask of the face recognition model needs only 2G of graphics card resources, and the unsaturated nodes A, B, C and D can together provide 4G, clearly enough for 2 subtasks; a second message can therefore be sent to service nodes A, B, C and D to instruct them to cooperatively execute subtasks of the face recognition model. It is easy to see that in this embodiment, the spare resources of unsaturated service nodes, whose individual available resources are insufficient to execute one subtask, can also be fully utilized.
TABLE 1
Service node                A    B    C    D    E    F    G    H    I    J
Total amount of resources   4G   4G   4G   4G   4G   6G   6G   6G   6G   6G
Real-time resource vacancy  1G   1G   1G   1G   2G   2G   4G   4G   4G   4G
TABLE 2
Service node                A    B    C    D    E    F    G    H    I    J
Total amount of resources   4G   4G   4G   4G   4G   6G   6G   6G   6G   6G
Real-time resource vacancy  1G   1G   1G   1G   0    0    0    0    0    0
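The unsaturated-node check described above can be sketched as follows (hypothetical names; the figures mirror Table 2, where nodes A-D each have 1G free and one subtask needs 2G):

```python
def plan_unsaturated(free_by_node, need, remaining_subtasks):
    """Identify unsaturated nodes (some spare resource, but less than one
    subtask's worth) and how many extra subtasks their pooled spare
    resources can cover cooperatively."""
    unsaturated = {n: f for n, f in free_by_node.items() if 0 < f < need}
    pooled = sum(unsaturated.values())
    coop_subtasks = min(pooled // need, remaining_subtasks)
    return sorted(unsaturated), coop_subtasks

# Spare graphics card resources after Table 2: A-D have 1G each, E-J have 0.
free = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0}
print(plan_unsaturated(free, 2, 5))  # → (['A', 'B', 'C', 'D'], 2)
```

The pooled 4G covers 2 more subtasks, matching the second-message example in the text.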
Considering that there may be service nodes whose spare resources are insufficient to complete one subtask alone, when the spare resources of several such nodes together suffice for one subtask, those nodes may be combined to execute at least one subtask. When multiple such combinations exist, however, it is also necessary to consider how to choose the optimal one. In another implementation of this embodiment, before the multiple unsaturated service nodes are scheduled to cooperatively execute at least one subtask of an undetermined target service node (i.e., before the second message is sent to the multiple unsaturated service nodes), the method further includes:
determining a plurality of candidate combining strategies of the plurality of unsaturated service nodes, wherein each candidate combining strategy can execute the subtask of at least one undetermined target service node;
determining a target combination strategy from the candidate combination strategies according to a first preset rule, and combining the unsaturated service nodes according to the determined target combination strategy;
and sending a third message to the combined multiple unsaturated service nodes, where the third message is used to instruct the unsaturated service node group receiving it to cooperatively execute at least one subtask of an undetermined target service node. Specifically:
If the available resources of unsaturated service nodes A, B, C and D exceed the resources required by one subtask of the face recognition model, these nodes may cooperatively execute at least one subtask. It can then further be determined whether the 4 unsaturated service nodes admit multiple combinations, each able to execute one subtask. Clearly, any pairwise combination of A, B, C and D can satisfy one subtask of the face recognition model, that is, the following candidate combination strategies exist: A+B and C+D; A+C and B+D; A+D and B+C. A target combination strategy can then be determined from the three candidates according to a first preset rule, and the unsaturated service nodes combined accordingly. The first preset rule may be based on the current load of each service node group, the network communication quality, the physical distance between service node and client, and so on. Taking network communication quality as an example, the current communication quality of each of the three combinations can be computed, and the four service nodes A, B, C and D scheduled according to the combination strategy with the highest communication quality, forming two service node groups that each execute one subtask. Assuming the A+B and C+D combination currently has the highest communication quality, a third message can be sent to the A+B combination and the C+D combination respectively to instruct each to execute one subtask. It should be understood that in this embodiment, when unsaturated service nodes admit multiple combinations, an optimal combination can be selected according to the preset first rule, increasing the speed of task processing.
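Under the assumption that exactly four unsaturated nodes are being paired, the three candidate combination strategies and the first preset rule (taking link quality as the illustrative metric; all names and scores are hypothetical) might be sketched as:

```python
def best_pairing(nodes, link_quality):
    """Enumerate the three ways to split 4 nodes into two pairs and return
    the pairing whose summed link quality is highest (first preset rule)."""
    a, b, c, d = nodes
    candidates = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return max(candidates,
               key=lambda p: sum(link_quality[frozenset(pair)] for pair in p))

# Hypothetical pairwise communication-quality scores.
link_quality = {
    frozenset({"A", "B"}): 9, frozenset({"C", "D"}): 9,
    frozenset({"A", "C"}): 5, frozenset({"B", "D"}): 5,
    frozenset({"A", "D"}): 4, frozenset({"B", "C"}): 4,
}
print(best_pairing(["A", "B", "C", "D"], link_quality))
# → (('A', 'B'), ('C', 'D'))
```

With these scores the A+B / C+D strategy wins, as in the worked example above.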
A subtask may thus be completed cooperatively by multiple unsaturated service nodes when no single one suffices. But if the sum of the available resources of the multiple unsaturated service nodes does not satisfy executing even one subtask — that is, the first task to be executed still has subtasks with no determined target service node, yet all the unsaturated nodes' available resources added together cannot execute any subtask of the first task to be executed — keeping those nodes' spare resources idle would clearly waste resources to some extent. Therefore, in another embodiment of the present application, the problem of resource waste can be solved by executing another task to be executed. Specifically, if the sum of the available resources of the multiple unsaturated service nodes does not satisfy executing at least one subtask of an undetermined target service node, the embodiment of the present application further includes:
acquiring a second task to be executed from a task sequence to be executed, wherein the resource consumption of at least one subtask of the second task to be executed is matched with the sum of available resources of the plurality of unsaturated service nodes, and sending a fourth message to the plurality of unsaturated service nodes, wherein the fourth message is used for indicating the plurality of unsaturated service nodes receiving the fourth message to cooperatively execute the at least one subtask of the second task to be executed; or
or acquiring a second task to be executed in which the resource consumption of at least one subtask matches the available resources of at least one of the multiple unsaturated service nodes, and sending a fifth message to the at least one unsaturated service node, where the fifth message is used to instruct the at least one unsaturated service node receiving it to execute the at least one subtask of the second task to be executed.
For example, in this embodiment the current real-time resource vacancy of each service node is shown in Table 3. Taking the face recognition model as an example, the sum of the current graphics card resources of unsaturated service nodes A, B, C and D is 1G, so even combined the 4 unsaturated nodes cannot execute one subtask. Another task to be executed, with resource consumption of at least one subtask less than or equal to 1G — the second task to be executed — can then be obtained from the task sequence to be executed. It should be noted that the second task to be executed described here is not necessarily second in the time sequence: if every subtask of the chronologically second task consumes more than 1G, further tasks are acquired from the queue until at least one task with a subtask consuming less than 1G is obtained. A fourth message can then be sent to service nodes A, B, C and D to instruct them to cooperatively execute a subtask of the second task to be executed; or, if one of A, B, C and D has more spare resources than a subtask of the second task requires, a fifth message can be sent to that node to instruct it to execute the subtask alone. It is easy to see that with the resource scheduling method of this embodiment, when the sum of the unsaturated service nodes' available resources cannot execute one subtask of the first task to be executed, the unsaturated nodes can still be scheduled according to the resource consumption of other tasks' subtasks, so that each service node's available resources are scheduled at a finer granularity.
TABLE 3
Service node                A      B      C      D      E   F   G   H   I   J
Total amount of resources   4G     4G     4G     4G     4G  6G  6G  6G  6G  6G
Real-time resource vacancy  0.25G  0.25G  0.25G  0.25G  0   0   0   0   0   0
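The fallback to a second task can be sketched as a scan over the pending queue (hypothetical task structure; the pooled 1G figure follows Table 3):

```python
def pick_second_task(pending, pooled_free):
    """Walk the pending-task queue in order and return the first task that
    has at least one subtask whose resource consumption fits within the
    pooled leftover resources of the unsaturated nodes."""
    for task in pending:
        if task["per_subtask"] and min(task["per_subtask"]) <= pooled_free:
            return task["name"]
    return None

# The chronologically next task needs 2G per subtask and is skipped; the one
# after it has a 0.5G subtask that fits the pooled 1G of nodes A-D.
queue = [{"name": "taskB", "per_subtask": [2, 2]},
         {"name": "taskC", "per_subtask": [0.5, 1.5]}]
print(pick_second_task(queue, 1))  # → taskC
```

This mirrors the text's note that the "second task" is whichever queued task first fits, not necessarily the second in time.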
The resource scheduling method in the above embodiments can fully schedule the resources of each service node, maximizing the resource utilization of the whole system.
As shown in fig. 6, in another implementation of this embodiment, all service nodes are pre-divided into multiple service node groups that can provide hardware resources of different types and sizes. To manage the service nodes conveniently, the grouping can be performed according to the type of hardware resource each service node provides. For example, the service nodes may be divided into at least the following 3 types of service node groups: GPU resource groups, CPU resource groups and memory resource groups. The number of service node groups of each type, and the number of service nodes in each group, are not limited.
GPU resource groups are divided according to GPU resources: the graphics card (GPU) resources provided by each service node in a GPU resource group carry the highest weight; for example, each such node may provide 1G of memory, a 2-core CPU and an 8G graphics card. CPU resource groups are divided according to CPU resources, where the CPU resources provided by each node carry the highest weight; for example, each such node may provide 1G of memory, a 1G graphics card and an 8-core CPU. Memory resource groups are divided according to memory resources, where the memory provided by each node carries the highest weight; for example, each such node may provide a 1G graphics card, a 2-core CPU and 8G of memory. In addition, there can be multiple service node groups of each type, and the size of each group can be set as needed. For example, at initial deployment 10 GPU resource groups, 5 memory resource groups and 8 CPU resource groups may be divided, and the 10 graphics card groups may have sizes of 12G, 8G, 6G and so on; the memory and CPU resource groups may likewise differ in size. Each type of service node group can be completed in the registry at initial deployment, for example by giving the same label to service nodes that emphasize providing the same resource.
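Assigning a node to the group of its highest-weighted resource, as described above, could be sketched as follows (the weights and names are illustrative assumptions, not from the source):

```python
def dominant_group(node_profile):
    """Assign a node to the resource group whose resource type it provides
    with the highest weight."""
    return max(node_profile, key=node_profile.get)

# Hypothetical weight profiles for one node of each type.
print(dominant_group({"gpu": 8, "cpu": 2, "mem": 1}))  # → gpu
print(dominant_group({"gpu": 1, "cpu": 8, "mem": 1}))  # → cpu
print(dominant_group({"gpu": 1, "cpu": 2, "mem": 8}))  # → mem
```

In practice this label would be written into the registry at initial deployment, as the text notes.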
Compared with the above embodiments, setting multiple service node groups in this embodiment allows multiple groups to be scheduled in parallel for multiple tasks to be executed at the same time. On the one hand, tasks can be processed simultaneously, and since each group contains multiple service nodes, the processing efficiency of each task to be executed is guaranteed to a certain extent; on the other hand, different groups emphasize providing different resource types, so artificial intelligence models that chiefly consume different resources can be isolated from one another, improving the running stability of each model.
In this embodiment, after the service node groups are pre-divided and the request information of the first task to be executed is received, the method further includes:
acquiring real-time available resources of all service node groups within a preset time length;
determining each target service node group capable of executing the first task to be executed based on real-time available resources of each service node group in a preset time length and resource information required to be consumed by each subtask of the first task to be executed;
determining a target service node group from each target service node group capable of executing the first task to be executed according to a preset second rule;
and sending a sixth message to the determined target service node group, wherein the sixth message is used for indicating the target service node group receiving the sixth message to execute the first task to be executed.
Specifically, assume this embodiment is preset with 20 service node groups — 10 graphics card resource groups, 5 memory resource groups and 5 CPU resource groups — each containing 10 service nodes. When a certain client has multiple tasks to be executed in the same time period, or multiple clients' tasks to be executed exist in the same time period, multiple service node groups can be scheduled in parallel from among the 20 for those tasks, according to the types and sizes of the hardware resources each task requires.
After acquiring request information of a first task to be executed, firstly determining real-time available resources of all service node groups within a preset time length; then, determining a target service node group capable of executing each subtask of the first task to be executed based on the current available resource of each service node group and the resource information required to be consumed by each subtask of the first task to be executed; and finally, scheduling a service node group from each target service node group capable of executing the first task to be executed according to a preset second rule to execute the first task to be executed.
Taking the above 20 service node groups as an example, assume the task to be executed is an evaluation task for a face recognition model occupying 1G of graphics card resources; the hardware resource it mainly consumes is the graphics card, with size 1G. Service node groups whose current resource vacancy is greater than or equal to 1G — the target service node groups — can then be screened from the 10 graphics card resource groups. Since multiple groups may have a vacancy of 1G or more, one is selected according to a preset second rule: for example, the group with the lowest load, or the one with the highest communication quality, or any group whose load is below a preset threshold.
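A minimal sketch of this two-step group selection (filter by vacancy, then apply the second rule — here, lowest load; all group names and figures are hypothetical):

```python
def choose_target_group(groups, need):
    """Keep groups whose resource vacancy covers the task's main resource
    need, then apply the second preset rule: pick the lowest-load group."""
    candidates = [g for g in groups if g["free"] >= need]
    if not candidates:
        return None
    return min(candidates, key=lambda g: g["load"])["name"]

# Three hypothetical graphics card resource groups; the task needs 1G.
groups = [{"name": "gpu-1", "free": 2.0, "load": 0.7},
          {"name": "gpu-2", "free": 1.0, "load": 0.3},
          {"name": "gpu-3", "free": 0.5, "load": 0.1}]
print(choose_target_group(groups, 1))  # → gpu-2
```

gpu-3 is excluded despite its low load because its vacancy cannot host the task.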
Considering that each service node group includes a plurality of service nodes, it is necessary to determine how to select a service node after the service node group is selected. In this embodiment, after selecting one service node group for executing the first task to be executed, it is further required to determine available resources of each service node in the selected service node group, and then determine whether the available resources of each service node are sufficient to satisfy each subtask based on the available resources of each service node and resource information that needs to be consumed by each subtask of the first task to be executed. In this embodiment, each service node group includes a plurality of service nodes providing similar resources, and when scheduling a target service node group to execute the first task to be executed, the method includes:
if available resources of a single service node in the target service node group meet at least one subtask of the first task to be executed, sending a seventh message to the single service node, where the seventh message is used to instruct the single service node receiving the seventh message to execute the at least one subtask of the first task to be executed;
if the available resources of a single service node in the target service node group do not satisfy the requirement for executing at least one subtask of the first task to be executed, an eighth message is sent to the multiple service nodes, where the eighth message is used to indicate that the multiple service nodes receiving the eighth message cooperatively execute one subtask of the first task to be executed. For example, in this embodiment, if there is a single service node in the selected service node group that has available resources to satisfy the execution of at least one sub-task, a seventh message may be sent to the service node to instruct the service node to individually execute a sub-task of the first task to be executed.
For another example, in another embodiment, it may be that the currently available resources of the selected service node group are sufficient for executing a sub-task, but a single service node is not sufficient for executing a sub-task, and then an eighth message may be sent to multiple service nodes in the service node group to instruct the multiple service nodes to cooperatively execute a sub-task.
For another example, in another embodiment, after the target service node group is determined, if the currently available resources of the plurality of service nodes in the service node group all satisfy the requirement of executing one sub-task independently, at this time, a seventh message may be sent to the plurality of service nodes to schedule the plurality of service nodes in parallel to execute one sub-task respectively.
For another example, in another embodiment, after the target service node group is determined, the available resources of some individual service nodes may satisfy one subtask each, while the summed available resources of other nodes satisfy one more subtask; the nodes whose individual resources suffice can then be scheduled to execute at least one subtask each, and the remaining nodes combined and scheduled to cooperatively execute one subtask.
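The within-group dispatch just described — seventh messages to nodes that can run a subtask alone, an eighth message to a cooperative set — can be sketched as (hypothetical names):

```python
def dispatch_in_group(free_by_node, need):
    """Split a group's nodes into those that can run one subtask alone
    (seventh-message targets) and a cooperative set whose pooled spare
    resources cover one more subtask (eighth-message targets)."""
    solo = sorted(n for n, f in free_by_node.items() if f >= need)
    rest = {n: f for n, f in free_by_node.items() if 0 < f < need}
    coop = sorted(rest) if sum(rest.values()) >= need else []
    return solo, coop

# n1/n2 each run a subtask alone; n3+n4 pool 2G to run one cooperatively.
print(dispatch_in_group({"n1": 2, "n2": 2, "n3": 1, "n4": 1}, 2))
# → (['n1', 'n2'], ['n3', 'n4'])
```

If the leftover nodes cannot pool enough for one subtask, the cooperative set is empty and those resources fall through to the second-task fallback described earlier.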
In another implementation of this embodiment, the method further comprises: monitoring the state information of all service nodes, and updating the state information of the service node group each node belongs to accordingly. For example, suppose the selected target service node group has 10 service nodes in total, each providing 2G of graphics card resources with a vacancy of 2G. If this group executes the subtasks of the face recognition model and each node executes exactly one subtask, the group's resource vacancy becomes 0. Because the face recognition model has many subtasks whose completion times cannot be consistent, in the last round of processing some nodes finish earlier than others; likewise, if the total number of subtasks is not a multiple of 10, in the last round some nodes still process subtasks while others have none to process. The group's available resources therefore need to be updated in time according to each node's real-time resource vacancy, so that they can conveniently be used by other tasks to be executed.
Fig. 7 is a flowchart of a resource scheduling method according to an embodiment of the present application. As shown in fig. 7, availability monitoring is performed on all service node groups. If a service node group is currently unavailable (i.e., has no resources currently available), monitoring continues; if a group is currently available (a group is available if any of its service nodes has spare resources), the status information of each of its service nodes is acquired, the currently available nodes and the sizes of the hardware resources they provide are determined, and the group's status information is updated. When responding to a task to be executed from any client, service nodes are scheduled from the status information of each group according to the methods of the above embodiments; after the nodes are selected, an image is built and a yaml file is generated to provide services for each subtask of the task, and after the task is completed, the image built on each service node is deleted and each node is released for the next task to be executed. It should be noted that although fig. 7 indicates a CPU resource group, a GPU resource group and a memory resource group, this only means that each group emphasizes providing that type of hardware resource. For a memory resource group, for example, memory resources are emphasized while the graphics card and CPU are configured at conventional sizes; it does not mean the group cannot provide CPU or graphics card resources — every resource group is configured with all the hardware resources required to run an artificial intelligence model.
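The per-subtask lifecycle in fig. 7 — build an image, generate the yaml manifest, run, then delete the image and release the node — might be sketched as follows. All function names are hypothetical stand-ins injected as callables; the source does not specify the actual build or deployment tooling:

```python
def run_subtask_lifecycle(node, subtask, build_image, write_manifest,
                          launch, delete_image):
    """Reserve the node's resources, build an image, generate a deployment
    manifest (the 'yaml file'), run the subtask, then delete the image and
    release the resources for the next task, even if the launch fails."""
    node["free"] -= subtask["need"]        # reserve resources for the run
    image = build_image(subtask)
    manifest = write_manifest(node, image)
    try:
        return launch(manifest)
    finally:
        delete_image(image)                # remove the per-task image
        node["free"] += subtask["need"]    # release for the next task

# Stubbed-out demo: the node's vacancy is restored after the subtask ends.
deleted = []
node = {"name": "E", "free": 2}
result = run_subtask_lifecycle(
    node, {"need": 2},
    build_image=lambda s: "face-eval:v1",
    write_manifest=lambda n, i: {"node": n["name"], "image": i},
    launch=lambda m: "ok",
    delete_image=deleted.append)
print(result, node["free"], deleted)  # → ok 2 ['face-eval:v1']
```

The try/finally mirrors the flowchart's requirement that images are deleted and nodes released after the task completes.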
The resource scheduling method provided by the embodiment of the application can fully utilize the resources provided by each service node, can schedule in parallel at both the service node group level and the level of individual service nodes within each group, and achieves a higher processing speed when facing user requests with a large number of subtasks.
Exemplary System
Any technical feature mentioned in the embodiments corresponding to any one of fig. 2 to fig. 7 is also applicable to the embodiment corresponding to fig. 8, and similar details below are not repeated. Having introduced the resource scheduling method of the exemplary embodiments of the present application, the resource scheduling system of the exemplary embodiments is now described in detail.
The resource scheduling system shown in fig. 8 can be used in a distributed system and can implement the steps of the resource scheduling method performed in the embodiments corresponding to any one of fig. 2 to fig. 7. In this embodiment, the resource scheduling system 100 can be deployed in the control module of each embodiment of the resource scheduling method described above, and the resource scheduling system 100 includes:
a transceiving unit 101, configured to receive request information of a first task to be executed from a client, where the first task to be executed includes at least one sub-task, and the request information includes resource information required to be consumed for executing each sub-task, and
acquiring real-time available resources of the plurality of service nodes within a preset time length;
a scheduling unit 102 configured to:
determining a target node set based on real-time available resources of each service node within a preset time length and resource information required to be consumed by each subtask of a first task to be executed, wherein the target node set comprises a plurality of target service nodes of the subtasks to be executed within the preset time length, and each target service node corresponds to at least one subtask;
and respectively sending a first message to each target service node, wherein the first message is used for indicating the target service node receiving the first message to execute at least one subtask within the preset time length.
In this embodiment, the scheduling unit 102 is further configured to: after the target node set is determined, if a subtask of an undetermined target service node exists, a plurality of unsaturated service nodes which are not determined as target service nodes exist, and the sum of available resources of the plurality of unsaturated service nodes meets the requirement of executing the subtask of at least one of the undetermined target service nodes, sending a second message to the plurality of unsaturated service nodes, wherein the second message is used for indicating the plurality of unsaturated service nodes which receive the second message to cooperatively execute the subtask of at least one of the undetermined target service nodes.
In this embodiment, the scheduling unit 102 is further configured to: prior to sending the second message to the plurality of undersaturated serving nodes,
determining a plurality of candidate combining strategies of the plurality of unsaturated service nodes, wherein each candidate combining strategy can execute the subtask of at least one undetermined target service node;
determining a target combination strategy from the candidate combination strategies according to a first preset rule, and combining the unsaturated service nodes according to the determined target combination strategy;
and sending a third message to the combined plurality of unsaturated service nodes, where the third message is used to instruct the unsaturated service node group receiving the third message to cooperatively execute a subtask of at least one undetermined target service node.
In this embodiment, the transceiving unit 101 is further configured to: when the sum of the available resources of the plurality of unsaturated service nodes does not meet the requirement of executing the subtask of the at least one undetermined target service node, acquiring a second task to be executed from a task sequence to be executed, wherein the resource consumption of at least one subtask of the second task to be executed is matched with the sum of the available resources of the plurality of unsaturated service nodes, or the resource consumption of at least one subtask of the second task to be executed is matched with the available resources of at least one unsaturated service node in the plurality of unsaturated service nodes;
the scheduling unit 102 is further configured to: when the resource consumption of at least one subtask of the second task to be executed matches with the sum of available resources of the plurality of unsaturated service nodes, sending a fourth message to the plurality of unsaturated service nodes, where the fourth message is used to instruct the plurality of unsaturated service nodes receiving the fourth message to cooperatively execute at least one subtask of the second task to be executed;
when the resource consumption of at least one subtask of the second task to be executed matches with available resources of at least one unsaturated service node in the plurality of unsaturated service nodes, sending a fifth message to the at least one unsaturated service node, where the fifth message is used to instruct the at least one unsaturated service node receiving the fifth message to execute the at least one subtask of the second task to be executed.
In this embodiment, the plurality of service nodes belong to a plurality of service node groups, and the transceiving unit 101 is further configured to obtain real-time available resources of all the service node groups within a preset time length;
the scheduling unit 102 is further configured to:
determining each target service node group capable of executing the first task to be executed based on real-time available resources of each service node group in a preset time length and resource information required to be consumed by each subtask of the first task to be executed;
determining a target service node group from each target service node group capable of executing the first task to be executed according to a preset second rule;
and sending a sixth message to the determined target service node group, wherein the sixth message is used for indicating the target service node group receiving the sixth message to execute the first task to be executed.
In this embodiment, each service node group includes a plurality of service nodes providing homogeneous resources, and the scheduling unit is further configured to: upon sending the sixth message to the determined target group of serving nodes,
when available resources of a single service node in the target service node group meet at least one subtask of the first task to be executed, sending a seventh message to the single service node, where the seventh message is used to instruct the single service node receiving the seventh message to execute the at least one subtask of the first task to be executed;
when the available resources of a single service node in the target service node group do not satisfy the execution of at least one subtask of the first task to be executed, an eighth message is sent to the multiple service nodes, where the eighth message is used to indicate that the multiple service nodes receiving the eighth message cooperatively execute one subtask of the first task to be executed.
For the specific implementation of each module, unit, and node, refer to the embodiments of the exemplary method; details are not repeated here. According to the hardware resource information required by each subtask of a task to be executed, the resource scheduling system provided by the embodiments of the application can select, from the preset service nodes, those nodes that have spare resources meeting the requirements of each subtask, so that the hardware resources of all service nodes are used reasonably and resource waste is avoided.
Exemplary Medium
Having described the methods and systems of the exemplary embodiments of the present application, a computer-readable storage medium of the exemplary embodiments of the present application is described with reference to fig. 9.
Referring to fig. 9, a computer-readable storage medium is shown as an optical disc 70, on which a computer program (i.e., a program product) is stored, and when the computer program is executed by a processor, the computer program implements the steps described in the above method embodiments, such as: receiving request information of a first task to be executed from a client, wherein the first task to be executed comprises at least one subtask, and the request information comprises resource occupation information when each subtask is executed; acquiring real-time available resources of the plurality of service nodes within a preset time length; determining a target node set based on real-time available resources of each service node within a preset time length and resource information required to be consumed by each subtask of a first task to be executed, wherein the target node set comprises a plurality of target service nodes of the subtasks to be executed within the preset time length, and each target service node corresponds to at least one subtask; and respectively sending a first message to each target service node, wherein the first message is used for indicating the target service node receiving the first message to execute at least one subtask within the preset time length. The specific implementation of each step is not repeated here.
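The core step above — determining a target node set by matching each subtask's resource demand against each node's real-time available resources within the preset time length — can be sketched as follows. The greedy first-fit policy is an illustrative assumption; the patent leaves the matching strategy open, and unassigned subtasks correspond to those later handled by the unsaturated-node combination strategy.

```python
def build_target_node_set(nodes, subtask_demands):
    """Greedy first-fit mapping of subtasks to service nodes.

    nodes: dict node_id -> real-time available resources within the
    preset time window (illustrative scalar model).
    Returns (assignments, unassigned): assignments maps node_id -> list
    of subtask indices; unassigned lists subtask indices with no node.
    """
    free = dict(nodes)  # work on a copy; track remaining capacity
    assignments = {}
    unassigned = []
    for i, demand in enumerate(subtask_demands):
        for node in free:
            if free[node] >= demand:
                free[node] -= demand
                assignments.setdefault(node, []).append(i)
                break
        else:
            # No single node fits; left for the combination strategy.
            unassigned.append(i)
    return assignments, unassigned
```

Each node in `assignments` would then receive a first message listing its subtask indices.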
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
Exemplary computing device
Having described the methods, systems, and media of the exemplary embodiments of the present application, a computing device 80 of the exemplary embodiments of the present application is described next with reference to FIG. 10.
FIG. 10 illustrates a block diagram of an exemplary computing device 80, which computing device 80 may be a computer system or server, suitable for use in implementing embodiments of the present application. The computing device 80 shown in fig. 10 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the application.
As shown in fig. 10, components of computing device 80 may include, but are not limited to: one or more processors or processing units 801, a system memory 802, and a bus 803 that couples various system components including the system memory 802 and the processing unit 801.
Computing device 80 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computing device 80 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 802 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 8021 and/or cache memory 8022. Computing device 80 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 8023 may be provided for reading from and writing to a non-removable, nonvolatile magnetic medium (not shown in FIG. 10, and typically referred to as a "hard disk drive"). Although not shown in FIG. 10, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 803 by one or more data media interfaces. The system memory 802 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the application.
Program/utility 8025, having a set (at least one) of program modules 8024, can be stored, for example, in system memory 802, and such program modules 8024 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment. Program modules 8024 generally perform the functions and/or methods of embodiments described herein.
Computing device 80 may also communicate with one or more external devices 804 (e.g., keyboard, pointing device, display, etc.). Such communication may be through an input/output (I/O) interface. Moreover, computing device 80 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via network adapter 806. As shown in FIG. 10, the network adapter 806 communicates with other modules of the computing device 80, such as the processing unit 801, over the bus 803. It should be appreciated that although not shown in FIG. 10, other hardware and/or software modules may be used in conjunction with computing device 80.
The processing unit 801 executes various functional applications and data processing by running programs stored in the system memory 802, for example: receiving request information of a first task to be executed from a client, the first task to be executed including at least one subtask, the request information including resource occupation information for executing each subtask; acquiring real-time available resources of the plurality of service nodes within a preset time length; determining a target node set based on the real-time available resources of each service node within the preset time length and the resource information required to be consumed by each subtask of the first task to be executed, wherein the target node set comprises a plurality of target service nodes for executing the subtasks within the preset time length, and each target service node corresponds to at least one subtask; and respectively sending a first message to each target service node, wherein the first message is used for indicating the target service node receiving the first message to execute at least one subtask within the preset time length.

It should be noted that although several units/modules or sub-units/sub-modules of the resource scheduling system are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the present application, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functionality of one unit/module described above may be further divided among and embodied by a plurality of units/modules.
Further, while the operations of the methods of the embodiments of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the operations shown must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the embodiments of the present application have been described with reference to several particular embodiments, it is to be understood that the embodiments of the present application are not limited to the disclosed embodiments. The division into aspects is for convenience of presentation only and does not imply that features in these aspects cannot be combined to advantage. The embodiments of the application are intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
The above description is only an alternative embodiment of the present application and is not intended to limit the scope of the present application. Any equivalent modification made using the contents of the specification and the drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of the present application.

Claims (8)

1. A resource scheduling method, the method being applied to a control module in a distributed system, the distributed system further comprising a plurality of service nodes, the method comprising:
receiving request information of a first task to be executed from a client, wherein the first task to be executed comprises at least one subtask, and the request information comprises resource occupation information when each subtask is executed;
acquiring real-time available resources of the plurality of service nodes within a preset time length;
determining a target node set based on real-time available resources of each service node within a preset time length and resource information required to be consumed by each subtask of a first task to be executed, wherein the target node set comprises a plurality of target service nodes of the subtasks to be executed within the preset time length, and each target service node corresponds to at least one subtask;
respectively sending first messages to each target service node, wherein the first messages are used for indicating the target service node receiving the first messages to execute at least one subtask within the preset time length;
after determining the set of target nodes, the method further comprises:
if a subtask of an undetermined target service node exists, a plurality of unsaturated service nodes which are not determined as the target service node exist, and the sum of available resources of the plurality of unsaturated service nodes meets the condition of executing the subtask of at least one undetermined target service node, determining a plurality of candidate combination strategies of the plurality of unsaturated service nodes, wherein each candidate combination strategy can execute the subtask of at least one undetermined target service node;
determining a target combination strategy from the candidate combination strategies according to a first preset rule, and combining the unsaturated service nodes according to the determined target combination strategy;
and sending a second message to the combined plurality of unsaturated service nodes, wherein the second message is used for indicating the unsaturated service node group receiving the second message to cooperatively execute the subtasks of at least one undetermined target service node.
2. The method of claim 1, wherein if the sum of available resources of the plurality of unsaturated service nodes does not satisfy the sub-task of executing at least one of the undetermined target service nodes, the method further comprises:
acquiring a second task to be executed from a task sequence to be executed, wherein the resource consumption of at least one subtask of the second task to be executed is matched with the sum of available resources of the plurality of unsaturated service nodes, and sending a fourth message to the plurality of unsaturated service nodes, wherein the fourth message is used for indicating the plurality of unsaturated service nodes receiving the fourth message to cooperatively execute the at least one subtask of the second task to be executed; or
the resource consumption of at least one subtask of the second task to be executed matches the available resources of at least one unsaturated service node among the plurality of unsaturated service nodes, and a fifth message is sent to the at least one unsaturated service node, wherein the fifth message is used for indicating the at least one unsaturated service node receiving the fifth message to execute the at least one subtask of the second task to be executed.
3. The resource scheduling method according to claim 1, wherein the plurality of service nodes belong to a plurality of service node groups, and after obtaining the request information of the first task to be executed, the method further comprises:
acquiring real-time available resources of all service node groups within a preset time length;
determining each target service node group capable of executing the first task to be executed based on real-time available resources of each service node group in a preset time length and resource information required to be consumed by each subtask of the first task to be executed;
determining a target service node group from each target service node group capable of executing the first task to be executed according to a preset second rule;
and sending a sixth message to the determined target service node group, wherein the sixth message is used for indicating the target service node group receiving the sixth message to execute the first task to be executed.
4. The resource scheduling method of claim 3, wherein each service node group comprises a plurality of service nodes providing homogeneous resources, and when sending the sixth message to the determined target service node group, the method comprises:
if available resources of a single service node in the target service node group meet at least one subtask of the first task to be executed, sending a seventh message to the single service node, where the seventh message is used to instruct the single service node receiving the seventh message to execute the at least one subtask of the first task to be executed;
if the available resources of a single service node in the target service node group do not satisfy the requirement for executing at least one subtask of the first task to be executed, an eighth message is sent to the multiple service nodes, where the eighth message is used to indicate that the multiple service nodes receiving the eighth message cooperatively execute one subtask of the first task to be executed.
5. The resource scheduling method according to any of claims 1-4, wherein the resources comprise at least one of: GPU, CPU and memory.
6. A resource scheduling system, comprising:
a receiving and sending unit, configured to receive request information of a first task to be executed from a client, where the first task to be executed includes at least one subtask, and the request information includes resource information required to be consumed for executing each subtask; and
further configured to acquire real-time available resources of a plurality of service nodes within a preset time length;
the scheduling unit is used for determining a target node set based on real-time available resources of each service node within a preset time length and resource information required to be consumed by each subtask of a first task to be executed, wherein the target node set comprises a plurality of target service nodes of the subtask to be executed within the preset time length, and each target service node corresponds to at least one subtask;
respectively sending first messages to each target service node, wherein the first messages are used for indicating the target service node receiving the first messages to execute at least one subtask within the preset time length;
after determining the target node set, if there are subtasks of undetermined target service nodes, there are multiple unsaturated service nodes that are not determined as target service nodes, and the sum of available resources of the multiple unsaturated service nodes satisfies the requirement of executing a subtask of at least one of the undetermined target service nodes, the scheduling unit is further configured to: sending a second message to the plurality of unsaturated service nodes, wherein the second message is used for indicating the plurality of unsaturated service nodes receiving the second message to cooperatively execute the subtasks of at least one undetermined target service node;
wherein, when there are multiple candidate combination policies for the multiple unsaturated service nodes, each candidate combination policy being capable of executing a subtask of at least one of the undetermined target service nodes, the scheduling unit is further configured to: and determining a target combination strategy from the candidate combination strategies according to a first preset rule, combining the unsaturated service nodes according to the determined target combination strategy, and taking the combined unsaturated service nodes as nodes for receiving the second message.
7. A medium on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
8. A computing device, characterized in that the computing device comprises a processor for implementing the method according to any of claims 1-5 when executing a computer program stored in a memory.
CN202111576730.2A 2021-12-22 2021-12-22 Resource scheduling method, system, medium and computing device Active CN113946431B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111576730.2A CN113946431B (en) 2021-12-22 2021-12-22 Resource scheduling method, system, medium and computing device

Publications (2)

Publication Number Publication Date
CN113946431A CN113946431A (en) 2022-01-18
CN113946431B true CN113946431B (en) 2022-03-04

Family

ID=79339234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111576730.2A Active CN113946431B (en) 2021-12-22 2021-12-22 Resource scheduling method, system, medium and computing device

Country Status (1)

Country Link
CN (1) CN113946431B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114710485B (en) * 2022-02-21 2023-10-27 联想(北京)有限公司 Processing method and processing device
CN114779806A (en) * 2022-04-02 2022-07-22 北京航天晨信科技有限责任公司 Distributed cooperative task processing method, device, equipment and storage medium
CN114996003B (en) * 2022-05-27 2024-10-01 北京火山引擎科技有限公司 Cloud service deployment method and device, electronic equipment and storage medium
CN115174395A (en) * 2022-07-01 2022-10-11 深圳致星科技有限公司 Resource allocation adjusting method and device based on privacy computing platform
CN115640968A (en) * 2022-10-18 2023-01-24 中电金信软件有限公司 Job scheduling method and device, electronic equipment and storage medium
CN116737349B (en) * 2023-08-16 2023-11-03 中国移动紫金(江苏)创新研究院有限公司 Stream data processing method, system and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
CN104580396A (en) * 2014-12-19 2015-04-29 华为技术有限公司 Task scheduling method, node and system
CN106776005A (en) * 2016-11-23 2017-05-31 华中科技大学 A kind of resource management system and method towards containerization application
CN108270837A (en) * 2017-01-04 2018-07-10 北京京东尚科信息技术有限公司 A kind of distributed task dispatching method and system using slack resources
WO2020019519A1 (en) * 2018-07-27 2020-01-30 平安科技(深圳)有限公司 Task allocation method and apparatus
CN112130966A (en) * 2019-06-24 2020-12-25 北京京东尚科信息技术有限公司 Task scheduling method and system
CN113342509A (en) * 2021-08-03 2021-09-03 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN113742068A (en) * 2021-08-27 2021-12-03 深圳市商汤科技有限公司 Task scheduling method, device, equipment, storage medium and computer program product

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9292837B2 (en) * 2013-08-05 2016-03-22 Aruba Networks, Inc. Task processing and resource sharing in a distributed wireless system
CN105900064B (en) * 2014-11-19 2019-05-03 华为技术有限公司 The method and apparatus for dispatching data flow task

Similar Documents

Publication Publication Date Title
CN113946431B (en) Resource scheduling method, system, medium and computing device
CN113377540B (en) Cluster resource scheduling method and device, electronic equipment and storage medium
CN107038069B (en) Dynamic label matching DLMS scheduling method under Hadoop platform
CN105718479B (en) Execution strategy generation method and device under cross-IDC big data processing architecture
CN109034396B (en) Method and apparatus for processing deep learning jobs in a distributed cluster
US9092266B2 (en) Scalable scheduling for distributed data processing
CN110389816B (en) Method, apparatus and computer readable medium for resource scheduling
CN112416585B (en) Deep learning-oriented GPU resource management and intelligent scheduling method
CN114741207B (en) GPU resource scheduling method and system based on multi-dimensional combination parallelism
US20200174844A1 (en) System and method for resource partitioning in distributed computing
CN114787830A (en) Machine learning workload orchestration in heterogeneous clusters
CN109783225B (en) Tenant priority management method and system of multi-tenant big data platform
CN114356587B (en) Calculation power task cross-region scheduling method, system and equipment
CN110795226B (en) Method for processing task using computer system, electronic device and storage medium
CN109240825A (en) Elastic method for scheduling task, device, equipment and computer readable storage medium
CN113886034A (en) Task scheduling method, system, electronic device and storage medium
CN115543615A (en) Resource allocation method and device, electronic equipment and storage medium
CN112988383A (en) Resource allocation method, device, equipment and storage medium
CN115599514A (en) Task scheduling method and device, computing equipment and computer readable storage medium
CN115586961A (en) AI platform computing resource task scheduling method, device and medium
CN114764371A (en) Task scheduling method and management system
CN111796932A (en) GPU resource scheduling method
CN116010051A (en) Federal learning multitasking scheduling method and device
CN115952054A (en) Simulation task resource management method, device, equipment and medium
CN114035919A (en) Task scheduling system and method based on power distribution network layered distribution characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant