CN113391886A - Task scheduling method and device
- Publication number: CN113391886A (application number CN202010165543.4A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
Abstract
Embodiments of the present specification provide a task scheduling method and apparatus. A task type of a task is determined, and target nodes are allocated to the multiple subtasks included in the task according to that task type. Different target nodes can thus be allocated to tasks of different task types, and the allocated target nodes are adapted to the task type, which helps improve at least one of task execution efficiency and cluster resource utilization.
Description
Technical Field
The present disclosure relates to the field of distributed system technologies, and in particular, to a task scheduling method and apparatus.
Background
At present, more and more tasks are computed using distributed systems. Many of these tasks include multiple subtasks, and the content and required resources of each subtask may be the same or different. Conventional distributed-system task scheduling generally allocates a node to each subtask of a task independently, which results in at least one of low cluster resource utilization and low task execution efficiency.
Disclosure of Invention
The present disclosure provides a task scheduling scheme.
According to a first aspect of the embodiments of the present disclosure, there is provided a task scheduling method, including: determining the task type of the task; and distributing target nodes for a plurality of subtasks included in the task according to the task type of the task.
In some embodiments, the task type of the task is compute intensive or communication intensive.
In some embodiments, the allocating target nodes to a plurality of subtasks included in the task according to the task type of the task includes: and determining a target node corresponding to each subtask in the plurality of subtasks according to the task type of the task and the resource demand information of the plurality of subtasks, and distributing the determined corresponding target node to the plurality of subtasks.
In some embodiments, the target nodes are sequentially determined for the plurality of subtasks based on the task type of the task, wherein after the target node of each subtask is determined, the current resource state information of the distributed system is updated based on the resource requirement information of the subtask.
In some embodiments, a preselected set of nodes for each of the plurality of subtasks is determined, and a target node for each of the plurality of subtasks is selected from the preselected set of nodes for each subtask based on a task type of the task.
In some embodiments, a preselected set of nodes for each of the plurality of subtasks is determined based on the resource requirement information for the plurality of subtasks and the task type of the task.
In some embodiments, the target node for each of the plurality of subtasks is selected from a preselected set of nodes for each of the subtasks in turn in an order.
Optionally, the order is determined based on at least one of a priority of the subtasks and a dependency between the subtasks.
In some embodiments, the determining a preselected set of nodes for each of the plurality of subtasks includes: a set of preselected nodes is determined for each of the plurality of subtasks in a particular order.
In an alternative example, the preselected nodes for the current subtask are selected from the preselected node sets of one or more subtasks preceding the current subtask.
In some embodiments, a score of a preselected node included in the preselected set of nodes for each subtask is determined based on a task type of the task; and selecting the target node of each subtask from the preselected node set of each subtask based on the scores of the preselected nodes contained in the preselected node set of each subtask.
In some embodiments, the allocating target nodes to a plurality of subtasks included in the task according to the task type of the task includes: dividing a plurality of subtasks of the task into at least one group according to the task type of the task, wherein each group comprises at least one subtask in the plurality of subtasks; and respectively allocating a corresponding target node to each group in at least one group of the tasks, wherein each subtask in the same group is allocated to the same target node.
In some embodiments, the dividing the plurality of subtasks of the task into at least one group according to the task type of the task includes: and under the condition that the task type of the task is communication intensive, determining the total number of nodes required by a plurality of subtasks of the task according to the resource demand information of the plurality of subtasks, and dividing the plurality of subtasks of the task into at least one group according to the total number of the nodes.
In some embodiments, the dividing the plurality of subtasks of the task into at least one group according to the task type of the task includes: in a case where the task type of the task is computationally intensive, each of a plurality of subtasks of the task is treated as a group.
In some embodiments, said assigning a corresponding target node to each of at least one packet of said task, respectively, comprises: determining a preselected set of nodes for each of the at least one group based on current resource state information for the distributed system and resource demand information for each subtask in each group; and selecting the target node corresponding to each group from the preselected node set of each group.
In some embodiments, said determining a preselected set of nodes for each of said at least one group based on current resource state information for said distributed system and resource demand information for individual subtasks in each group comprises: determining a preselected set of nodes for each group based on current resource state information for the distributed system and resource demand information for the group; the selecting the target node corresponding to each group from the preselected node set of each group comprises: the target node for each group is determined from the preselected set of nodes for each group in turn in a particular order.
In some embodiments, said selecting a target node corresponding to each group from a set of preselected nodes for said each group comprises: determining a score for at least one preselected node contained in the set of preselected nodes for each group based on the task type for the task; selecting a target node for each group from the set of preselected nodes for the group based on a score of at least one preselected node contained in the set of preselected nodes for the group.
Wherein the sets of preselected nodes for different groups may be the same or different. In some examples, multiple groups may have the same set of preselected nodes.
In some embodiments, the allocating target nodes to a plurality of subtasks included in the task according to the task type of the task includes: determining target nodes for a plurality of subtasks of the task according to the task type of the task; and under the condition that the corresponding target nodes are successfully determined for the plurality of subtasks of the task, distributing the target nodes for the plurality of subtasks included in the task.
In some embodiments, the allocating target nodes to a plurality of subtasks included in the task according to the task type of the task includes: determining a target node for the at least one group according to the task type of the task; and in a case where a corresponding target node is successfully determined for each group in the at least one group of the task, allocating the target nodes to the plurality of subtasks included in the task.
In some embodiments, the method further comprises: and under the condition that the corresponding target node cannot be successfully determined for at least one subtask in the plurality of subtasks, performing delay distribution on the plurality of subtasks of the task.
In some embodiments, in a case where the corresponding target node is not successfully determined for a first group of the at least one group, allocation of target nodes to each of the plurality of subtasks included in the task is delayed.
In some embodiments, the allocating target nodes to a plurality of subtasks included in the task according to the task type of the task includes: and according to the task type of the task, distributing target nodes for part of subtasks in the plurality of subtasks in the current resource distribution process or the current scheduling period.
In some embodiments, the allocating target nodes to a plurality of subtasks included in the task according to the task type of the task includes: and respectively determining target nodes corresponding to the plurality of subtasks according to the task types of the tasks, and under the condition that the first subtask in the plurality of subtasks does not have the corresponding target node, allocating the corresponding target node to other subtasks of the plurality of subtasks, and performing delayed allocation on the first subtask.
In some embodiments, the method further comprises: and after target nodes are distributed to the plurality of subtasks included in the task according to the task type, synchronously scheduling the plurality of subtasks of the task.
In some embodiments, the method is applied to a task orchestration system.
According to a second aspect of the embodiments of the present disclosure, there is provided a task scheduling apparatus, the apparatus including: the determining module is used for determining the task type of the task; and the distribution module is used for distributing target nodes for a plurality of subtasks included in the task according to the task type of the task.
In some embodiments, the task type of the task is compute intensive or communication intensive.
In some embodiments, the assignment module is to: and determining a target node corresponding to each subtask in the plurality of subtasks according to the task type of the task and the resource demand information of the plurality of subtasks, and distributing the determined corresponding target node to the plurality of subtasks.
In some embodiments, the assignment module is to: and determining target nodes for the plurality of subtasks in sequence based on the task types of the tasks, wherein after the target node of each subtask is determined, current resource state information of the distributed system is updated based on the resource demand information of the subtask.
In some embodiments, the assignment module is to: determining a preselected set of nodes for each of the plurality of subtasks, and selecting a target node for each of the subtasks from the preselected set of nodes for each of the plurality of subtasks based on a task type of the task.
In some embodiments, a preselected set of nodes for each of the plurality of subtasks is determined based on the resource requirement information for the plurality of subtasks and the task type of the task.
In some embodiments, the target node for each of the plurality of subtasks is selected from a preselected set of nodes for each of the subtasks in turn in an order.
Optionally, the order is determined based on at least one of a priority of the subtasks and a dependency between the subtasks.
In some embodiments, the assignment module is to: and determining a preselected node set of the current subtask from preselected node sets of subtasks before the current subtask.
In some embodiments, the assignment module comprises: the first determining unit is used for determining scores of preselected nodes contained in a preselected node set of each subtask according to the task type of the task; and the selecting unit is used for selecting the target node of each subtask from the preselected node set of each subtask based on the score of the preselected node contained in the preselected node set of each subtask.
In some embodiments, the assignment module comprises: the grouping unit is used for dividing a plurality of subtasks of the task into at least one group according to the task type of the task, wherein each group comprises at least one subtask in the plurality of subtasks; and the first allocation unit is used for allocating a corresponding target node to each group in at least one group of the tasks respectively, wherein each subtask in the same group is allocated to the same target node.
In some embodiments, the grouping unit is to: under the condition that the task type of the task is communication intensive, determining the total number of nodes required by a plurality of subtasks of the task according to the resource demand information of the plurality of subtasks, and dividing the plurality of subtasks of the task into at least one group according to the total number of the nodes; and/or, in the event that the task type of the task is computationally intensive, treating each of a plurality of subtasks of the task as a group.
In some embodiments, the first distribution unit comprises: a determining subunit, configured to determine a preselected node set of each of the at least one group based on current resource state information of the distributed system and resource demand information of each subtask in each group; and the selecting subunit is used for selecting the target node corresponding to each group from the preselected node set of each group.
In some embodiments, the determining subunit is to: determining a preselected set of nodes for each group based on current resource state information for the distributed system and resource demand information for the group; the selection subunit is configured to: the target node for each group is determined from the preselected set of nodes for each group in turn in a particular order.
In some embodiments, the selection subunit is to: determining a score for at least one preselected node contained in the set of preselected nodes for each group based on the task type for the task; selecting a target node for each group from the set of preselected nodes for the group based on a score of at least one preselected node contained in the set of preselected nodes for the group.
Wherein the sets of preselected nodes for different groups may be the same or different. In some examples, multiple groups may have the same set of preselected nodes.
In some embodiments, the assignment module comprises: the second determining unit is used for determining target nodes for a plurality of subtasks of the task according to the task type of the task; and the second distributing unit is used for distributing the target nodes to the plurality of subtasks included in the task under the condition that the corresponding target nodes are successfully determined for the plurality of subtasks of the task.
In some embodiments, the apparatus further comprises: and the delay distribution module is used for performing delay distribution on the plurality of subtasks of the task under the condition that the corresponding target node is not successfully determined for at least one subtask in the plurality of subtasks.
In some embodiments, the assignment module is to: and according to the task type of the task, distributing target nodes for part of subtasks in the plurality of subtasks in the current resource distribution process or the current scheduling period.
In some embodiments, the assignment module is to: and respectively determining target nodes corresponding to the plurality of subtasks according to the task types of the tasks, and under the condition that the first subtask in the plurality of subtasks does not have the corresponding target node, allocating the corresponding target node to other subtasks of the plurality of subtasks, and performing delayed allocation on the first subtask.
In some embodiments, the apparatus further comprises: and the scheduling module is used for synchronously scheduling the plurality of subtasks of the task after distributing target nodes to the plurality of subtasks included in the task according to the task type.
In some embodiments, the apparatus is applied to a task orchestration system.
According to a third aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first aspect or any of the embodiments of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the first aspect or any of the embodiments of the first aspect when executing the program.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program comprising computer-readable instructions which, when executed by a computer device, cause the computer device to perform the method of the first aspect or any possible implementation of the first aspect.
According to the method and apparatus of the embodiments of the present disclosure, the task type of a task is determined, and target nodes are allocated to the plurality of subtasks included in the task according to that task type. Because target nodes can be allocated based on the task type, target nodes better suited to the task can be allocated to its subtasks, thereby improving at least one of task execution efficiency and cluster resource utilization.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic diagram of a distributed system of an embodiment of the present disclosure.
Fig. 2 is a flowchart of a task scheduling method according to an embodiment of the disclosure.
Fig. 3 is a schematic diagram of a resource change situation in a node allocation process according to an embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a conventional task scheduling process.
FIG. 5 is a schematic diagram of a task scheduling process of an embodiment of the present disclosure.
Fig. 6 is a schematic diagram of scheduling logic of an embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of a task scheduling apparatus according to an embodiment of the present disclosure.
Fig. 8 is a schematic structural diagram of a computer device for configuring an apparatus according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
At present, more and more tasks are beginning to be handled using distributed systems, such tasks being referred to as distributed tasks. As shown in fig. 1, a distributed system of some embodiments includes one or more clusters, each cluster includes one or more servers, each server can be regarded as a node (shown by black dots in the figure), and each node includes at least one of a CPU (Central Processing Unit), a GPU, a memory, a disk, a network port, and other resources. After a subtask is assigned to a node, the subtask may be executed by a resource on the node, and each node may execute one or more subtasks. The subtasks executed by the nodes in the same cluster may be the same or different.
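To make the later discussion concrete, the following minimal Python sketch models this structure. The names (`Node`, `Cluster`) and the resource keys are illustrative assumptions, not identifiers from this disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One server in a cluster; `free` maps resource type to available amount."""
    name: str
    free: dict = field(default_factory=dict)  # e.g. {"cpu": 2, "gpu": 1}

    def can_fit(self, demand: dict) -> bool:
        # A node can host a subtask only if every demanded resource is free.
        return all(self.free.get(k, 0) >= v for k, v in demand.items())

    def reserve(self, demand: dict) -> None:
        # Deduct a subtask's demand from the node's available resources.
        for k, v in demand.items():
            self.free[k] = self.free.get(k, 0) - v

@dataclass
class Cluster:
    nodes: list

cluster = Cluster([Node("node-1", {"cpu": 2, "gpu": 1}),
                   Node("node-2", {"cpu": 4})])
print(cluster.nodes[0].can_fit({"cpu": 1, "gpu": 1}))  # True
print(cluster.nodes[1].can_fit({"gpu": 1}))            # False
```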
When allocating resources for distributed tasks, different allocation manners may have an impact on the execution efficiency of the tasks. For example, in some cases, frequent communication is required between multiple subtasks in a task, and therefore, the multiple subtasks in the task are suitable to be distributed to nodes with smaller communication cost; in other cases, a plurality of subtasks in a task generate a relatively large amount of computation during execution, and therefore, the plurality of subtasks in the task are suitable for being distributed to nodes with more computation resources.
There are currently some schedulers (e.g., kube-batch) that batch-schedule tasks in a distributed system as follows: the scheduler attempts to schedule each subtask of a task one by one in order and, after the scheduling attempt for every subtask of the task is completed, judges whether the task as a whole meets the conditions under which the user expects it to run. This scheduling strategy is called gang scheduling.
However, in current gang scheduling implementations, scheduling is performed at the granularity of subtasks; the scheduler neither considers the characteristics of the whole task nor views the task globally, so the nodes allocated to the subtasks may not be the best choice, and at least one of task execution efficiency and resource utilization is low.
Based on this, the embodiment of the present disclosure provides a task scheduling method. As shown in fig. 2, the method may include:
step 201: determining the task type of the task;
step 202: and distributing target nodes for a plurality of subtasks included in the task according to the task type of the task.
The method in the embodiments of the present disclosure may be performed by any electronic device, for example, a terminal device or a cloud server. In some embodiments, the method may be performed by a processor or a scheduler, where, optionally, the processor or scheduler may run on a cloud platform such as Kubernetes, and may specifically be provided on a container orchestration engine, but the embodiments of the present disclosure are not limited thereto.
In step 201, a task type of the task may be determined. The task type may be divided based on the actual situation. For deep learning tasks, such as training and inference of deep learning models such as neural networks, the tasks may optionally be divided into computation-intensive tasks and communication-intensive tasks, where a computation-intensive task is one with a relatively large amount of computation during task processing, and a communication-intensive task is one with relatively frequent communication among its subtasks during task processing. Alternatively, task types may be divided in other manners, and the embodiments of the present disclosure are not limited thereto.
In some embodiments, information for a task may be obtained and a task type for the task may be determined based on the information for the task, where the information for the task may be obtained based on user input or settings, or based on analysis of the task, and so on. In some embodiments, the information of the task may include information of a plurality of subtasks of the task, such as resource demand information, subtask priority information, dependency information between subtasks, and the like; alternatively, the information of the task may also include one or more of user-provided information, history information, calculation amount information, traffic information, and the like, which is not limited in this disclosure.
In some embodiments, the resource requirement information of the plurality of subtasks may include information on the resources required by the subtasks, where the resources required by a subtask may include at least one of a computing resource and a communication resource. For example, the resource requirement information of a subtask may include the resource type and/or resource amount required by the subtask, where the resource type may include, but is not limited to, at least one of a CPU, a GPU, a DSP, an FPGA, memory, disk, a communication port, and the like. The subtask priority information is used to determine priorities of the plurality of subtasks included in the task. As for dependencies: if execution of subtask B can start only after execution of subtask A has finished, then subtask B depends on subtask A; equivalently, subtask A is a master task and subtask B is a slave task of subtask A.
In some embodiments, in one scheduling cycle or resource allocation process, a plurality of tasks to be executed may be scheduled based at least on the priority of each task. The task priority information may include the priority of the task or information used to determine it; for example, the task priority information includes, but is not limited to, the task waiting time and/or the total resource information required by the task. The task waiting time refers to the time from the moment the task is received by the scheduler to the moment the task is scheduled. A higher priority can be set for a task with a longer waiting time, or for a task with both a longer waiting time and a larger total resource requirement, which avoids the starvation problem in which a task stays in a waiting-to-be-scheduled state for a long time.
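One way this waiting-time rule could be realized is sketched below. The linear combination and both weights are assumptions made for illustration; the text only states that tasks waiting longer (and, optionally, tasks needing more total resources) should receive higher priority.

```python
import time

def task_priority(submitted_at: float, total_resource_units: float,
                  wait_weight: float = 1.0, resource_weight: float = 0.1) -> float:
    """Higher value = scheduled earlier. Priority grows with waiting time,
    so no task can starve indefinitely; total resource demand acts as a
    secondary boost. Both weights are illustrative assumptions."""
    waiting_seconds = time.time() - submitted_at
    return wait_weight * waiting_seconds + resource_weight * total_resource_units

# A task that has waited ten minutes outranks an equal-demand task
# that has waited one minute.
```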
Besides the above information, the information of the task may also include other information according to the actual application scenario, which is not described herein again.
As an alternative example, the task type of a task may be determined based on the computation amount and/or communication volume of the plurality of subtasks of the task. For example, in a case where the computation amount is greater than a preset computation threshold, the task is considered computation-intensive; in a case where the communication volume is greater than a preset communication threshold, the task is considered communication-intensive. If both the computation amount and the communication volume exceed their corresponding preset thresholds, the task may be determined to be communication-intensive.
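A minimal sketch of this threshold rule follows. The threshold values and units are placeholders, since the disclosure leaves them open, and checking traffic first reproduces the last example above, where a task exceeding both thresholds is treated as communication-intensive.

```python
COMPUTE_THRESHOLD = 100.0  # placeholder units of computation amount
TRAFFIC_THRESHOLD = 50.0   # placeholder units of communication volume

def classify(compute_amount: float, traffic_amount: float) -> str:
    """Checking traffic first makes a task that exceeds both thresholds
    communication-intensive, matching the example in the text."""
    if traffic_amount > TRAFFIC_THRESHOLD:
        return "communication-intensive"
    if compute_amount > COMPUTE_THRESHOLD:
        return "compute-intensive"
    return "unknown"  # fall back to history or user-provided information

print(classify(compute_amount=120.0, traffic_amount=80.0))  # communication-intensive
```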
As another optional example, the task type may also be determined according to historical information of the task, where the historical information may include historical scheduling information and/or historical execution conditions, and the like. Alternatively, the task type may be determined based on historical scheduling information for the task. For example, the task type of the task in the current scheduling is determined by referring to the task type determined in the historical scheduling of the task. For example, for a particular task, if the scheduler determines the task as a compute-intensive task during historical scheduling, the task is determined to be a compute-intensive task during the current scheduling; if the scheduler determines the task as a communication-intensive task during the historical scheduling process, the task is determined to be a communication-intensive task. Optionally, the task type of the task may also be determined based on at least one historical execution condition of the task, for example, if the computation of the task takes more time or computation resources in the last one or more historical execution processes, the task is determined as a computation-intensive task; for another example, if the communication of the task takes more time or communication resources during the last one or more historical executions, the task is determined to be a communication intensive task.
As another alternative example, the task type of the task may also be determined based on user-provided information, such as user-provided task type information, or user-provided computational and/or traffic information for the task, and so forth.
In the embodiment of the present disclosure, the task type of the task may also be determined based on other information, which is not limited in the embodiment of the present disclosure.
In step 202, the plurality of subtasks included in the task may be all of the subtasks in the task or only part of them, for example, the subtasks with larger resource requirements, the subtasks with higher priorities, the subtasks with dependency relationships between them, or the subtasks whose required resource types or amounts are similar. In some cases, the subtasks of the same task may be executed by the same or different nodes in the same cluster of the distributed system. In a case where the distributed system includes multiple clusters, the subtasks of the same task may be executed by nodes of the same or different clusters; for example, the task may first be allocated to a certain cluster, and the target nodes for the plurality of subtasks may then be determined from that cluster.
Based on the task type of the task, resources are allocated to the plurality of subtasks included in the task, which improves task execution efficiency. For example, for a communication-intensive task, the plurality of subtasks may be allocated to as few target nodes as possible to reduce communication overhead. For a computation-intensive task, it may optionally suffice that the resources on the target nodes meet the requirements of the plurality of subtasks; alternatively, nodes with better computing performance may be preferred, fragmented node resources may be used preferentially, or communication overhead may be optimized as far as possible on the premise that the computing resources meet the user's requirements.
In some embodiments, the resource allocation may be made relatively independently for multiple subtasks based on the task type of the task. For example, the allocation of resources may be done for multiple subtasks in a certain order or simultaneously.
In some embodiments, the subtasks are the objects of resource allocation; that is, resources may be allocated to the plurality of subtasks separately based on the task type of the task. For example, it may be determined whether a target node meeting the requirements of each subtask exists. In a case where every subtask is determined to have a corresponding target node, the corresponding target node may be allocated to each subtask. In a case where at least one first subtask has a corresponding target node and at least one second subtask does not, the corresponding target node may be allocated to the first subtask and allocation for the second subtask delayed; that is, resources are allocated to the second subtask in a subsequent scheduling period or scheduling flow.
In other embodiments, the task may be the object of resource allocation; that is, resources may be allocated to the plurality of subtasks synchronously based on the task type of the task. Here, synchronously means within the same resource allocation flow or cycle, and allocating resources to multiple subtasks synchronously may mean allocating them in parallel or sequentially in a certain order. In one example, within one resource allocation flow the subtasks are either all allocated resources or all subjected to delayed allocation. For example, in a case where it is determined that at least one of the plurality of subtasks does not have a corresponding target node, delayed allocation is performed for all of the plurality of subtasks of the task. This helps avoid the long-term starvation of some subtasks that can arise when resources are allocated to the subtasks relatively independently, which would harm task execution efficiency and resource utilization.
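A minimal sketch of this all-or-delay behavior, under the assumption of a simple dictionary-based node model, is shown below; `gang_allocate` and `can_fit` are hypothetical names, not identifiers from this disclosure.

```python
import copy

def can_fit(free: dict, demand: dict) -> bool:
    return all(free.get(k, 0) >= v for k, v in demand.items())

def gang_allocate(demands: dict, nodes: dict):
    """demands: subtask name -> required resources;
    nodes: node name -> currently free resources.
    Returns a complete placement, or None to delay the entire task."""
    trial = copy.deepcopy(nodes)   # work on a copy; nothing is committed
    plan = {}                      # unless every subtask can be placed
    for subtask, demand in demands.items():
        target = next((n for n, free in trial.items() if can_fit(free, demand)), None)
        if target is None:
            return None            # one unplaceable subtask delays them all
        for k, v in demand.items():
            trial[target][k] -= v  # virtual update of the trial state
        plan[subtask] = target
    return plan

nodes = {"node-1": {"cpu": 2}, "node-2": {"cpu": 2}}
print(gang_allocate({"s1": {"cpu": 2}, "s2": {"cpu": 2}}, nodes))  # both placed
print(gang_allocate({"s1": {"cpu": 2}, "s2": {"cpu": 4}}, nodes))  # None: delayed
```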
How to determine the target node corresponding to each of the plurality of subtasks based on the task type of the task will be described below.
In some embodiments, a target node corresponding to each of the plurality of subtasks may be determined according to the task type of the task and the resource requirement information of the plurality of subtasks, and the determined corresponding target node may be allocated to the plurality of subtasks. Target nodes capable of meeting the resource requirements of each subtask can be distributed, and the target nodes of different subtasks can be the same or different. The target node corresponding to each of the multiple subtasks may be sequentially determined according to a specific sequence, or the target node corresponding to each of the multiple subtasks may be simultaneously determined.
The above specific order may refer to a preset order. Alternatively, in some examples, the specific order may be determined according to priorities of the plurality of subtasks, for example, the corresponding target node is determined for a subtask with a higher priority, and then the corresponding target node is determined for a subtask with a lower priority, where the priority may be determined based on one or more factors such as a subtask type and a dependency relationship between the subtask and other subtasks, for example, for two subtasks with a dependency or a master-slave relationship, a priority of a master (master) subtask is higher than a priority of a slave (slave) subtask, but the embodiment of the disclosure is not limited thereto. Alternatively, in other examples, the particular order may be determined based on the dependencies or master-slave relationships of the subtasks. Because the slave subtasks cannot start to run under the condition of lacking the master subtask, the corresponding target nodes are preferentially allocated to the master subtask, and the running efficiency of the whole task is improved. Alternatively, the specific order may be determined based on other factors, which are not limited by the embodiments of the present disclosure.
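One such ordering, combining master-before-slave dependencies with priority as a tie-breaker, could be sketched as follows (Python 3.9+, `graphlib`); the function name and the tie-breaking rule are illustrative assumptions.

```python
from graphlib import TopologicalSorter

def allocation_order(priorities: dict, depends_on: dict) -> list:
    """Every master subtask precedes its slave subtasks; among subtasks whose
    masters are already placed, higher priority goes first."""
    ts = TopologicalSorter(depends_on)  # maps subtask -> subtasks it depends on
    ts.prepare()
    order = []
    while ts.is_active():
        ready = sorted(ts.get_ready(), key=lambda s: -priorities.get(s, 0))
        order.extend(ready)
        ts.done(*ready)
    return order

# "worker-1" and "worker-2" are slaves of the master "ps"; the master is
# placed first even though its numeric priority is lowest.
print(allocation_order({"ps": 0, "worker-1": 2, "worker-2": 1},
                       {"worker-1": {"ps"}, "worker-2": {"ps"}}))
# ['ps', 'worker-1', 'worker-2']
```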
In some embodiments, when determining the target nodes for a plurality of subtasks of the task, the target node corresponding to each of the plurality of subtasks may be determined through two processes: preselection and optimization (also described below as a preferred process).
In the preselection process, at least one preselection node capable of meeting the requirements of the subtasks in the distributed system is preliminarily selected for each subtask in the subtasks respectively based on the resource requirement information of the subtasks, or the nodes incapable of meeting the requirements of the distributed system are removed. For example, a preselected set of nodes for each of the plurality of subtasks may be determined based on current state information of the plurality of nodes in the distributed system and resource demand information for the plurality of subtasks, wherein the preselected set of nodes for each subtask includes at least one preselected node. The same or different nodes may be included in the preselected set of nodes for different subtasks.
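Under the same dictionary-based node model as the earlier sketches, the preselection step might look like this; the names are again illustrative.

```python
def preselect(demand: dict, nodes: dict) -> set:
    """Keep every node whose free resources cover the subtask's demand;
    equivalently, remove the nodes that cannot serve it."""
    return {name for name, free in nodes.items()
            if all(free.get(k, 0) >= v for k, v in demand.items())}

nodes = {"node-1": {"cpu": 2, "gpu": 1},
         "node-2": {"cpu": 4, "gpu": 0},
         "node-3": {"cpu": 1, "gpu": 2}}
print(preselect({"cpu": 1, "gpu": 1}, nodes))  # {'node-1', 'node-3'}
```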
In some embodiments, in the preselection process, the preselected node sets of the plurality of subtasks may be selected in turn in a particular order. In an alternative example, the preselected node set of a later subtask is determined based on the preselected node set of an earlier subtask. For example, the preselected nodes of the later subtask are selected from the preselected node set of an earlier subtask, which may be the subtask immediately preceding the current subtask or one separated from it by at least one subtask. In one example, the earlier subtasks are all subtasks preceding the current subtask; in another, they include one or more specific subtasks, but the embodiments of the present disclosure are not limited thereto.
In some embodiments, the resource requirement information of the plurality of subtasks may also be considered comprehensively to determine the preselected node set of the plurality of subtasks, for example, a common preselected node set is selected for the plurality of subtasks, where the preselected node included in the common preselected node set can satisfy the resource requirements of at least two of the plurality of subtasks.
In a preferred process, the target node for each subtask is selected from a pre-selected set of nodes for each subtask. For example, a most suitable node may be selected from a pre-selected set of nodes for a subtask as a target node for the subtask. In some optional examples, the target node may be determined according to current state information of each preselected node included in the set of preselected nodes of the subtask, such as at least one of current load condition, available resource type, available resource quantity, available resource distribution and topological connection relationship between the inside of the node, and topological connection relationship with other nodes. The available resource types include, but are not limited to, at least any of: communication port, disk storage space, memory, CPU, GPU, etc. The available resource quantity can be CPU quantity, disk residual capacity, memory residual capacity and the like. The distribution of the available resources is also referred to as a resource fragmentation condition, i.e. a distribution position of the currently available resources, e.g. how many disks the remaining capacity of the disk is distributed on. The topology of each node in the distributed system may be, for example, a bus-type topology, a star-type topology, a ring topology, a tree topology, or the like.
In some embodiments, in the preselection process, the preselected node set of each subtask may be determined based on the current resource state information of the distributed system and the resource requirement information of that subtask; that is, the preselection processes of the subtasks of the task do not interfere with each other, and the selection of each subtask's preselected node set depends only on that subtask's resource requirement information, not on the resource requirement information of the other subtasks. Accordingly, the determination of the preselected node sets of the plurality of subtasks can be performed in parallel or in any sequential order. In this way, the preselected nodes can be determined quickly, thereby improving the overall efficiency of the simulated allocation.
For example, assume that the task includes subtask 1, subtask 2, and subtask 3, with priorities from high to low. During the preselection process, the following operations may be performed in parallel: determining the preselected node set of subtask 1 from the resource requirement information of subtask 1 and the current resource state information of the distributed system; determining the preselected node set of subtask 2 from the resource requirement information of subtask 2 and the current resource state information; and determining the preselected node set of subtask 3 from the resource requirement information of subtask 3 and the current resource state information. Suppose the resulting preselected node set of subtask 1 is {node 1, node 2, node 3}, that of subtask 2 is {node 2, node 5, node 6, node 7}, and that of subtask 3 is {node 6, node 7}.
In some embodiments, in a preferred process, the target nodes corresponding to the multiple subtasks may be sequentially determined from a preselected set of nodes of the multiple subtasks in a specific order, for example, based on priorities of the multiple subtasks. And updating the current state information of the target node when the target node corresponding to one subtask is determined. For example, the current state information of the target node is updated based on the resource demand information corresponding to the subtask. In this way, in the process of determining the target node of the subsequent subtasks, the part of the target node corresponding to the resource required by the subtask is unavailable, so that smooth execution of the plurality of subtasks can be ensured.
In the disclosed embodiment, the target node may be selected from a set of preselected nodes based on a policy. For example, the selection of the target node is performed based on the current resource state information of the distributed system or the current state information of each preselected node and the resource demand information of a plurality of subtasks. In some embodiments, scores of preselected nodes included in the preselected set of nodes for each subtask may be determined, for example, based on current state information of the preselected nodes in the preselected set of nodes for the subtask and resource requirement information for the subtask, scores of the preselected nodes may be determined, and a target node for each subtask may be selected from the preselected set of nodes for each subtask based on the scores of the preselected nodes included in the preselected set of nodes for each subtask. For example, the preselected node with the highest score may be selected as the target node from at least one preselected node included in the set of preselected nodes. Alternatively, the selection of the target node is performed based on the score and other factors together.
Following the previous example, taking the example of sequentially determining the target nodes of multiple subtasks based on task priority, the score of each preselected node in the preselected node set of subtask 1 may be determined first, and assuming that the scores of node 1, node 2, and node 3 are 80, 90, and 70, respectively, node 2 is determined to be the target node of subtask 1, and the current state information of node 2 is updated. Then, the score of each preselected node in the preselected node set of subtask 2 is determined, wherein the score of node 2 is determined based on the updated current state information of node 2, and assuming that the scores of node 2, node 5, node 6, and node 7 are 60, 80, 75, and 70, respectively, node 5 is determined to be the target node of subtask 2, and the current state information of node 5 is updated. Finally, the score of each preselected node in the set of preselected nodes for subtask 3 is determined, and assuming that the scores of node 6 and node 7 are 70 and 60, respectively, node 6 is determined to be the target node for subtask 3.
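A sketch of this preferred (optimization) phase follows. The scoring function is a caller-supplied placeholder, since the disclosure scores nodes on load, available resources, fragmentation, and topology without fixing a formula; the loop otherwise mirrors the sequence just described, including the state update after each choice.

```python
def pick_targets(subtasks: list, preselected: dict, nodes: dict, score) -> dict:
    """subtasks: (name, demand) pairs, already in scheduling order.
    preselected: subtask name -> set of candidate node names.
    score(free, demand) -> float is supplied by the caller."""
    targets = {}
    for name, demand in subtasks:
        # Score candidates against the *current* node state, which reflects
        # every earlier choice in this pass.
        best = max(preselected[name], key=lambda n: score(nodes[n], demand))
        targets[name] = best
        for k, v in demand.items():       # update current state information
            nodes[best][k] = nodes[best].get(k, 0) - v
    return targets

# Placeholder score: prefer the node with the least leftover after placement
# (a bin-packing heuristic; an assumption, not the disclosure's formula).
def tight_fit(free: dict, demand: dict) -> float:
    return -sum(free.get(k, 0) - v for k, v in demand.items())

nodes = {"node-1": {"cpu": 2}, "node-2": {"cpu": 4}}
print(pick_targets([("s1", {"cpu": 2})], {"s1": {"node-1", "node-2"}},
                   nodes, tight_fit))
# {'s1': 'node-1'}; node-1's cpu is now 0 for any later subtask
```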
In some embodiments, in a preferred process, the target nodes for multiple subtasks may be determined in parallel, or combined with a preselected set of nodes for multiple subtasks to determine the target nodes for multiple subtasks synthetically. For example, a preselected set of nodes for at least a portion of the plurality of subtasks is intersected to determine target nodes for the at least a portion of subtasks, although the disclosed embodiments are not limited in this respect.
In some embodiments, the target nodes corresponding to the multiple subtasks may be determined sequentially from a preselected set of nodes for the multiple subtasks in a particular order, e.g., based on priorities of the multiple subtasks. In an optional example, each time a target node corresponding to a subtask is determined, current state information of the target node is updated. For example, the current state information of the target node is updated based on the resource demand information corresponding to the subtask. In this way, in the process of determining the target node of the subsequent subtasks, the part of the target node corresponding to the resource required by the subtask is unavailable, so that smooth execution of the plurality of subtasks can be ensured. In another alternative example, the target node of a subsequent subtask is selected based on the target node selected by a preceding subtask. For example, the following subtask preferentially selects the target node of the previous subtask as its own target node unless the target node is not included in the preselected node set of the following subtask or other key factors of the target node do not satisfy the requirement of the following subtask, but the embodiment of the present disclosure is not limited thereto.
In the case of synchronous resource allocation for multiple subtasks, the above preselection and optimization procedure may be a simulated allocation process: a virtual resource allocation rather than an actually executed one, used to determine by calculation whether the multiple subtasks can currently be allocated to nodes in the distributed system. By performing simulated node allocation for the plurality of subtasks of the task, it can be determined whether the distributed system currently has the resources the subtasks require. Similarly, the updating of current resource state information involved in this process serves the simulated allocation or the determination of target nodes and is a virtual rather than a real update: after the simulated allocation of a subtask is completed or its target node is determined, the virtual change in the currently available resources of the distributed system is recorded, but since actual allocation of the subtask has not yet been performed, the actually available resources in the distributed system do not change.
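The boundary between the virtual and the real update can be made explicit as below, assuming the plan shape returned by the `gang_allocate` sketch earlier; `commit` is a hypothetical name.

```python
def commit(plan, demands: dict, nodes: dict) -> bool:
    """Apply a plan produced by simulated allocation. Only here does the
    cluster's real resource state change; a failed simulation (plan is None)
    leaves it untouched and the task waits for the next scheduling cycle."""
    if plan is None:
        return False
    for subtask, node in plan.items():
        for k, v in demands[subtask].items():
            nodes[node][k] -= v  # real update of available resources
    return True
```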
In the case of asynchronous resource allocation for multiple subtasks, the above preselection and optimization procedure may instead be a real node allocation procedure. Similarly, the update of the current resource state information involved in the above process may then refer to a real update performed after the node allocation of a subtask is completed, because the currently available resources in the distributed system have changed.
In addition to determining target nodes in units of subtasks, target nodes may be determined in units of groups. How to determine the target node corresponding to each group based on the task type of the task will be described below.
In some embodiments, a plurality of subtasks of the task may be divided into at least one group according to a task type of the task, wherein each group includes at least one subtask of the plurality of subtasks; and respectively allocating a corresponding target node to each group in at least one group of the tasks, wherein each subtask in the same group is allocated to the same target node.
The number of groups may be the same as the number of nodes required by the task; that is, when the plurality of subtasks of the task need to be allocated to N target nodes, the plurality of subtasks are divided into N groups. The number of subtasks in each group may be the same or different. The grouping manner may be to assign the plurality of subtasks randomly, to place subtasks with similar priorities in one group, to place subtasks with (or without) dependency relationships in one group, to place subtasks with large differences in resource demand in one group, or to place subtasks requiring the same resource type in one group, and so on. Determining the target node in units of groups in this way can improve the efficiency of the resource allocation process without significantly affecting task execution efficiency.
For a communication-intensive task, multiple subtasks of the task may be allocated to the smallest number of target nodes, so as to reduce communication cost among the subtasks, for example, all subtasks in the multiple subtasks are divided into the same group, so that all subtasks are executed on the same target node. In some optional examples, the total number of nodes required by the plurality of subtasks of the task may be determined according to the resource requirement information of the plurality of subtasks of the task; and dividing a plurality of subtasks of the task into at least one group according to the total number of the nodes. For another example, for a compute-intensive task, each of a plurality of subtasks of the task may be treated as a group, that is, the determination of the target node may be made in units of subtasks.
For example, assume that a task includes 4 subtasks, each subtask requires one CPU, and each node in the distributed system has 2 available CPUs; then at least 2 nodes are required to schedule the 4 subtasks. If the task is communication-intensive, the 4 subtasks can be evenly divided into 2 groups, where the grouping may be based on the number of subtasks or on the resource requirement information of the subtasks. If the task is compute-intensive, the 4 subtasks may be divided into 4 groups, i.e., each subtask is treated as a group.
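A minimal sketch of this type-dependent grouping (the function name, signature, and single-resource model are assumptions for illustration, not the disclosed implementation):

```python
import math

def group_subtasks(subtasks, task_type, cpus_per_node, cpus_per_subtask):
    """Divide subtasks into groups, one group per target node
    (illustrative sketch; names and signature are hypothetical)."""
    if task_type == "compute-intensive":
        return [[t] for t in subtasks]               # one group per subtask
    # communication-intensive: use the minimum number of nodes
    nodes_needed = math.ceil(len(subtasks) * cpus_per_subtask / cpus_per_node)
    size = math.ceil(len(subtasks) / nodes_needed)
    return [subtasks[i:i + size] for i in range(0, len(subtasks), size)]

subtasks = ["t1", "t2", "t3", "t4"]
print(group_subtasks(subtasks, "communication-intensive", 2, 1))
# [['t1', 't2'], ['t3', 't4']]  -> 2 groups, 2 nodes
print(group_subtasks(subtasks, "compute-intensive", 2, 1))
# [['t1'], ['t2'], ['t3'], ['t4']]  -> 4 groups
```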
In some embodiments, when determining the target node for each group of the task, the target node of each group may be determined through two processes: pre-selection and preference.
Specifically, during the pre-selection process, a preselected node set for each of the at least one group may be determined based on the current resource state information of the distributed system and the resource requirement information of the subtasks in each group. For example, nodes that cannot meet the requirements of a group may be removed from the nodes of the distributed system, and the remaining nodes are used as the preselected node set of that group; the preselected node sets of different groups may be the same or different. The resource requirement information of a group indicates the total resources required by the subtasks in the group.
In the preference process, the target node corresponding to each group may be selected from the preselected node set of that group. The current resource state information of the distributed system characterizes the resources currently available in the distributed system, and may include information such as the amount and distribution of the currently available resources. For example, the most suitable node may be selected from the preselected node set of a group as the target node of the group. Specifically, the target node may be determined according to at least one of the resource type, the resource amount, and the resource distribution of each preselected node in the group's preselected node set, and the topology of the nodes in the distributed system. The resource types include, but are not limited to, at least any of: the number of communication ports, the CPU type, the GPU type, and the like. The resource amount may be the number of CPUs, the remaining disk capacity, the remaining memory capacity, and the like. The resource distribution, also referred to as the resource fragmentation condition, is the distribution of the currently available resources, e.g., how many disks the remaining disk capacity is spread across. The topology of the nodes in the distributed system may be, for example, a bus topology, a star topology, a ring topology, a tree topology, or the like.
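A minimal sketch of the pre-selection filter, assuming hypothetical per-node records of free CPUs and free memory (the disclosure does not prescribe a particular data model):

```python
def preselect(nodes, group_demand):
    """Pre-selection: drop nodes that cannot satisfy the group's total
    demand; the survivors form the group's preselected node set.
    Node records (free CPUs, free memory in GB) are hypothetical."""
    return {name for name, (cpu, mem) in nodes.items()
            if cpu >= group_demand["cpu"] and mem >= group_demand["mem"]}

nodes = {"node1": (4, 16), "node2": (2, 8), "node3": (8, 32)}
print(preselect(nodes, {"cpu": 4, "mem": 8}))   # {'node1', 'node3'}
```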
In some embodiments, in the pre-selection process, the preselected node set of each group may be determined based on the current resource state information of the distributed system and the resource requirement information of that group; that is, the pre-selection processes of the groups of the task do not interfere with each other, and the selection of a group's preselected node set depends only on the resource requirement information of that group, not on the resource requirement information of other groups. In some embodiments, in the preference process, the target nodes of the multiple groups may be determined in parallel, or the preselected node sets of the multiple groups may be considered jointly to determine their target nodes.
In some embodiments, the set of preselected nodes for each group may be determined based on current resource state information for the distributed system and resource demand information for each group, the target node for each group being determined from the set of preselected nodes for each group in turn in a particular order.
The above specific order may be a preset order. Alternatively, in some examples, the specific order may be determined according to the priority of each of the at least one group; for example, a corresponding target node is first determined for a group with a higher priority, and then for a group with a lower priority, where the order may be determined based on one or more factors such as the priority of each group and the dependencies between groups. Alternatively, the specific order may be determined based on other factors, which is not limited by the embodiments of the present disclosure.
For example, a target node is determined first for a group with a high priority, and then for a group with a low priority. The priority of a group can be determined from the priorities of the subtasks in the group; for example, the priorities of the subtasks in the group may be weighted and averaged to obtain the priority of the group, or the priority of the highest-priority subtask in the group may be taken as the priority of the group (see the sketch after this paragraph). For another example, when group A depends on group B, the target node is determined first for group A and then for group B. The dependency between groups may be determined based on the dependencies between the subtasks in the groups; for example, when group A includes a subtask that depends on a subtask in group B, it is determined that group A depends on group B.
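A one-function sketch of the two group-priority strategies just mentioned (uniform weights are an assumption for illustration; the disclosure leaves the weighting open):

```python
def group_priority(subtask_priorities, mode="max"):
    """Two of the strategies mentioned above (hypothetical sketch):
    take the highest subtask priority, or a uniformly weighted average."""
    if mode == "max":
        return max(subtask_priorities)
    return sum(subtask_priorities) / len(subtask_priorities)

print(group_priority([3, 7, 5], mode="max"))   # 7
print(group_priority([3, 7, 5], mode="avg"))   # 5.0
```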
The determinations of the preselected node sets of the multiple groups may be independent of each other, without dependency; for example, they may be performed in parallel or in any order. For example, the preselected node sets of the multiple groups are determined based on the same current resource state information of the distributed system, i.e., the current resource state information is not updated during the pre-selection process. Likewise, the determination of each group's preselected node set is related only to that group's own resource requirement information, not to the resource requirement information of other groups.
In some embodiments, assuming that the task includes group 1, group 2 and group 3 with priorities from high to low, the following operations may be performed in parallel during the pre-selection process: determining the preselected node set of group 1 according to the resource requirement information of group 1 and the current resource state information of the distributed system; determining the preselected node set of group 2 according to the resource requirement information of group 2 and the current resource state information of the distributed system; and determining the preselected node set of group 3 according to the resource requirement information of group 3 and the current resource state information of the distributed system. Assume that the preselected node set of group 1 includes {node 1, node 2, node 3}, that of group 2 includes {node 2, node 5, node 6, node 7}, and that of group 3 includes {node 5, node 6}.
In the preference process, the scores of the preselected nodes contained in each group's preselected node set may be determined, and the target node of each group may then be selected from its preselected node set based on those scores. Further, after the target node of a group is determined, the current resource state information of the distributed system can be updated.
Following the previous example, as shown in fig. 3, the scores of the preselected nodes in the preselected node set of group 1 may be determined first, and assuming that the scores of node 1, node 2 and node 3 are 80, 90 and 70, respectively, node 2 is determined as the target node of group 1, and the current resource status information of the distributed system is updated. Then, the scores of all the preselected nodes in the preselected node set of the group 2 are determined, and assuming that the scores of the node 2, the node 5, the node 6 and the node 7 are respectively 60, 80, 75 and 70, the node 5 is determined as the target node of the group 2, and the current resource state information of the distributed system is updated. The scores of the various preselected nodes in the set of preselected nodes for group 3 are then determined, and assuming that the scores for node 5 and node 6 are 80 and 60, respectively, then node 5 is determined to be the target node for group 3.
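The sequential scoring with a (virtual) state update between groups can be sketched as follows. The base scores and the fixed load penalty are toy values standing in for re-scoring against the updated resource state, so the resulting targets need not match the fig. 3 walkthrough exactly:

```python
def pick_targets(groups_by_priority, preselected, base_score):
    """Sequential preference step as in the walkthrough above: score each
    group's preselected nodes, keep the best, then update the (virtual)
    resource state before scoring the next group. The load penalty is a
    hypothetical stand-in for re-scoring against updated resource state."""
    load = {}                                     # groups already placed per node
    targets = {}
    for group in groups_by_priority:
        best = max(preselected[group],
                   key=lambda n: base_score[n] - 30 * load.get(n, 0))
        targets[group] = best
        load[best] = load.get(best, 0) + 1        # virtual state update
    return targets

base = {"node1": 80, "node2": 90, "node3": 70,
        "node5": 80, "node6": 75, "node7": 70}
pre = {"group1": ["node1", "node2", "node3"],
       "group2": ["node2", "node5", "node6", "node7"],
       "group3": ["node5", "node6"]}
print(pick_targets(["group1", "group2", "group3"], pre, base))
# {'group1': 'node2', 'group2': 'node5', 'group3': 'node6'}
```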
In practical applications, the pre-selection process may be omitted and the preference process performed directly, i.e., the target node corresponding to each group is selected from among the nodes of the distributed system. Alternatively, a pre-selection process may first be carried out to obtain the preselected node set, and a node then selected at random from that set as the target node. Other ways of determining the target node may also be used, and are not described herein.
In some embodiments, in the case that a corresponding target node is successfully determined for each of the multiple subtasks of the task, the target nodes are allocated to the multiple subtasks included in the task. In the case that a corresponding target node fails to be determined for at least one of the multiple subtasks, delayed allocation is performed on the multiple subtasks of the task.
In some embodiments, in the case that a corresponding target node is successfully determined for each of the at least one group of the task, a target node is allocated to each group. In the case that a corresponding target node fails to be determined for at least one group of the task, delayed allocation is performed on each of the at least one group of the task.
After allocating corresponding target nodes to the multiple subtasks of the task, the multiple subtasks of the task may be synchronously scheduled. It should be noted that, in the embodiment of the present disclosure, allocating refers to distributing a sub-task to a corresponding target node, and scheduling refers to executing the sub-task by the corresponding target node after distributing the sub-task to the corresponding target node.
As shown in fig. 4, in the conventional scheduling method, the scheduler does not distinguish task types and allocates every task in the same way: the multiple subtasks of a task are processed in sequence, and as soon as one subtask can be scheduled, it is immediately allocated to its target node, whose resources are occupied to schedule that subtask. In practice, however, tasks of different task types are often suited to different allocation modes.
In the embodiments of the present disclosure, subtask allocation and scheduling are performed according to the task type. As shown in fig. 5, tasks are still fetched in priority order; for each task, the task is considered as a whole, corresponding target nodes are allocated to its multiple subtasks according to the task type, and the multiple subtasks are then scheduled synchronously.
Unlike the traditional allocation mode, the embodiments of the present disclosure consider the task as a whole when selecting nodes for it, allocate target nodes to the multiple subtasks included in the task according to the task type, and can allocate different target nodes to tasks of different task types, so that the allocated target nodes are adapted to the task type and at least one of task running efficiency and cluster resource utilization is improved.
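A top-level sketch of this flow, with the type-aware allocation abstracted behind a hypothetical callback (the callback name and task records are assumptions for illustration, not the disclosed implementation):

```python
def schedule_all(tasks, allocate_for_type):
    """Top-level loop sketched from the flow above: fetch tasks by priority,
    allocate nodes for all subtasks of a task according to its task type,
    then dispatch the subtasks together. `allocate_for_type` is a
    hypothetical callback returning {subtask: node} or None (delay)."""
    for task in sorted(tasks, key=lambda t: t["priority"], reverse=True):
        placements = allocate_for_type(task)     # whole-task, type-aware
        if placements is None:
            continue                             # delayed: retried later
        for subtask, node in placements.items():
            print(f"dispatch {subtask} of {task['name']} to {node}")

demo = [{"name": "jobA", "priority": 2}, {"name": "jobB", "priority": 1}]
schedule_all(demo, lambda t: {"t1": "node1"} if t["name"] == "jobA" else None)
# dispatch t1 of jobA to node1
```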
Fig. 6 is a schematic diagram of the scheduling logic of some embodiments of the present disclosure. To reduce scheduling complexity, the following assumptions can be made for each task to be scheduled:
(1) each task has a plurality of subtasks;
(2) each subtask is homogeneous in resource demand, i.e., the demand for resources is the same;
(3) the task submitted by each user specifies the resource demand of each subtask at submission time, and the number of subtasks and the resource demand of each subtask remain unchanged throughout the processing of the task.
Firstly, a pre-selection process is carried out, and nodes which cannot meet the task of a user are removed. Specifically, the method comprises the following steps:
(1) For each subtask of a task, a preselected node set of the subtask is determined; nodes outside this preselected node set are necessarily incapable of running the subtask. Herein, a task may also be referred to as a job, and a subtask as a task.
In some alternative examples, the preselected nodes of a later subtask may be determined from the preselected node set of an earlier subtask, where the order of the subtasks may optionally be determined based on their priorities. For example, assume that the distributed system includes nodes 1 to 5 and a task includes subtask 1, subtask 2 and subtask 3. Preselected nodes are first determined for subtask 1, say node 1, node 2 and node 3; preselected nodes of subtask 2 are then determined from among node 1, node 2 and node 3, say node 2 and node 3; and preselected nodes of subtask 3 are then determined from among node 2 and node 3, say node 3. For tasks with a certain amount of communication demand, especially communication-intensive tasks, this way of determining the preselected node sets can improve the efficiency of determining them and thus the efficiency of task allocation (a sketch of this narrowing appears after these steps).
(2) Each preselected node in the preselected node set is verified to determine whether the total resources on that node meet the total resource requirement of all subtasks allocated to it. If so, the preference process is entered; otherwise, the pre-selection process is performed again.
This verification step is optional; verifying the preselected nodes can improve the reliability of the overall task allocation. Before the pre-selection process, node set partitioning may also be performed, i.e., the nodes are grouped, for example according to the cluster they belong to, so that nodes in different clusters fall into different groups. With node set partitioning, different tasks can be restricted to the nodes of a particular group.
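A minimal sketch of the narrowing pre-selection from step (1), with a hypothetical feasibility predicate `fits` standing in for the resource checks:

```python
def narrowing_preselect(subtasks_in_order, all_nodes, fits):
    """Narrowing pre-selection from step (1): each later subtask's set is
    drawn only from the previous subtask's set, shrinking the search space.
    `fits(node, subtask)` is a hypothetical feasibility predicate."""
    candidates = set(all_nodes)
    sets = {}
    for sub in subtasks_in_order:
        candidates = {n for n in candidates if fits(n, sub)}
        if not candidates:
            return None                 # pre-selection fails for this subtask
        sets[sub] = set(candidates)
    return sets

free = {"node1": 3, "node2": 2, "node3": 2, "node4": 1, "node5": 1}
need = {"sub1": 2, "sub2": 3, "sub3": 3}
print(narrowing_preselect(["sub1", "sub2", "sub3"], free,
                          lambda n, s: free[n] >= need[s]))
# {'sub1': {'node1', 'node2', 'node3'}, 'sub2': {'node1'}, 'sub3': {'node1'}}
```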
Then, the preference process is carried out, and the target nodes most suitable for the user's task are selected. Specifically, the following steps may be included (a consolidated sketch of steps (2) to (4) follows the list):
(1) The computation amount and communication amount of the task during training are obtained from the information the user provides with the task and from the scheduler's historical experience, and the user's task is judged to be compute-intensive or communication-intensive accordingly.
In some alternative examples, the determination of the task type may also be performed during or prior to the above pre-selection process, and the determination of the pre-selected set of nodes for the subtasks is made based on the task type.
In some alternative examples, the determination of the target nodes of the multiple subtasks in the preference process is made based on the task type. For a communication-intensive task, the task is placed (i.e., allocated) on the minimum number of physical nodes (i.e., target nodes), where the number of physical nodes is determined by the available resources of the nodes and the resources required by the task; if this placement constraint cannot be met, allocation of the multiple subtasks of the task is delayed. For a compute-intensive task, the placement constraint is relaxed: even if the minimum number of physical nodes cannot be achieved, nodes with sufficient resources to meet the task requirements may still be selected for allocation; otherwise, allocation is delayed.
(2) The cluster nodes are sorted by spare resources from largest to smallest to obtain a score of each preselected node for the task, where the score of a preselected node can be determined according to the node load and the node resource fragmentation condition in the cluster of the distributed system. The node load refers to the total amount of currently available resources and the resource usage of each node in the cluster at the current moment. The resource fragmentation condition refers to the distribution of the resources on a node.
(3) The subtasks are placed on the nodes selected in the previous step in order from high score to low score, until a subtask cannot be placed or all subtasks of the job have been placed.
(4) The job state is checked. If the job still has an unplaced subtask, the cluster resources are insufficient and the allocation fails; the resources occupied by the job are released and allocation is delayed. If the job has no unplaced subtask and the task type is communication-intensive, it is determined whether the minimum-node requirement is met: if so, the allocation succeeds; otherwise, the allocation fails, the resources are released, and allocation is delayed.
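Steps (2) to (4) can be sketched together as follows. Ordering nodes by spare resources stands in for the full load-and-fragmentation score, and all names are hypothetical:

```python
def preferred_place(job, free, min_nodes, comm_intensive):
    """Steps (2)-(4) above in one sketch: order nodes by spare resources
    (standing in for the score), place subtasks from the best node down,
    then check the result; on failure the caller releases resources and
    delays the job. All names are hypothetical."""
    order = sorted(free, key=free.get, reverse=True)      # step (2)
    spare = dict(free)
    placed, pending = {}, list(job["subtasks"])
    for node in order:                                    # step (3)
        while pending and spare[node] >= job["demand"]:
            spare[node] -= job["demand"]
            placed[pending.pop()] = node
    if pending:                                           # step (4): unplaced subtask
        return None                                       # -> release and delay
    if comm_intensive and len(set(placed.values())) > min_nodes:
        return None                                       # minimum-node check failed
    return placed

job = {"subtasks": ["t1", "t2", "t3", "t4"], "demand": 1}
print(preferred_place(job, {"node1": 2, "node2": 2, "node3": 1}, 2, True))
# {'t4': 'node1', 't3': 'node1', 't2': 'node2', 't1': 'node2'}
```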
The method of the embodiments of the present disclosure can be applied to a task orchestration system, which may be implemented based on cloud platforms such as Kubernetes.
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written does not imply a strict execution order or any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible internal logic.
As shown in fig. 7, an embodiment of the present disclosure further provides a task scheduling apparatus, where the apparatus includes:
a determining module 701, configured to determine a task type of a task;
an allocating module 702, configured to allocate target nodes to multiple subtasks included in the task according to the task type of the task.
In some embodiments, the task type of the task is compute intensive or communication intensive.
In some embodiments, the assignment module is to: determine a target node corresponding to each subtask in the plurality of subtasks according to the task type of the task and the resource requirement information of the plurality of subtasks, and allocate the determined corresponding target nodes to the plurality of subtasks.
In some embodiments, the assignment module is to: determine target nodes for the plurality of subtasks in sequence based on the task type of the task, wherein after the target node of each subtask is determined, the current resource state information of the distributed system is updated based on the resource requirement information of that subtask.
In some embodiments, the assignment module is to: determine a preselected node set for each of the plurality of subtasks, and select a target node for each subtask from its preselected node set based on the task type of the task.
In some embodiments, a preselected set of nodes for each of the plurality of subtasks is determined based on the resource requirement information for the plurality of subtasks and the task type of the task.
In some embodiments, the target node for each of the plurality of subtasks is selected in turn, in a certain order, from the preselected node set of each subtask.
Optionally, the order is determined based on at least one of a priority of the subtasks, a dependency between the subtasks.
In some embodiments, the assignment module is to: determine the preselected node set of the current subtask from the preselected node sets of the subtasks preceding the current subtask.
In some embodiments, the assignment module comprises: the first determining unit is used for determining scores of preselected nodes contained in a preselected node set of each subtask according to the task type of the task; and the selecting unit is used for selecting the target node of each subtask from the preselected node set of each subtask based on the score of the preselected node contained in the preselected node set of each subtask.
In some embodiments, the assignment module comprises: the grouping unit is used for dividing a plurality of subtasks of the task into at least one group according to the task type of the task, wherein each group comprises at least one subtask in the plurality of subtasks; and the first allocation unit is used for allocating a corresponding target node to each group in at least one group of the tasks respectively, wherein each subtask in the same group is allocated to the same target node.
In some embodiments, the grouping unit is to: under the condition that the task type of the task is communication intensive, determining the total number of nodes required by a plurality of subtasks of the task according to the resource demand information of the plurality of subtasks, and dividing the plurality of subtasks of the task into at least one group according to the total number of the nodes; and/or, in the event that the task type of the task is computationally intensive, treating each of a plurality of subtasks of the task as a group.
In some embodiments, the first distribution unit comprises: a determining subunit, configured to determine a preselected node set of each of the at least one group based on current resource state information of the distributed system and resource demand information of each subtask in each group; and the selecting subunit is used for selecting the target node corresponding to each group from the preselected node set of each group.
In some embodiments, the determining subunit is to: determining a preselected set of nodes for each group based on current resource state information for the distributed system and resource demand information for the group; the selection subunit is configured to: the target node for each group is determined from the preselected set of nodes for each group in turn in a particular order.
In some embodiments, the selection subunit is to: determining a score for at least one preselected node contained in the set of preselected nodes for each group based on the task type for the task; selecting a target node for each group from the set of preselected nodes for the group based on a score of at least one preselected node contained in the set of preselected nodes for the group.
The preselected node sets of different groups may be the same or different. In some examples, multiple groups may have the same preselected node set.
In some embodiments, the assignment module comprises: the second determining unit is used for determining target nodes for a plurality of subtasks of the task according to the task type of the task; and the second distributing unit is used for distributing the target nodes to the plurality of subtasks included in the task under the condition that the corresponding target nodes are successfully determined for the plurality of subtasks of the task.
In some embodiments, the apparatus further comprises: and the delay distribution module is used for performing delay distribution on the plurality of subtasks of the task under the condition that the corresponding target node is not successfully determined for at least one subtask in the plurality of subtasks.
In some embodiments, the assignment module is to: allocate target nodes to some of the plurality of subtasks in the current resource allocation process or the current scheduling period according to the task type of the task.
In some embodiments, the assignment module is to: determine the target nodes corresponding to the plurality of subtasks respectively according to the task type of the task, and in the case that a first subtask in the plurality of subtasks has no corresponding target node, allocate the corresponding target nodes to the other subtasks and perform delayed allocation on the first subtask.
In some embodiments, the apparatus further comprises: a scheduling module, configured to synchronously schedule the plurality of subtasks of the task after target nodes are allocated to the plurality of subtasks included in the task according to the task type.
In some embodiments, the apparatus is applied to a task orchestration system.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The embodiments of the apparatus of the present specification can be applied to a computer device, such as a server or a terminal device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking the software implementation as an example, as a logical apparatus, it is formed by the processor of the computer device in which it is located reading the corresponding computer program instructions from the non-volatile memory into memory and running them. In terms of hardware, fig. 8 shows the hardware structure of the computer device in which the apparatus of this specification is located; besides the processor 801, the memory 802, the network interface 803 and the non-volatile memory 804 shown in fig. 8, the server or electronic device in which the apparatus is located may also include other hardware according to the actual functions of the computer device, which is not described again here.
Accordingly, the embodiments of the present disclosure also provide a computer storage medium on which a computer program is stored, which when executed by a processor implements the method according to any of the embodiments.
Accordingly, embodiments of the present disclosure also provide a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method according to any of the embodiments when executing the program.
The present disclosure may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The above description is only exemplary of the present disclosure and should not be taken as limiting the disclosure, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
Claims (14)
1. A method for task scheduling, the method comprising:
determining the task type of the task;
and distributing target nodes for a plurality of subtasks included in the task according to the task type of the task.
2. The method of claim 1, wherein the task type of the task is compute intensive or communication intensive.
3. The method according to claim 1 or 2, wherein the allocating target nodes to a plurality of subtasks included in the task according to the task type of the task comprises:
dividing a plurality of subtasks of the task into at least one group according to the task type of the task, wherein each group comprises at least one subtask in the plurality of subtasks;
and respectively allocating a corresponding target node to each group in the at least one group of the task, wherein each subtask in the same group is allocated to the same target node.
4. The method of claim 3, wherein the dividing the plurality of subtasks of the task into at least one group according to the task type of the task comprises:
under the condition that the task type of the task is communication intensive, determining the total number of nodes required by a plurality of subtasks of the task according to the resource demand information of the plurality of subtasks, and dividing the plurality of subtasks of the task into at least one group according to the total number of the nodes;
and/or
In a case where the task type of the task is computationally intensive, each of a plurality of subtasks of the task is treated as a group.
5. The method according to claim 3 or 4, wherein the respectively allocating a corresponding target node to each group in the at least one group of the task comprises:
determining a preselected set of nodes for each of the at least one group based on current resource state information for the distributed system and resource demand information for each subtask in each group;
and selecting the target node corresponding to each group from the preselected node set of each group.
6. The method of claim 5, wherein determining the preselected set of nodes for each of the at least one group based on current resource state information for the distributed system and resource demand information for the respective subtasks in each group comprises:
determining a preselected set of nodes for each group based on current resource state information for the distributed system and resource demand information for the group;
the selecting the target node corresponding to each group from the preselected node set of each group comprises:
determining the target node for each of the plurality of groups in turn from the preselected node set of each of the plurality of groups in a particular order.
7. The method of claim 5 or 6, wherein selecting the target node corresponding to each group from the pre-selected set of nodes for each group comprises:
determining a score for at least one preselected node contained in the set of preselected nodes for each group based on the task type for the task;
selecting a target node for each group from the set of preselected nodes for the group based on a score of at least one preselected node contained in the set of preselected nodes for the group.
8. The method according to any one of claims 1 to 7, wherein the allocating target nodes to a plurality of subtasks included in the task according to the task type of the task comprises:
determining target nodes for a plurality of subtasks of the task according to the task type of the task;
and under the condition that the corresponding target nodes are successfully determined for the plurality of subtasks of the task, distributing the target nodes for the plurality of subtasks included in the task.
9. The method of claim 8, further comprising:
and in the case that a corresponding target node fails to be determined for at least one of the plurality of subtasks, performing delayed allocation on the plurality of subtasks of the task.
10. The method according to any one of claims 1 to 9, further comprising:
and after target nodes are distributed to the plurality of subtasks included in the task according to the task type, synchronously scheduling the plurality of subtasks of the task.
11. The method according to any one of claims 1 to 10, wherein the allocating target nodes for a plurality of subtasks included in the task based on the task type of the task comprises:
determining a preselected set of nodes for each of the plurality of subtasks, the preselected set of nodes including at least one preselected node;
and selecting a target node corresponding to each subtask from a preselected node set of each subtask based on the task type of the task.
12. A task scheduling apparatus, characterized in that the apparatus comprises:
the determining module is used for determining the task type of the task;
and the distribution module is used for distributing target nodes for a plurality of subtasks included in the task according to the task type of the task.
13. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 11.
14. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 11 when executing the program.