CN113535332B - Cluster resource scheduling method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN113535332B (application CN202110921010.9A, filed 2021)
- Authority
- CN
- China
- Prior art keywords
- node
- task
- target
- resources
- computing unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5077—Logical partitioning of resources; Management or configuration of virtualized resources
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Multi Processors (AREA)
Abstract
The disclosure relates to a cluster resource scheduling method, apparatus, computer device and storage medium, where each node in the cluster comprises a plurality of computing units and adopts a non-uniform memory access architecture. The method comprises the following steps: receiving the remaining resource quantity and unit identifier of each computing unit reported by each node; receiving a resource scheduling request of a target job, the request comprising the quantity of resources required by a task of the target job; determining a target computing unit whose remaining resource quantity is not less than the quantity of resources required by the task, acquiring a first node identifier of the first target node where the target computing unit is located based on the unit identifier of the target computing unit, and determining a resource allocation result for the task; and, based on the first node identifier, sending the resource allocation result to the first target node so that the task is bound to the resources of the target computing unit and then executed on that unit. In this way, cross-unit memory access is avoided and the overall performance of task execution is improved.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a resource scheduling method, a resource scheduling device, a computer device for implementing the resource scheduling method, and a computer readable storage medium.
Background
The cluster scheduling system is a basic platform for managing and scheduling the resources of a cluster and supports scheduling various tasks in large-scale clusters; Hadoop YARN (Yet Another Resource Negotiator) is one example. Typically, a YARN cluster scheduling system includes a resource manager RM (Resource Manager) and a plurality of node managers NM (Node Manager); the RM is responsible for resource management and allocation across the entire cluster, while the NM is the resource and task manager on each node, acting as the agent that manages that node. The NM periodically reports the usage of the node's resources (such as CPU and memory) and the running state of its resource containers (Container) to the RM. After a user submits a job through a client (Client), the RM creates a corresponding application master AM (ApplicationMaster) for the job; each job has its own AM, which is responsible for managing the job. A job generally includes a plurality of subtasks, and the AM submits resource applications to the RM on behalf of those subtasks. After the RM allocates resources, the AM communicates with the NM of the node where the allocated resources are located, so that the subtask is executed on the corresponding node.
In the related art, a node in the cluster scheduling system may adopt a NUMA (Non-Uniform Memory Access) architecture. Under this architecture, each node is divided into a plurality of computing units (also referred to as sockets), each computing unit has its own independent CPU and memory (i.e., local memory), and the computing units are interconnected. The CPU in one computing unit can access both local memory and the memory of any other computing unit; that is, memory may be accessed across computing units.
When a job has a task to execute, its AM applies to the RM for resources, informing the RM of the amount of resources required to execute the task. The RM receives the remaining resource amounts reported by the NMs of all nodes, and if a node's remaining resources satisfy the amount required to execute the task, the RM dispatches the task to that node for execution. However, since a node under the NUMA architecture contains multiple computing units, each with its own independent CPU and memory, a large number of cross-unit memory accesses can easily occur inside the node while the task runs, which may severely degrade the overall performance of task execution.
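As an illustrative sketch (not part of the patent text, with hypothetical names and numbers), node-level scheduling can accept a task that no single NUMA computing unit can actually hold, which is exactly the situation that forces cross-unit memory access:

```python
# Hypothetical model of one NUMA node with two computing units (sockets).
node_units = [
    {"unit_id": 11, "free_cpus": 3, "free_mem_gb": 6},
    {"unit_id": 12, "free_cpus": 3, "free_mem_gb": 6},
]
task = {"cpus": 5, "mem_gb": 10}

# Node-level view: the task appears to fit (6 CPUs, 12 GB free in total) ...
node_free_cpus = sum(u["free_cpus"] for u in node_units)
node_free_mem = sum(u["free_mem_gb"] for u in node_units)
fits_node = node_free_cpus >= task["cpus"] and node_free_mem >= task["mem_gb"]

# ... but no single unit can hold it, so running it would force
# cross-unit (remote) memory access under NUMA.
fits_some_unit = any(
    u["free_cpus"] >= task["cpus"] and u["free_mem_gb"] >= task["mem_gb"]
    for u in node_units
)

print(fits_node, fits_some_unit)  # True False
```

This gap between the node-level and unit-level views is what the per-unit scheduling described below is designed to close.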
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a cluster resource scheduling method, a cluster resource scheduling device, and a computer device and a computer readable storage medium for implementing the cluster resource scheduling method, so as to reduce or even avoid a situation that a cluster accesses a memory across computing units inside a node when scheduling resources for tasks of a job, thereby improving overall performance of task execution.
In a first aspect, the present disclosure provides a method for scheduling cluster resources, where each node in the cluster includes a plurality of computing units and adopts a non-uniform memory access architecture, the method includes:
Receiving the residual resource quantity and unit identification of each computing unit in the nodes reported by each node in the cluster;
receiving a resource scheduling request of a target job, wherein the resource scheduling request comprises the quantity of resources required by a task of the target job;
Determining a target computing unit of which the number of remaining resources in the plurality of computing units is not smaller than the number of resources required by the task, based on the number of remaining resources in each computing unit of each node and the number of resources required by the task;
Acquiring a first node identifier of a first target node where the target computing unit is located based on the unit identifier of the target computing unit, and determining a resource allocation result of the task;
and based on the first node identification, sending the resource allocation result to the first target node so as to bind the task to the resource of the target computing unit, and then executing the task on the target computing unit.
Optionally, in some embodiments of the disclosure, before the determining the resource allocation result of the task, the method further includes:
Determining preset performance parameters of each node, wherein the preset performance parameters are used for representing the capacity of the node for accessing the memory across the computing unit;
And returning to the step of determining a target computing unit, of which the number of remaining resources is not smaller than the number of resources required by the task, from among the plurality of computing units based on the number of remaining resources of each computing unit of each node and the number of resources required by the task when each preset performance parameter is smaller than or equal to a preset performance parameter threshold.
Optionally, in some embodiments of the disclosure, the step of determining a preset performance parameter of each node includes:
receiving unit parameters of the computing units on the nodes reported by the nodes, wherein the unit parameters comprise any one or more of the total number of the computing units on the nodes, the resource types in each computing unit and the total number of the resources;
And determining preset performance parameters corresponding to the nodes based on the unit parameters corresponding to the nodes, wherein the unit parameters and the preset performance parameters are in positive correlation.
Optionally, in some embodiments of the disclosure, the method further comprises:
when each preset performance parameter is larger than the preset performance parameter threshold, determining a second target node with the number of the residual resources in each node not smaller than the number of the resources required by the task based on the number of the resources required by the task and the number of the residual resources of each node;
acquiring a second node identifier of the second target node;
based on the second node identification, a node manager of the second target node is notified to launch a corresponding resource container to perform the task.
Optionally, in some embodiments of the disclosure, the method further comprises:
starting timing when the target computing units, the number of the residual resources of which is not less than the number of the resources required by the task, are not determined;
and when the timing duration is longer than the preset duration, determining a target computing unit with the number of the residual resources in the computing units not smaller than the number of the resources required by the task based on the number of the residual resources in the computing units of the nodes at the current moment and the number of the resources required by the task.
Optionally, in some embodiments of the disclosure, the method further comprises:
and when a plurality of target computing units are determined, selecting any one of the target computing units.
In a second aspect, an embodiment of the present disclosure provides a cluster resource scheduling apparatus, where each node in the cluster includes a plurality of computing units and adopts a non-uniform memory access architecture, the apparatus includes:
The data receiving module is used for receiving the residual resource quantity and the unit identification of each computing unit in the nodes reported by each node in the cluster;
a request receiving module, configured to receive a resource scheduling request of a target job, where the resource scheduling request includes a number of resources required by a task of the target job;
A unit determining module configured to determine a target computing unit, of a plurality of computing units, of which the number of remaining resources is not smaller than the number of resources required for the task, based on the number of remaining resources of each computing unit of each node and the number of resources required for the task;
The resource allocation module is used for acquiring a first node identifier of a first target node where the target computing unit is located based on the unit identifier of the target computing unit, and determining a resource allocation result of the task;
And the task execution module is used for sending the resource allocation result to the first target node based on the first node identification so as to bind the task to the resource of the target computing unit and then executing the task on the target computing unit.
Optionally, in some embodiments of the disclosure, the apparatus further comprises:
The performance parameter determining module is used for determining preset performance parameters of the nodes, wherein the preset performance parameters are used for representing the capacity of the nodes to access the memory across the computing unit;
the unit determining module is further configured to determine, when each of the preset performance parameters is less than or equal to a preset performance parameter threshold, a target computing unit, of the plurality of computing units, that is, the number of remaining resources is not less than the number of resources required by the task, based on the number of remaining resources of each computing unit of each of the nodes and the number of resources required by the task.
In a third aspect, embodiments of the present disclosure provide a computer device comprising a processor and a memory;
Wherein the memory stores a computer program executable by the processor;
The processor implements the cluster resource scheduling method according to any embodiment of the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present disclosure provide a computer readable storage medium storing a computer program, which when invoked and executed by a processor, implements the cluster resource scheduling method according to any one of the embodiments of the first aspect.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
the cluster resource scheduling method, apparatus, computer device and storage medium provided by the embodiments of the disclosure receive the remaining resource quantity and unit identifier of each computing unit in each node of the cluster, as reported by that node; receive a resource scheduling request of a target job, the request including the quantity of resources required by a task of the target job; determine, based on the remaining resource quantity of each computing unit of each node and the quantity of resources required by the task, a target computing unit whose remaining resources are not less than the resources required by the task; acquire, based on the unit identifier of the target computing unit, a first node identifier of the first target node where the target computing unit is located, and determine a resource allocation result for the task; and, based on the first node identifier, send the resource allocation result to the first target node so that the task is bound to the resources of the target computing unit and then executed on that unit. Thus, once a first target node, and a target computing unit on it, satisfying the quantity of resources required for task execution have been determined, the task can be bound to the resources of that target computing unit and executed there, so that cross-unit memory access inside the node is avoided when scheduling resources for the job's tasks, and the overall performance of task execution can be improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flow chart of a resource scheduling method according to an embodiment of the disclosure;
FIG. 2 is a flow chart of a resource scheduling method according to another embodiment of the disclosure;
FIG. 3 is a flow chart of a resource scheduling method according to another embodiment of the present disclosure;
FIG. 4 is a flowchart of a resource scheduling method according to another embodiment of the present disclosure;
fig. 5 is a schematic diagram of data interaction for resource scheduling under the Hadoop YARN system architecture according to an embodiment of the disclosure;
Fig. 6 is a schematic structural diagram of a resource scheduling device according to an embodiment of the disclosure;
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
In order to avoid the problem of a cluster scheduling system such as YARN accessing memory across computing units within the node where a resource is located when scheduling resources for job tasks, and to improve the overall performance of task execution, the embodiments of the disclosure provide a resource scheduling method, a resource scheduling apparatus, a computer device and a computer-readable storage medium. The resource scheduling method provided by the embodiments of the disclosure is described first.
The resource scheduling method provided by the embodiments of the disclosure can be applied to a cluster scheduling system, i.e., a basic platform with cluster resource management and scheduling functions, such as Hadoop YARN, Kubernetes (K8s) or Docker Swarm. Taking YARN as an example, the cluster scheduling system mainly includes an RM and multiple NMs, where each NM is the resource and task manager on its node. In some scenarios, the nodes in the cluster adopt a NUMA architecture: each node is divided into multiple computing units, and each computing unit has its own independent resources such as CPU, memory and other resource types. As shown in fig. 1, a resource scheduling method provided by an embodiment of the present disclosure may include the following steps:
Step S101: and receiving the residual resource quantity and the unit identification of each computing unit in the nodes reported by each node in the cluster.
Specifically, the resource manager RM receives the remaining resource amount and the unit identifier of each computing unit of each node reported by the node manager NM of each node.
For example, there are N nodes (node 1 to node N) in the cluster, each node has M (M ≥ 2) computing units, and each computing unit on each node has its own unit identifier and independent available resources such as CPU, memory, network card and GPU.
For example, assume node 1 has 3 computing units (A, B, C), and the independently available resources of each computing unit include, but are not limited to, 10 CPU cores, 100 GB of memory, 2 GPU cards, 1 network card, and so on. Each computing unit has its own unit identifier (ID): for instance, computing unit A has ID 11, computing unit B has ID 12, and computing unit C has ID 13. This is merely illustrative; the specific ID values are not limited in this embodiment, as long as different computing units can be distinguished from one another.
The remaining resource quantity S of any computing unit is obtained by subtracting its allocated resource quantity Y from its total available resource quantity X, i.e., S = X − Y. For example, if the total independently available resources X of computing unit A are 10 CPU cores and 100 GB of memory, and its currently allocated resources Y are 4 CPU cores and 60 GB of memory, then its remaining resources S are 6 CPU cores and 40 GB of memory.
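The S = X − Y bookkeeping above can be sketched per resource dimension (a minimal illustration using the figures for computing unit A; the dictionary layout is an assumption, not the patent's data format):

```python
# Remaining resources S = total available X minus allocated Y,
# computed per resource dimension for computing unit A.
total = {"cpus": 10, "mem_gb": 100}    # X
allocated = {"cpus": 4, "mem_gb": 60}  # Y

remaining = {k: total[k] - allocated[k] for k in total}  # S = X - Y
print(remaining)  # {'cpus': 6, 'mem_gb': 40}
```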
Specifically, the NM of each node generally reports the node's available resource information, such as its remaining resource quantity and the running state of its resource containers, to the RM via heartbeat. In this embodiment, the NM of each node also reports the remaining resource quantity and unit identifier (ID) of each of the node's computing units to the RM via heartbeat.
Step S102: a resource scheduling request of a target job is received, the resource scheduling request including an amount of resources required by a task of the target job.
Specifically, when the RM receives and responds to the resource scheduling request to allocate resources for the task of the target job, the RM acquires the number of resources required by the task in the resource scheduling request.
Step S103: a target computing unit of a plurality of the computing units, the number of remaining resources of which is not less than the number of resources required for the task, is determined based on the number of remaining resources of each of the computing units of each of the nodes and the number of resources required for the task.
Specifically, the RM determines, based on the number of remaining resources of each computing unit of each node and the number of resources required by the task, a target computing unit whose number of remaining resources of the computing unit is not smaller than the number of resources required by the task.
When a user submits a target job to the RM through the client, the quantity of resources required to execute the job, such as 5 CPU cores and 10 GB of memory, is configured. After the job is submitted, the RM creates a corresponding AM for it; the AM is responsible for managing the job. A job typically includes a plurality of subtasks, and the AM submits a resource scheduling request to the RM to apply for resources for those subtasks. When allocating resources for a subtask, the RM obtains the quantity of resources the subtask requires, such as 5 CPU cores and 10 GB of memory, and then determines a target computing unit whose remaining resources are not less than that quantity, based on the remaining resource quantities of each computing unit of each node as reported by the nodes' NMs.
For example, if the remaining resources of computing unit A on node 1 are 6 CPU cores and 40 GB of memory, which exceeds the subtask's requirement of 5 CPU cores and 10 GB of memory, computing unit A may be determined to be the target computing unit. Conversely, suppose the remaining resources of some node are spread over two computing units, each with 3 CPU cores and 6 GB of memory remaining: then no single computing unit satisfies the subtask's requirement of 5 CPU cores and 10 GB of memory. In that case, the allocation may be refused, retried after waiting a certain period, or abandoned if the requirement is still not met after waiting.
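The selection rule of step S103 can be sketched as follows. This is an illustrative implementation, not the patent's code: a unit qualifies only if every resource dimension meets the demand, and the tie-breaking policy (first match) is an assumption, since the embodiments allow picking any qualifying unit:

```python
# Illustrative target-unit selection (step S103): return the unit ID of
# a computing unit whose remaining resources cover the task's demand in
# every dimension, or None if no unit qualifies.
def find_target_unit(units, required):
    for u in units:
        if all(u["remaining"][r] >= required[r] for r in required):
            return u["unit_id"]
    return None  # no unit qualifies: wait and retry, or fall back

units = [
    {"unit_id": 11, "remaining": {"cpus": 6, "mem_gb": 40}},
    {"unit_id": 12, "remaining": {"cpus": 3, "mem_gb": 6}},
]
required = {"cpus": 5, "mem_gb": 10}
print(find_target_unit(units, required))  # 11
```

Note that `find_target_unit` deliberately never sums remaining resources across units; a partial fit on two units is treated as no fit, which is what prevents cross-unit placement.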
Step S104: and acquiring a first node identifier of a first target node where the target computing unit is located based on the unit identifier of the target computing unit, and determining a resource allocation result of the task.
Specifically, after determining the target computing unit, e.g., computing unit A on node 1, the RM obtains its unit identifier, e.g., ID "11" (the target unit identifier). It then obtains the first node identifier, e.g., the name or IP address, of the first target node (node 1) where the target computing unit is located. The RM then returns the obtained unit identifier and first node identifier together, as the resource allocation result, to the AM corresponding to the job.
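A hypothetical shape for the resource allocation result of step S104 is sketched below; the field names and values are assumptions for illustration. The essential point from the text is that the unit identifier travels alongside the node identifier, whereas native scheduling carries only the latter:

```python
# Hypothetical allocation result returned by the RM to the AM (step S104):
# first node identifier plus the target computing unit's identifier.
allocation_result = {
    "node_id": {"name": "node-1", "ip": "10.0.0.1"},  # first node identifier
    "unit_id": 11,                                    # target computing unit
    "resources": {"cpus": 5, "mem_gb": 10},           # granted to the task
}
print(allocation_result["unit_id"])  # 11
```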
Step S105: and based on the first node identification, sending the resource allocation result to the first target node so as to bind the task to the resource of the target computing unit, and then executing the task on the target computing unit.
Specifically, the AM sends a resource allocation result to the NM of the first target node based on the first node identifier, so as to bind the task to a resource in the target computing unit on the first target node, and then execute the task on the target computing unit.
Specifically, the AM generates task scheduling information based on the first node identifier, such as the name or IP address of node 1, and the unit identifier "11" of the target computing unit (computing unit A). Under native scheduling, only node information such as the node identifier needs to be attached to the scheduling information; in this embodiment, the unit identifier of the target computing unit is additionally attached to indicate which specific computing unit on node 1 holds the allocated resources. The AM sends the task scheduling information to the NM of node 1 based on the first node identifier; the information instructs the NM to bind the subtask to the resources of the target computing unit (computing unit A) on node 1 and then execute the subtask on that unit.
In one embodiment, the subtask is bound to the resources of the corresponding target computing unit, such as computing unit A; specifically, it may be bound to computing unit A's CPU, memory, GPU and network card. For example, if unit ID "11" of computing unit A corresponds to CPU IDs 0-9 and memory node ID 0, the subtask can be bound to those 10 CPUs and that memory node via the cpuset controller of cgroups (Control Groups) or another core-binding mechanism; GPUs, network cards and the like may be bound in a similar manner. In this way the subtask, unaware of the binding, runs normally using only the bound resources such as CPU and memory, and no cross-unit memory access occurs.
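The cpuset binding above can be sketched as follows. This is only an illustration of the values involved, under the assumption of the cgroup v1 cpuset interface; a real NM implementation would also create the cgroup, attach the task's processes, handle GPU/network binding, and clean up afterwards:

```python
# Sketch of the values written for binding to computing unit A:
# CPUs 0-9 and memory node 0, per the example in the text. The helper
# name and paths are illustrative, not from the patent.
def cpuset_values(cpu_ids, mem_node_ids):
    """Render the strings written to cpuset.cpus and cpuset.mems."""
    def fmt(ids):
        return ",".join(str(i) for i in sorted(ids))
    return fmt(cpu_ids), fmt(mem_node_ids)

cpus, mems = cpuset_values(range(10), [0])
print(cpus)  # 0,1,2,3,4,5,6,7,8,9
print(mems)  # 0

# These would typically land under
# /sys/fs/cgroup/cpuset/<container>/cpuset.cpus and .../cpuset.mems.
```

Constraining `cpuset.mems` to memory node 0 is what makes the binding NUMA-aware: even page allocations by the kernel on the task's behalf stay on the local memory node.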
According to the cluster resource scheduling method provided by the embodiments of the disclosure, once a first target node, and a target computing unit on it, satisfying the quantity of resources required for task execution have been determined, the task can be bound to the resources of that target computing unit and executed there. Cross-unit memory access inside the node is thereby avoided when scheduling resources for the job's tasks, and the overall performance of task execution can be improved.
Optionally, on the basis of the above embodiments, in some embodiments of the present disclosure, with reference to fig. 2, before the resource allocation result of the task is determined in step S103, the method may further include the following steps:
Step S201: and determining preset performance parameters of each node, wherein the preset performance parameters are used for representing the capacity of the node for accessing the memory across the computing unit.
Specifically, the number of computing units on different nodes in the cluster may differ, and the performance of the nodes may differ, so each node's capability for cross-computing-unit memory access also differs. If a node's capability for cross-computing-unit memory access is poor, the performance loss is large, hardly tolerable for a job, and the overall performance of the job's task execution is greatly affected.
In one embodiment, before allocating resources for the tasks of a job, the RM may determine the preset performance parameter of each node, that is, the parameter characterizing the node's capability for cross-computing-unit memory access. The preset performance parameters may be pre-configured and stored on each node, obtained by the NM of each node, and reported to the RM, but this is not limiting.
Step S202: and returning to the step of determining a target computing unit, of which the number of remaining resources is not smaller than the number of resources required by the task, from among the plurality of computing units based on the number of remaining resources of each computing unit of each node and the number of resources required by the task when each preset performance parameter is smaller than or equal to a preset performance parameter threshold.
It can be understood that when every preset performance parameter is less than or equal to the preset performance parameter threshold, that is, when every node is weak at cross-computing-unit memory access, the performance loss of the cluster system is large and can severely affect the running of the job's tasks. In this case the method returns to step S103, determining a target computing unit whose number of remaining resources is not less than the number of resources required by the task based on the number of remaining resources of each computing unit of each node and the number of resources required by the task, and then continues with steps S104 to S105.
In this way, by evaluating in advance the performance parameter characterizing a cluster node's capability for cross-computing-unit memory access, the resource scheduling scheme provided by the embodiments of the present disclosure is enabled when the performance parameter is small. After the first target node satisfying the number of resources required for task execution and the target computing unit on that node are determined, the task can be bound to the resources of the target computing unit so that it executes within that unit, avoiding cross-computing-unit memory access within a node when scheduling resources for the tasks of a job and thus improving the overall performance of task execution. In addition, the flexibility of cluster resource scheduling is increased.
Optionally, in some embodiments of the present disclosure, the step of determining the preset performance parameter of each node in step S201 may specifically include the following sub-steps:
Step i): and receiving unit parameters of each computing unit on each node reported by each node, wherein the unit parameters comprise any one or more of the total number of the computing units on the node, the resource types in each computing unit and the total number of the resources.
Illustratively, in some embodiments of the present disclosure, the resource types in each computing unit may include, but are not limited to, any one or more of a CPU, memory, GPU, and network card.
Specifically, the NM of each node may report, via heartbeat, the total number of computing units on the node (e.g., 3 computing units on node 1), the resource types in each computing unit (e.g., CPU and memory), and the total number of resources of each computing unit (e.g., 10 CPU cores and 100 GB of memory available).
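A heartbeat payload carrying these unit parameters might be structured as in the sketch below; the field names are illustrative assumptions rather than the wire format of any particular embodiment:

```python
def build_heartbeat(node_id, units):
    """Assemble the unit parameters an NM might report to the RM via heartbeat."""
    return {
        "node": node_id,
        "unit_total": len(units),  # total number of computing units on the node
        "units": units,            # per-unit resource types and totals
    }

# Node 1 from the example: 3 computing units, each with 10 CPU cores and 100 GB.
report = build_heartbeat("node1", [
    {"unit_id": "11", "cpu_cores": 10, "mem_gb": 100},
    {"unit_id": "12", "cpu_cores": 10, "mem_gb": 100},
    {"unit_id": "13", "cpu_cores": 10, "mem_gb": 100},
])
```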
Step ii): and determining preset performance parameters corresponding to the nodes based on the unit parameters corresponding to the nodes, wherein the unit parameters and the preset performance parameters are in positive correlation.
Specifically, the RM may further determine the preset performance parameter corresponding to each node based on the received total number of computing units on the node (e.g., 3 computing units on node 1), the resource types in each computing unit (e.g., CPU and memory), and the total number of resources of each computing unit (e.g., 10 CPU cores and 100 GB of memory available).
In one embodiment, the greater the total number of computing units on a node, the larger its preset performance parameter. For example, a correspondence table between the total number of computing units and the preset performance parameter may be established in advance, and the parameter corresponding to a node determined by table lookup. Alternatively, two intermediate performance parameters may be computed, one from the total number of computing units and one from the resource types and total number of resources in each computing unit, and then averaged to obtain the final preset performance parameter for the node.
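The averaging variant described above can be sketched as follows; the lookup table, the 0..1 scale, and the equal weighting are invented here purely for illustration:

```python
# Hypothetical lookup table: more computing units on a node maps to a larger
# preset performance parameter (positive correlation, scale invented here).
UNIT_COUNT_SCORE = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}

def perf_param(unit_total, resource_score):
    """Average a unit-count score with a resource-derived score (both assumed
    to lie on the same invented 0..1 scale) to get the node's parameter."""
    count_score = UNIT_COUNT_SCORE.get(unit_total, 1.0)
    return (count_score + resource_score) / 2

param = perf_param(3, 0.8)  # node with 3 units and a resource score of 0.8
```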
Optionally, on the basis of the above embodiments, in some embodiments of the disclosure, in combination with the embodiment shown in fig. 3, the method may further include the following steps:
Step S301: and when each preset performance parameter is larger than the preset performance parameter threshold, determining a second target node with the number of the residual resources in each node not smaller than the number of the resources required by the task based on the number of the resources required by the task and the number of the residual resources of each node.
It can be understood that when every preset performance parameter is greater than the preset performance parameter threshold, that is, when every node is strong at cross-computing-unit memory access, the overall performance loss of the cluster system is greatly reduced, and the influence on the running of the job's tasks is slight or even negligible. In this case, resources may be allocated in the existing native scheduling manner: specifically, the RM determines, based on the number of resources required by the subtask and the number of remaining resources of each node, a second target node, such as node 2, whose number of remaining resources is not less than the number of resources required by the subtask.
Step S302: and obtaining a second node identifier of the second target node.
Specifically, the RM acquires the second node identifier, such as the ID of node 2, and sends it to the AM.
Step S303: based on the second node identification, the NM of the second target node is notified to initiate a corresponding resource container to perform the task.
Specifically, the AM sends a notification message to the NM of the second target node, e.g., node 2, based on the second node identification, e.g., ID, to instruct the NM to initiate a corresponding resource Container on node 2 to perform the subtask.
Therefore, by evaluating in advance the performance parameter characterizing how strong or weak a cluster node is at cross-computing-unit memory access, the native resource scheduling and allocation manner can be used directly when the performance parameter is large, which increases the flexibility of cluster resource scheduling.
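The threshold branch described in steps S202 and S301 can be summarized in a small sketch. Note that the case where some nodes are above the threshold while others are below it is not specified in the text, so this sketch simply falls back to native allocation for that case (an assumption):

```python
def pick_strategy(perf_params, threshold):
    """Choose per-computing-unit scheduling only when every node's parameter
    is at or below the threshold; otherwise use native node-level allocation.
    The mixed case is not specified in the text, so it falls back to native."""
    if all(p <= threshold for p in perf_params.values()):
        return "per-computing-unit"   # return to step S103
    return "native"                   # steps S301 to S303

weak = pick_strategy({"node1": 0.3, "node2": 0.4}, threshold=0.5)
strong = pick_strategy({"node1": 0.6, "node2": 0.7}, threshold=0.5)
```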
Optionally, in some embodiments of the present disclosure, in combination with the embodiment shown in fig. 4, the method may further include the following steps:
Step S401: when the target computing unit, which has not determined that the number of remaining resources to the plurality of computing units is not less than the number of resources required for the task, is not determined, a timer is started.
Step S402: and when the timing duration is longer than the preset duration, determining a target computing unit with the number of the residual resources in the computing units not smaller than the number of the resources required by the task based on the number of the residual resources in the computing units of each node at the current moment and the number of the resources required by the task.
For example, the preset duration may be set as needed and is not limited here. Specifically, when the RM has not determined a target computing unit satisfying the number of resources required by the subtask, it may wait for a period of time; as the cluster system changes dynamically, resources of computing units on one or more nodes may be released, and the RM may then determine, based on the number of remaining resources of each computing unit of each node at the current moment, a target computing unit whose number of remaining resources is not less than the number of resources required by the subtask. In this way, cross-computing-unit memory access within a node is still avoided when scheduling resources for the tasks of a job, improving the overall performance of task execution, while resources can also be scheduled for the job's tasks in a timely manner so that they execute sooner.
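A minimal sketch of this wait-and-retry behavior is shown below, assuming a polling loop with a deadline; the snapshot callable, resource dictionary shape, and timing values are all illustrative:

```python
import time

def find_unit(snapshot, need, timeout_s=0.05, poll_s=0.01):
    """Poll cluster snapshots until some computing unit's remaining resources
    cover `need`; give up after the (assumed) timeout and return None."""
    deadline = time.monotonic() + timeout_s
    while True:
        for unit_id, free in snapshot().items():
            if all(free[k] >= need[k] for k in need):
                return unit_id
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)  # wait for resources to be released

# Deterministic demo: the unit is busy on the first poll, freed on the second.
_states = [{"A": {"cpu": 3}}, {"A": {"cpu": 8}}]
found = find_unit(lambda: _states.pop(0) if len(_states) > 1 else _states[0],
                  {"cpu": 5})
```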
Optionally, in some embodiments of the disclosure, the method further comprises: when it is determined that there are a plurality of target computing units, any one of the target computing units is selected.
Specifically, in one embodiment, when the RM determines in step S103, based on the number of remaining resources of each computing unit of each node, a target computing unit whose number of remaining resources is not less than the number of resources required by the task, it may find that several computing units all satisfy this condition, that is, that there are multiple target computing units. In that case, any one of the multiple target computing units is selected as the final target computing unit.
For example, if, based on the number of remaining resources of each computing unit of each node (e.g., nodes 1 to N), the RM determines that the number of remaining resources (e.g., remaining CPUs and memory) of each of the 3 computing units (A, B, C) on node 1 is not less than the number of resources (e.g., CPUs and memory) required by the task, any one of the 3 computing units, such as computing unit B, may be selected as the target computing unit. In some embodiments, the determined computing units (A, B, C) may also be located on different nodes, such as node 2 and node 3; this is for illustration only and is not limiting.
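Since the embodiment permits an arbitrary choice among multiple qualifying computing units, one valid (assumed) policy is a uniform random pick:

```python
import random

def choose_target(candidates, rng=random):
    """Pick any one qualifying (node, unit) pair; the embodiment allows an
    arbitrary choice, so a uniform random pick is one valid policy."""
    return rng.choice(candidates) if candidates else None

# The 3 qualifying units A, B, C on node 1 from the example above.
picked = choose_target([("node1", "A"), ("node1", "B"), ("node1", "C")])
```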
For ease of understanding, the resource scheduling method provided by the embodiments of the present disclosure is described below in conjunction with a specific Hadoop YARN example.
Fig. 5 shows a data interaction diagram of resource scheduling under the Hadoop YARN architecture. Hadoop YARN includes an RM and a plurality of NMs. The RM contains a scheduler for resource scheduling management. Each node where an NM is located adopts a NUMA architecture and is divided into a plurality of computing units (Sockets), each with its own independent CPU and memory as well as other resources such as GPU cards and network cards.
In the first step, the NM of each node detects the Socket-related information of its own node (such as the total number of Sockets, the Socket IDs, and the number of remaining resources of each Socket) and reports it to the RM.
For example, a node has 4 Sockets in total, and the total resources corresponding to each Socket include 10 CPU cores, 100 GB of memory, 2 GPU cards, 1 network card, and so on. After receiving the Socket-related information reported by the NMs of all nodes, the RM records it for subsequent scheduling.
In the second step, when the AM applies to the RM for resources for the tasks of a job, the RM schedules and allocates resources, finding an idle node and an idle Socket for each task.
Specifically, a task is allowed to be allocated to a Socket only when the remaining resources of that Socket can fully satisfy the resources required by the task. For example, if the resources required by the task are (5 cores, 10 GB) and a node's remaining resources consist of two Sockets, each with only (3 cores, 6 GB) remaining, then no single Socket satisfies the task's resource requirement, and allocation may be disallowed.
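The per-Socket fit check in this example can be sketched directly, with resource tuples of (cores, memory in GB); the function names are illustrative:

```python
def socket_fits(remaining, need):
    """A Socket qualifies only if it alone covers the task's full demand."""
    cores_free, mem_free = remaining
    cores_need, mem_need = need
    return cores_free >= cores_need and mem_free >= mem_need

def find_socket(sockets, need):
    """Return the ID of the first Socket whose remaining resources fit `need`,
    or None when no single Socket fits (allocation is then disallowed)."""
    for socket_id, remaining in sockets.items():
        if socket_fits(remaining, need):
            return socket_id
    return None

# The example from the text: the task needs (5 cores, 10 GB); each of the two
# Sockets has only (3 cores, 6 GB) left, so allocation is refused even though
# the node's combined remainder (6 cores, 12 GB) would be enough.
result = find_socket({"socket0": (3, 6), "socket1": (3, 6)}, (5, 10))
```

The refusal in the comment is the key design choice: scheduling at Socket granularity trades a little packing efficiency for the guarantee that a task never spans NUMA domains.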
When the remaining resources of a Socket can fully satisfy the resources required by the task, the RM returns the Socket ID of that Socket and the node ID of the node where it resides to the AM.
In the third step, the AM generates task scheduling information according to the Socket ID and the node ID, and sends it to the NM of the node indicated by the node ID, such as the NM of node N.
In the embodiments of the present disclosure, the task scheduling information carries, in addition to node information such as the node ID, an additional Socket ID indicating the specific Socket.
In the fourth step, when the NM receives the task scheduling information, it binds the task, according to the Socket ID and the instructions in the task scheduling information, to the CPUs, memory, GPU card, and network card in the Socket corresponding to that Socket ID.
For example, Socket ID = 0 indicates that the task needs to be bound to Socket 0 on node N. If the CPU IDs corresponding to Socket 0 are 0-9 and the memory node ID is 0, the task is bound to those 10 cores and the corresponding memory through the cgroup CPUSET controller or another core-binding mechanism. The GPU card and network card are bound in a similar way.
Because a task is allocated and bound to a single Socket, it runs without any awareness of the binding and uses only that Socket's resources, such as its CPU, memory, GPU card, and network card. Cross-Socket memory access is thereby avoided, which improves the overall performance of the job's tasks during execution.
The scheme of this embodiment greatly improves performance on machines with weak cross-Socket capability, benefiting almost all jobs; on machines with strong cross-Socket capability, performance still improves to a certain extent, which is of great benefit to performance-sensitive jobs.
The embodiment of the disclosure also provides a cluster resource scheduling device, where each node in the cluster includes a plurality of computing units and adopts a non-uniform memory access architecture, as shown in fig. 6, the device may include:
A data receiving module 601, configured to receive the number of remaining resources and a unit identifier of each computing unit in the node reported by each node in the cluster;
A request receiving module 602, configured to receive a resource scheduling request of a target job, where the resource scheduling request includes a resource amount required by a task of the target job;
A unit determining module 603 configured to determine a target computing unit, of which the number of remaining resources in the plurality of computing units is not less than the number of resources required for the task, based on the number of remaining resources in each of the computing units of each of the nodes and the number of resources required for the task;
A resource allocation module 604, configured to obtain a first node identifier of a first target node where the target computing unit is located based on the unit identifier of the target computing unit, and determine a resource allocation result of the task;
And a task execution module 605, configured to send the resource allocation result to the first target node based on the first node identifier, so as to bind the task to the resource of the target computing unit, and then execute the task on the target computing unit.
According to the cluster resource scheduling apparatus provided by the embodiments of the present disclosure, after a first target node satisfying the number of resources required for task execution and a target computing unit on that node are determined, the task can be bound to the resources of the target computing unit on the first target node so that it executes within that unit. Cross-computing-unit memory access within a node is thereby avoided when scheduling resources for the tasks of a job, which improves the overall performance of task execution.
Optionally, in some embodiments of the disclosure, the apparatus further comprises:
The performance parameter determining module is used for determining preset performance parameters of the nodes, wherein the preset performance parameters are used for representing the capacity of the nodes to access the memory across the computing unit;
the unit determining module 603 is further configured to determine, when every preset performance parameter is less than or equal to a preset performance parameter threshold, a target computing unit among the plurality of computing units whose number of remaining resources is not less than the number of resources required by the task, based on the number of remaining resources of each computing unit of each node and the number of resources required by the task.
Optionally, in some embodiments of the disclosure, the performance parameter determining module is specifically configured to: receiving unit parameters of the computing units on the nodes reported by the nodes, wherein the unit parameters comprise any one or more of the total number of the computing units on the nodes, the resource types in each computing unit and the total number of the resources; and determining preset performance parameters corresponding to the nodes based on the unit parameters corresponding to the nodes, wherein the unit parameters and the preset performance parameters are in positive correlation.
Optionally, in some embodiments of the present disclosure, the resource types in the computing unit may include, but are not limited to, any one or more of a CPU, a memory, a GPU, and a network card.
Optionally, in some embodiments of the present disclosure, the apparatus may further include a scheduling control module configured to: determine, when every preset performance parameter is greater than the preset performance parameter threshold, a second target node whose number of remaining resources is not less than the number of resources required by the task, based on the number of resources required by the task and the number of remaining resources of each node; acquire a second node identifier of the second target node; and notify, based on the second node identifier, the node manager of the second target node to launch a corresponding resource container to perform the task.
Optionally, in some embodiments of the present disclosure, the apparatus may further include a timing module configured to start a timer when no target computing unit whose number of remaining resources is not less than the number of resources required by the task is determined among the plurality of computing units. The unit determining module 603 is further configured to determine, when the timed duration of the timing module exceeds a preset duration, a target computing unit whose number of remaining resources is not less than the number of resources required by the task, based on the number of remaining resources of each computing unit of each node at the current moment and the number of resources required by the task.
Optionally, in some embodiments of the present disclosure, the unit determining module 603 is further configured to: and when a plurality of target computing units are determined, selecting any one of the target computing units.
The embodiments of the present disclosure provide a computer device, as shown in fig. 7, including a processor 701 and a memory 702, where the memory 702 stores a computer program executable by the processor 701, and the processor 701 implements the cluster resource scheduling method provided by the embodiments of the present disclosure when executing the computer program.
By applying the scheme provided by the embodiments of the present disclosure, after a first target node satisfying the number of resources required for task execution and a target computing unit on that node are determined, the task can be bound to the resources of the target computing unit so that it executes within that unit. Cross-computing-unit memory access within a node is thereby avoided when scheduling resources for the tasks of a job, improving the overall performance of task execution.
The memory may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a CPU, an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The memory 702 and the processor 701 may exchange data through a wired or wireless connection, and the computer device may communicate with other devices through a wired or wireless communication interface. Fig. 7 shows only an example of data exchange over a bus and does not limit the specific connection manner.
When the computer device serves as a cluster node, the processor 701 in the embodiment shown in fig. 7 may be a CPU as mentioned in the above method embodiments, or another independent processor. In that case the computer device may further include other functional components such as a GPU card and a network card.
In addition, the embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program that, when invoked and executed by a processor, implements the cluster resource scheduling method provided by the embodiments of the present disclosure.
By applying the scheme provided by the embodiments of the present disclosure, after a first target node satisfying the number of resources required for task execution and a target computing unit on that node are determined, the task can be bound to the resources of the target computing unit so that it executes within that unit, avoiding cross-computing-unit memory access within a node when scheduling resources for the tasks of a job and improving the overall performance of task execution.
In yet another embodiment provided by the present disclosure, there is also provided a computer program product, which when run on a computer, causes the computer to perform the above-described cluster resource scheduling method provided by the embodiments of the present disclosure.
The descriptions of the embodiments of the cluster resource scheduling apparatus, the computer device, and the computer-readable storage medium are relatively brief because these embodiments are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, or DSL (Digital Subscriber Line)) or wireless (e.g., infrared, radio, or microwave) means. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD (Digital Versatile Disc)), or a semiconductor medium (e.g., an SSD (Solid State Disk)), or the like.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (9)
1. A method for scheduling cluster resources, wherein each node in the cluster includes a plurality of computing units and adopts a non-uniform memory access architecture, the method comprising:
Receiving the residual resource quantity and unit identification of each computing unit in the nodes reported by each node in the cluster;
receiving a resource scheduling request of a target job, wherein the resource scheduling request comprises the quantity of resources required by a task of the target job;
Determining a target computing unit of which the number of remaining resources in the plurality of computing units is not smaller than the number of resources required by the task, based on the number of remaining resources in each computing unit of each node and the number of resources required by the task; when a plurality of target computing units are determined, selecting any one of the target computing units;
Acquiring a first node identifier of a first target node where the target computing unit is located based on the unit identifier of the target computing unit, and determining a resource allocation result of the task;
and based on the first node identification, sending the resource allocation result to the first target node so as to bind the task to the resource of the target computing unit, and then executing the task on the target computing unit.
2. The method of claim 1, wherein prior to determining the resource allocation result for the task, the method further comprises:
Determining preset performance parameters of each node, wherein the preset performance parameters are used for representing the capacity of the node for accessing the memory across the computing unit;
And returning to the step of determining a target computing unit, of which the number of remaining resources is not smaller than the number of resources required by the task, from among the plurality of computing units based on the number of remaining resources of each computing unit of each node and the number of resources required by the task when each preset performance parameter is smaller than or equal to a preset performance parameter threshold.
3. The method of claim 2, wherein the step of determining the preset performance parameters for each node comprises:
receiving unit parameters of the computing units on the nodes reported by the nodes, wherein the unit parameters comprise any one or more of the total number of the computing units on the nodes, the resource types in each computing unit and the total number of the resources;
And determining preset performance parameters corresponding to the nodes based on the unit parameters corresponding to the nodes, wherein the unit parameters and the preset performance parameters are in positive correlation.
4. A method according to claim 2 or 3, characterized in that the method further comprises:
when each preset performance parameter is larger than the preset performance parameter threshold, determining a second target node with the number of the residual resources in each node not smaller than the number of the resources required by the task based on the number of the resources required by the task and the number of the residual resources of each node;
acquiring a second node identifier of the second target node;
based on the second node identification, a node manager of the second target node is notified to launch a corresponding resource container to perform the task.
5. A method according to any one of claims 1 to 3, wherein the method further comprises:
starting timing when the target computing units, the number of the residual resources of which is not less than the number of the resources required by the task, are not determined;
and when the timing duration is longer than the preset duration, determining a target computing unit with the number of the residual resources in the computing units not smaller than the number of the resources required by the task based on the number of the residual resources in the computing units of the nodes at the current moment and the number of the resources required by the task.
6. A cluster resource scheduling apparatus, wherein each node in the cluster includes a plurality of computing units and employs a non-uniform memory access architecture, the apparatus comprising:
a data receiving module, configured to receive the number of remaining resources and the unit identifier of each computing unit in a node, as reported by each node in the cluster;
a request receiving module, configured to receive a resource scheduling request of a target job, where the resource scheduling request includes the number of resources required by a task of the target job;
a unit determining module, configured to determine, based on the number of remaining resources of each computing unit of each node and the number of resources required by the task, a target computing unit, among the plurality of computing units, whose number of remaining resources is not smaller than the number of resources required by the task, and, when a plurality of target computing units are determined, to select any one of the target computing units;
a resource allocation module, configured to acquire, based on the unit identifier of the target computing unit, a first node identifier of a first target node where the target computing unit is located, and to determine a resource allocation result of the task;
and a task execution module, configured to send the resource allocation result to the first target node based on the first node identifier, so that the task is bound to the resources of the target computing unit and then executed on the target computing unit.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a performance parameter determining module, configured to determine a preset performance parameter of each node, where the preset performance parameter characterizes the node's capability to access memory across computing units;
the unit determining module is further configured to determine, when each preset performance parameter is less than or equal to a preset performance parameter threshold, a target computing unit, among the plurality of computing units, whose number of remaining resources is not smaller than the number of resources required by the task, based on the number of remaining resources of each computing unit of each node and the number of resources required by the task.
8. A computer device comprising a processor and a memory;
wherein the memory stores a computer program executable by the processor; and
the computer program, when executed by the processor, implements the cluster resource scheduling method of any one of claims 1-5.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when invoked and executed by a processor, implements the cluster resource scheduling method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110921010.9A CN113535332B (en) | 2021-08-11 | 2021-08-11 | Cluster resource scheduling method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113535332A CN113535332A (en) | 2021-10-22 |
CN113535332B true CN113535332B (en) | 2024-06-18 |
Family
ID=78090887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110921010.9A Active CN113535332B (en) | 2021-08-11 | 2021-08-11 | Cluster resource scheduling method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113535332B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115981844A (en) * | 2022-12-14 | 2023-04-18 | 北京火山引擎科技有限公司 | Task management method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109471705A (en) * | 2017-09-08 | 2019-03-15 | 杭州海康威视数字技术股份有限公司 | Method, equipment and system, the computer equipment of task schedule |
CN110941481A (en) * | 2019-10-22 | 2020-03-31 | 华为技术有限公司 | Resource scheduling method, device and system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992407B (en) * | 2018-01-02 | 2020-11-20 | 中国移动通信有限公司研究院 | YARN cluster GPU resource scheduling method, device and medium |
CN110138732B (en) * | 2019-04-03 | 2022-03-29 | 平安科技(深圳)有限公司 | Access request response method, device, equipment and storage medium |
CN112272203B (en) * | 2020-09-18 | 2022-06-14 | 苏州浪潮智能科技有限公司 | Cluster service node selection method, system, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||