
CN115220908A - Resource scheduling method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115220908A
CN115220908A (Application CN202210510406.9A)
Authority
CN
China
Prior art keywords
resource pool
resource
hardware
idle
hardware processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210510406.9A
Other languages
Chinese (zh)
Inventor
陈友宣
李文丰
何景峰
冯韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Automobile Group Co Ltd
Original Assignee
Guangzhou Automobile Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Automobile Group Co Ltd filed Critical Guangzhou Automobile Group Co Ltd
Priority to CN202210510406.9A priority Critical patent/CN115220908A/en
Publication of CN115220908A publication Critical patent/CN115220908A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 - Indexing scheme relating to G06F9/00
    • G06F 2209/50 - Indexing scheme relating to G06F9/50
    • G06F 2209/5011 - Pool

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The application provides a resource scheduling method and apparatus, an electronic device, and a storage medium. The method comprises: for a target task to be calculated by a first hardware resource platform in a first resource pool, acquiring the number of target cores required by the target task and the number of first cores of idle hardware processing nodes of the first hardware resource platform; if the target core number is greater than the first core number, acquiring a second core number of idle hardware processing nodes of a second hardware resource platform in a second resource pool; and if the sum of the first core number and the second core number is greater than or equal to the target core number, temporarily integrating the idle hardware processing nodes of the second hardware resource platform into the first resource pool, pushing the target task to the first resource pool, and instructing the first hardware resource platform to call the integrated idle hardware processing nodes of the first resource pool to calculate the target task. The method and apparatus can improve the overall resource utilization and the overall computing efficiency of the resource pools.

Description

Resource scheduling method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of resource management, and in particular, to a resource scheduling method and apparatus, an electronic device, and a storage medium.
Background
In many fields, any task involving computation must be allocated certain hardware resources to support it. The hardware resource is usually a hardware core of a hardware resource platform in a resource pool; for example, the resource pool allocates a CPU core of its CPU platform to a task as a hardware resource to support the task's computation. In the prior art, part of the hardware resources is often wasted during the scheduling and allocation of the hardware resources in a resource pool, so that the resource utilization rate and the computing efficiency of the resource pool are low.
Disclosure of Invention
An object of the present application is to provide a resource scheduling method, an apparatus, an electronic device, and a storage medium, which can improve the overall resource utilization and the overall computing efficiency of a resource pool.
According to an aspect of the embodiments of the present application, a method for scheduling resources is disclosed, the method comprising:
for a target task to be calculated by a first hardware resource platform in a first resource pool, acquiring the number of target cores required by the target task, and acquiring the number of first cores of idle hardware processing nodes of the first hardware resource platform;
if the target core number is larger than the first core number, acquiring a second core number of idle hardware processing nodes of a second hardware resource platform in a second resource pool;
if the sum of the first core number and the second core number is detected to be larger than or equal to the target core number, temporarily integrating the idle hardware processing nodes of the second hardware resource platform to the first resource pool, pushing the target task to the first resource pool, and instructing the first hardware resource platform to call the integrated idle hardware processing nodes of the first resource pool to calculate the target task, wherein the core number of the integrated idle hardware processing nodes is larger than or equal to the target core number.
According to an aspect of the embodiments of the present application, a resource scheduling apparatus is disclosed, the apparatus includes:
a first acquisition module, configured to, for a target task to be calculated by a first hardware resource platform in a first resource pool, acquire the number of target cores required by the target task and acquire the number of first cores of idle hardware processing nodes of the first hardware resource platform;
a second obtaining module, configured to obtain, if it is detected that the number of target cores is greater than the first number of cores, a second number of cores of idle hardware processing nodes of a second hardware resource platform in a second resource pool;
and the integration pushing module is configured to temporarily integrate the idle hardware processing nodes of the second hardware resource platform into the first resource pool and push the target task to the first resource pool if the sum of the first core number and the second core number is detected to be greater than or equal to the target core number, and instruct the first hardware resource platform to call the integrated idle hardware processing nodes of the first resource pool to calculate the target task, wherein the core number of the integrated idle hardware processing nodes is greater than or equal to the target core number.
In an exemplary embodiment of the present application, the apparatus is configured to:
taking out the target task from a task queue, and acquiring the operation information of the target task;
and retrieving to obtain a resource pool of the hardware environment adaptive to the target task based on the operation information, and taking the resource pool adaptive to the target task as the first resource pool.
In an exemplary embodiment of the present application, the apparatus is configured to:
determining a software certificate required by the target task;
if the software resource platform is detected to have the idle software certificate, and the sum of the first core number and the second core number is detected to be larger than or equal to the target core number, the software certificate is allocated to the target task, the first resource pool is temporarily integrated, the target task is pushed to the first resource pool, and the first hardware resource platform is instructed to call the idle hardware processing node after integration to calculate the target task.
In an exemplary embodiment of the present application, the apparatus is configured to:
and after the target task is calculated, releasing the software certificate in the software resource platform, and releasing the hardware processing node occupied by the target task in the first resource pool.
In an exemplary embodiment of the present application, the apparatus is configured to:
and after the target task is calculated, releasing the hardware processing nodes occupied by the target task in the first resource pool, and returning the idle hardware processing nodes temporarily integrated to the first resource pool to the second resource pool.
In an exemplary embodiment of the present application, the apparatus is configured to:
temporarily integrating the idle hardware processing nodes of the second hardware resource platform to the first resource pool by temporarily changing the labels of the idle hardware processing nodes of the second hardware resource platform to be bound with the first resource pool;
and returning the idle hardware processing nodes temporarily integrated to the first resource pool to the second resource pool in a mode of restoring the labels of the idle hardware processing nodes temporarily integrated to the first resource pool into binding with the second resource pool.
In an exemplary embodiment of the present application, the apparatus is configured to:
screening out, from the second hardware resource platform, the idle hardware processing nodes that make the core number of the integrated idle hardware nodes greater than or equal to the target core number while keeping that integrated core number minimal;
and temporarily integrating the idle hardware processing nodes screened from the second hardware resource platform into the first resource pool.
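The screening step above (reach the target core count with minimal surplus) can be sketched as a small selection routine. This is one plausible interpretation, not the patent's implementation; the node names and data shapes (a dict mapping node name to core count) are invented for the example:

```python
def screen_nodes(p2_idle_nodes, first_cores, target_cores):
    """Pick idle nodes of the second platform so the integrated core
    count (first_cores + picked) reaches target_cores with little surplus.

    p2_idle_nodes: dict of node name -> core count (hypothetical shape).
    Returns the selected node names, [] if the first pool already
    suffices, or None if even all of P2's idle nodes cannot cover it.
    """
    deficit = target_cores - first_cores
    if deficit <= 0:
        return []                          # first pool alone suffices
    if sum(p2_idle_nodes.values()) < deficit:
        return None                        # N1 + N2 < n1: cannot integrate
    # Greedy: take the largest nodes first, then drop any node that
    # turned out to be unnecessary, shrinking the surplus.
    picked, total = [], 0
    for name, cores in sorted(p2_idle_nodes.items(), key=lambda kv: -kv[1]):
        if total >= deficit:
            break
        picked.append((name, cores))
        total += cores
    for name, cores in sorted(picked, key=lambda kv: kv[1]):
        if total - cores >= deficit:
            picked.remove((name, cores))
            total -= cores
    return [name for name, _ in picked]
```

A greedy pass like this keeps the number of moved nodes small; an exact minimal-surplus selection would need a subset-sum search, which the patent does not specify.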
According to an aspect of an embodiment of the present application, an electronic device is disclosed, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the methods provided in the various alternative implementations described above.
According to an aspect of embodiments of the present application, a computer program medium is disclosed, on which computer readable instructions are stored, which, when executed by a processor of a computer, cause the computer to perform the method provided in the above various alternative implementations.
According to an aspect of an embodiment of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.
In the embodiment of the application, for a target task that needs to be calculated by a first hardware resource platform in a first resource pool, when the number of target cores required by the target task is greater than the first core number of idle hardware processing nodes of the first hardware resource platform, the idle hardware processing nodes of a second hardware resource platform in a second resource pool are temporarily integrated into the first resource pool. The hardware resource distribution of the resource pools is thereby adjusted adaptively and elastically, the integrated idle hardware processing nodes of the first resource pool can be used to calculate the target task, and secondary scheduling of idle hardware processing nodes is realized, reducing the waste of hardware resources and improving the overall resource utilization and the overall computing efficiency of the first and second resource pools.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
Fig. 1 shows a basic resource scheduling policy according to an embodiment of the present application.
Fig. 2 is a diagram illustrating distribution of hardware resources before and after resource scheduling according to an embodiment of the present application in the embodiment of fig. 1.
Fig. 3 shows a flow chart of a resource scheduling method according to an embodiment of the present application.
Fig. 4 shows a flowchart of a resource scheduling method provided by the present application coupled with a basic resource scheduling policy according to an embodiment of the present application.
Fig. 5 shows a detailed flow diagram of resource scheduling according to an embodiment of the present application.
FIG. 6 illustrates a resource schedule development logic architecture in accordance with one embodiment of the present application.
Fig. 7 shows a block diagram of a resource scheduling apparatus according to an embodiment of the present application.
FIG. 8 illustrates an electronic device hardware diagram according to one embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the present application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, steps, and so forth. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The application provides a resource scheduling method which can be used in a high performance computing (HPC) resource scheduling system such as PBS (Portable Batch System). As the computing task manager responsible for resource scheduling in HPC, the PBS can schedule the HPC computing resources by the resource scheduling method provided by the application, thereby improving the overall resource utilization and the overall computing efficiency of the HPC system.
Fig. 1 shows a basic resource scheduling policy diagram according to an embodiment of the present application. Fig. 2 is a schematic diagram illustrating distribution of hardware resources before and after resource scheduling in the embodiment of fig. 1 according to an embodiment of the present application.
Referring to fig. 1 and fig. 2, in this embodiment, after receiving a task Q to be calculated, the resource scheduling system PBS determines the relevant information of task Q: it requires 112 cores and specifies the hardware resource platform P1.
A resource pool R1 whose hardware is adapted to task Q is then determined according to the software information of task Q. The PBS then detects whether the idle hardware processing nodes of hardware resource platform P1 in resource pool R1 provide enough cores, and whether the software resource platform has a license certificate for task Q in an idle state.
If the software resource platform has an idle certificate and the idle hardware processing nodes of P1 provide enough cores, task Q is copied from storage to R1 and delivered to the idle hardware processing nodes of P1 for calculation.
If the software resource platform has an idle certificate but the idle hardware processing nodes of P1 do not provide enough cores, the idle hardware processing nodes of hardware resource platform P2 in resource pool R2 are temporarily integrated into R1, so that the core number of the integrated idle hardware processing nodes of P1 is greater than or equal to 112. Task Q is then copied from storage to R1 and delivered to the integrated idle hardware processing nodes of P1 for calculation.
Therefore, in this embodiment, the idle hardware processing nodes of P2 in R2 are temporarily integrated into the resource pool R1, so that the idle hardware processing nodes of P1 in R1 that would otherwise be set aside and wasted are scheduled for the second time, thereby improving the overall resource utilization rate and the overall computing efficiency of R1 and R2.
In this embodiment, the platform type of hardware resource platform P1 is preferably the same as or similar to that of hardware resource platform P2; however, since the hardware environments of the resource pools differ, the tasks specifically applicable to P1 and P2 are usually different.
Fig. 3 is a flowchart illustrating a resource scheduling method according to an embodiment of the present application. The method may be performed, for example, by the resource scheduling system PBS, and includes:
step S110, aiming at a target task which is calculated by a first hardware resource platform in a first resource pool, acquiring the number of target cores required by the target task, and acquiring the number of first cores of idle hardware processing nodes of the first hardware resource platform;
step S120, if the number of the target cores is detected to be larger than the number of the first cores, acquiring the number of the second cores of the idle hardware processing nodes of the second hardware resource platform in the second resource pool;
step S130, if the sum of the first core number and the second core number is detected to be larger than or equal to the target core number, temporarily integrating the idle hardware processing nodes in the second hardware resource platform to the first resource pool, pushing the target task to the first resource pool, and instructing the first hardware resource platform to call the integrated idle hardware processing nodes in the first resource pool to calculate the target task, wherein the core number of the integrated idle hardware processing nodes is larger than or equal to the target core number.
Specifically, in the embodiment of the present application, the target core number n1 required by the target task Q, which is to be calculated by the first hardware resource platform P1 in the first resource pool R1, is obtained. A target core number of n1 indicates that n1 cores need to be occupied to calculate the target task Q.
A core serves as the unit of hardware resource occupation in the hardware resource platform, and a hardware processing node serves as the unit of hardware resource scheduling in the platform. Cores include, but are not limited to, CPU cores and GPU cores. When the hardware resource platform is a CPU platform built from CPU nodes, the hardware processing nodes are the CPU nodes and the cores are CPU cores. When the hardware resource platform is a GPU platform built from GPU nodes, the hardware processing nodes are the GPU nodes and the cores are GPU cores.
The first core number N1 of the idle hardware processing nodes of P1 may be smaller than n1, especially when the target task Q is a large-scale task requiring many cores. In that situation R1 is temporarily unable to calculate the target task Q, causing a waste of the hardware resources of R1.
In order to improve the utilization efficiency of the hardware resources of the resource pools, in this embodiment of the application, if it is detected that n1 is greater than N1, the second core number N2 of the idle hardware processing nodes of the second hardware resource platform P2 in the second resource pool R2 is obtained. Preferably, P2 is a platform of the same type as P1, i.e., the platform type of P1 is the same as or similar to that of P2.
It is then checked whether the sum of N1 and N2 is greater than or equal to n1. If so, the idle hardware processing nodes of P2 are temporarily integrated into R1, so that the core number N3 of the integrated idle hardware processing nodes that P1 can call in R1 is greater than or equal to n1. The target task Q is then pushed to R1, and P1 is instructed to call the integrated idle hardware processing nodes of R1 to calculate the target task Q.
For example: the target core number n1 required by the target task Q is 112; the core number N1 of the idle hardware processing nodes C1 = (C11, C12, ..., C1x) of the first hardware resource platform P1 in the first resource pool R1 is 80; and the core number N2 of the idle hardware processing nodes C2 = (C21, C22, ..., C2y) of the second hardware resource platform P2 in the second resource pool R2 is 40, where x and y are positive integers.
Since n1 is greater than N1, the requirement of the target task Q cannot be satisfied by the hardware resources of R1 alone. However, since the sum of N1 and N2 is 120, which is greater than n1, after the idle hardware processing nodes of P2 are temporarily integrated into R1, the core number N3 of the integrated idle hardware processing nodes available for P1 to call in R1 can be greater than or equal to n1. Therefore, after the idle hardware processing nodes of P2 are temporarily integrated into R1 and N3 is greater than or equal to n1, the target task Q is pushed to R1, so that P1 calls the integrated idle hardware processing nodes of R1 to calculate the target task Q.
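The three outcomes of steps S110 to S130 can be condensed into a small decision function. This is an illustrative sketch rather than the patent's implementation; the return labels are invented for the example:

```python
def schedule(target_cores: int, first_idle_cores: int, second_idle_cores: int) -> str:
    """Decide how to place a task, mirroring steps S110-S130.

    target_cores      : n1, cores the target task requires
    first_idle_cores  : N1, idle cores of P1 in pool R1
    second_idle_cores : N2, idle cores of P2 in pool R2
    """
    if target_cores <= first_idle_cores:
        # R1 alone can serve the task: push it straight to R1.
        return "run_in_first_pool"
    if first_idle_cores + second_idle_cores >= target_cores:
        # N1 + N2 >= n1: temporarily integrate P2's idle nodes into R1,
        # then push the task there (integrated count N3 >= n1).
        return "integrate_and_run"
    # Not enough cores even after integration: the task must wait.
    return "queue"
```

With the numbers from the example (n1 = 112, N1 = 80, N2 = 40), the function chooses the integration path.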
Therefore, in the embodiment of the application, for a target task that needs to be calculated by the first hardware resource platform in the first resource pool, when the number of target cores required by the target task is greater than the first core number of idle hardware processing nodes of the first hardware resource platform, the idle hardware processing nodes of the second hardware resource platform in the second resource pool are temporarily integrated into the first resource pool. The hardware resource distribution of the resource pools is thereby adjusted adaptively and elastically, the integrated idle hardware processing nodes of the first resource pool can be used to calculate the target task, secondary scheduling of idle hardware processing nodes is realized, the waste of hardware resources is reduced, and the overall resource utilization and the overall computing efficiency of the first and second resource pools are improved.
In one embodiment, the target task is taken out from the task queue, and the job information of the target task is obtained. And based on the operation information, retrieving to obtain a resource pool of the hardware environment adaptive to the target task, and taking the resource pool adaptive to the target task as a first resource pool.
The embodiment is mainly applied to a resource scheduling scenario of a FIFO (First In First Out) strategy. In this embodiment, the tasks submitted to the PBS are queued in time order to form a task queue. And the PBS takes the current task taken out of the task queue as a target task Q and acquires the job information of the target task Q. The job information includes a task ID, a software name S1, a required first hardware resource platform P1, a required target core n1, and the like.
And the PBS searches a pre-configured database for a resource pool of which the hardware environment is matched with the target task Q based on the operation information, and uses the resource pool as a first resource pool R1.
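The FIFO dequeue and pool lookup described above can be sketched as follows. The patent only states that the pool is retrieved from a pre-configured database based on the job information; the dict-based job record and the platform-to-pool mapping here are placeholder assumptions:

```python
from collections import deque

# Hypothetical pre-configured database: platform name -> resource pool.
POOL_DB = {"P1": "R1", "P2": "R2"}

def next_target_task(task_queue: deque):
    """Dequeue the oldest submitted job (FIFO order) and resolve the
    first resource pool whose hardware environment matches it."""
    job = task_queue.popleft()          # job info: id, software, platform, cores
    first_pool = POOL_DB[job["platform"]]
    return job, first_pool
```

A real PBS deployment would key the lookup on richer job attributes (software name, hardware environment), but the FIFO structure is the same.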
In one embodiment, a software certificate required by the target task is determined. And if detecting that the software resource platform has idle software certificates and the sum of the first core number and the second core number is greater than or equal to the target core number, allocating the software certificates to the target tasks, temporarily integrating the first resource pool and pushing the target tasks to the first resource pool, and instructing the first hardware resource platform to call the integrated idle hardware processing nodes to calculate the target tasks.
In this embodiment, the PBS determines the number n1 of target cores required by the target task Q, and also determines the software certificate L1 required by the target task Q. The determined n1 is used for matching hardware resources, and the determined software certificate L1 is used for matching software resources. And when the software and hardware resources are successfully matched, performing software resource allocation and hardware resource allocation on the target task Q.
Specifically, the PBS may first determine the software certificate L1 required by the target task Q, and, after detecting that the software resource platform has an idle software certificate L1, detect whether the sum of the first core number N1 and the second core number N2 is greater than or equal to n1. If the sum of N1 and N2 is greater than or equal to n1, the software certificate L1 is allocated to the target task Q, and the idle hardware processing nodes of the second hardware resource platform P2 are temporarily integrated into the first resource pool R1, so that the core number N3 of the integrated idle hardware processing nodes that the first hardware resource platform P1 can call in R1 is greater than or equal to n1.
The PBS may also detect whether the sum of N1 and N2 is greater than or equal to n1 while detecting whether the software resource platform has an idle software certificate L1. When it detects that the software resource platform has an idle software certificate L1 and that the sum of N1 and N2 is greater than or equal to n1, it allocates the software certificate L1 to the target task Q and temporarily integrates the idle hardware processing nodes of P2 into R1, so that N3 is greater than or equal to n1.
In an embodiment, after the target task is calculated, the software certificate in the software resource platform is released, and the hardware processing node occupied by the target task in the first resource pool is released.
In this embodiment, the PBS allocates the software certificate L1 to the target task Q, and temporarily integrates the idle hardware processing node of the second hardware resource platform P2 in the second resource pool R2 into the first resource pool R1, and then the target task Q is calculated by the first hardware resource platform P1 calling the integrated idle hardware processing node of R1.
Because occupied software and hardware resources cannot temporarily be used to calculate other tasks, the PBS releases the software and hardware resources occupied by the target task Q after its calculation is completed, in order to ensure their efficient utilization. That is, the software certificate L1 is released, and the hardware processing nodes occupied by the target task Q in P1 are released. The released software and hardware resources return to an available idle state.
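The release step can be sketched as a small bookkeeping routine. The data shapes (a license counter and a node-state dict) are assumptions for illustration, not structures described in the patent:

```python
def release_resources(licenses: dict, pool_nodes: dict, task: dict) -> None:
    """After a task finishes, return its software certificate to the
    license pool and mark its hardware processing nodes idle again."""
    licenses[task["license"]] += 1          # free one seat of e.g. L1
    for node in task["nodes"]:
        pool_nodes[node] = "idle"           # node available for rescheduling
```

Both releases happen together so that neither the certificate nor the nodes linger in an occupied state once the computation ends.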
Fig. 4 is a flowchart illustrating a resource scheduling method provided by the present application coupled with a basic resource scheduling policy according to an embodiment of the present application.
Refer to fig. 4. In this embodiment, after receiving a task Q to be calculated, submitted by a user, the PBS determines job information of the task Q: the software name is S1, the designated CPU platform is P1, and the required target core number is n1.
The PBS searches the software resource platform and detects a certificate L1 of software S1, i.e., the software certificate L1 of task Q; it also searches the hardware resource pools, detecting the first core number N1 of the idle CPU nodes of P1 in resource pool R1 and the second core number N2 of the idle CPU nodes of P2 in resource pool R2.
If an idle software certificate L1 exists and N1 is greater than or equal to n1, task Q is pushed to R1, the task execution script is called, and the computing process is executed.
If no idle software certificate L1 exists, or N1 is smaller than n1, task Q is queued. Then, according to the resource scheduling method provided by the application, when an idle software certificate L1 is detected and the sum of N1 and N2 is greater than or equal to n1, task Q and the resource pools are integrated using a dynamic adaptive FIFO scheduling strategy: the idle CPU nodes of P2 are temporarily integrated into R1 so that the core number of the integrated idle CPU nodes of R1 is greater than or equal to n1, and task Q is then pushed to R1, the task execution script is started, and the computing process is executed.
After the computing process is finished, the software certificate L1 and the CPU nodes of P1 occupied by task Q are released.
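The coupled flow of fig. 4 gates the core-count check behind the license check. A sketch of that combined decision, with a hypothetical dict counting free certificates per software name and invented return labels:

```python
def dispatch(job: dict, free_licenses: dict, n1_idle: int, n2_idle: int) -> str:
    """Combined license + core check, as in the fig. 4 flow.

    job           : {"software": ..., "cores": ...} (assumed shape)
    free_licenses : software name -> number of idle certificates
    n1_idle, n2_idle : idle core counts N1 (P1 in R1) and N2 (P2 in R2)
    """
    if free_licenses.get(job["software"], 0) < 1:
        return "queue"                       # no idle certificate L1 yet
    if job["cores"] <= n1_idle:
        return "push_to_R1"                  # base policy: R1 alone suffices
    if n1_idle + n2_idle >= job["cores"]:
        return "integrate_then_push"         # dynamic adaptive FIFO path
    return "queue"                           # wait for cores to free up
```

Note that a queued task is re-evaluated as certificates and cores are released, which is what makes the strategy dynamic.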
In an embodiment, after the target task is calculated, the hardware processing nodes occupied by the target task in the first resource pool are released, and the idle hardware processing nodes temporarily integrated into the first resource pool are returned to the second resource pool.
In this embodiment, during the calculation of the target task Q, the hardware processing nodes occupied by Q include both the hardware processing nodes of the first hardware resource platform P1 and the hardware processing nodes that originally belong to the second hardware resource platform P2 but were temporarily integrated into R1.
To keep the affiliation between hardware processing nodes and resource pools stable, after the calculation of the target task Q is completed, the hardware processing nodes occupied by Q are released and restored to the idle state, and the nodes that originally belong to P2 and were temporarily integrated into R1 are returned to R2.
In an embodiment, the idle hardware processing nodes of the second hardware resource platform are temporarily integrated into the first resource pool by temporarily changing their labels to be bound with the first resource pool; the idle hardware processing nodes temporarily integrated into the first resource pool are returned to the second resource pool by restoring their labels to be bound with the second resource pool.
In this embodiment, the PBS manages the affiliation between the hardware processing node and the resource pool in a label binding manner.
Specifically, the label of the hardware processing node in the first resource pool R1 is bound to R1, and the label of the hardware processing node in the second resource pool R2 is bound to R2.
After detecting that the sum of the first core number N1 and the second core number N2 is greater than or equal to the target core number n1, the PBS temporarily changes the labels of the idle hardware processing nodes of the second hardware resource platform P2 to be bound with R1, thereby temporarily integrating the idle nodes of P2 into R1. Similarly, after the calculation of the target task Q is completed, the PBS restores the labels of the idle hardware processing nodes temporarily integrated into R1 to be bound with R2, thereby returning those nodes to R2.
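The label binding and restoring steps can be modeled minimally as below. Here each node's label is represented as a dictionary entry; a real deployment would instead issue a scheduler command to change the node attribute (the patent does not name that command), but the bookkeeping is the same:

```python
# Minimal model of temporary label rebinding between resource pools.
# Labels are kept in a plain dict for illustration.

def integrate(labels, nodes, target_pool, operation_log):
    """Temporarily bind idle nodes to target_pool, remembering the old label."""
    for node in nodes:
        operation_log.append((node, labels[node]))  # record label before change
        labels[node] = target_pool

def restore(labels, operation_log):
    """Return borrowed nodes to their original pools after the task finishes."""
    while operation_log:
        node, original_pool = operation_log.pop()
        labels[node] = original_pool
```

Recording the pre-change label alongside each rebinding is what later allows the affiliation to be restored exactly, mirroring the Resource_Operation log described below.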
Fig. 5 shows a detailed flowchart of resource scheduling according to an embodiment of the present application.
Refer to fig. 5. In this embodiment, the PBS confirms the job information of task Q: the software name is S1, the designated CPU platform is P1, and the required target core number is n1. It also confirms the state of the resource pools: the resource pool R1 contains x CPU nodes of platform P1 and the resource pool R2 contains y CPU nodes of platform P2, with a first core number N1 of idle CPU nodes of P1 and a second core number N2 of idle CPU nodes of P2, where x and y are both positive integers.
If n1 is greater than N1, the idle CPU nodes of P2 are temporarily integrated into R1, expanding the number of temporarily available CPU cores in R1 to (N1 + N2).
If an idle software certificate L1 is detected and (N1 + N2) is greater than or equal to n1, task Q is pushed to R1, the task execution script is invoked, and the computing process is executed.
After the computing process is finished, the software certificate L1 is released, and the idle CPU nodes that were temporarily integrated into R1 but originally belong to P2 are unbound and returned to R2.
FIG. 6 is a schematic diagram illustrating a resource scheduling development logic architecture according to an embodiment of the present application.
Refer to fig. 6. In this embodiment, in the first stage of the resource scheduling development logic, three associated databases are established: the resource pool database Resource_Pool, the resource pool operation information database Resource_Operation, and the software license information database Job_lic_data.
Specifically, by analyzing the characteristics of the various simulation software packages, the optimal CPU platform is matched for each, and the resource pool database Resource_Pool is established. Its key fields include: resource pool name, number of nodes, number of cores, applicable software, etc.
The resource pool operation information database Resource_Operation is established to record the resource pool operation log. Its key fields include: task ID, required target core number, software name, node number, pool name before the operation, pool name after the operation, the command used by the scheduling system to modify the resource pool, etc.
The software license information database Job_lic_data is established to monitor the state of the software licenses. Its key fields include: number of records, time, week, software name, total number of licenses, number of licenses in use, number of tasks in computation, number of tasks in queue, total number of tasks, number of cores in computation, number of cores in queue, etc.
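The three databases and their key fields can be sketched as relational tables. SQLite is used here purely for illustration, and the English column names are translated assumptions; the patent does not name a database engine or schema:

```python
# Illustrative schema for the three associated databases of the first stage.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Resource_Pool (
    pool_name   TEXT PRIMARY KEY,   -- resource pool name, e.g. 'R1'
    node_count  INTEGER,            -- number of nodes
    core_count  INTEGER,            -- number of cores
    software    TEXT                -- applicable software, e.g. 'S1'
);
CREATE TABLE Resource_Operation (   -- resource pool operation log
    task_id       TEXT,
    target_cores  INTEGER,          -- required target core number
    software      TEXT,
    node_name     TEXT,
    pool_before   TEXT,             -- pool name before the operation
    pool_after    TEXT,             -- pool name after the operation
    command       TEXT              -- scheduler command that modified the pool
);
CREATE TABLE Job_lic_data (         -- software license status records
    record_time   TEXT,
    software      TEXT,
    lic_total     INTEGER,          -- total number of licenses
    lic_used      INTEGER,          -- number of licenses in use
    tasks_running INTEGER,
    tasks_queued  INTEGER,
    cores_running INTEGER,
    cores_queued  INTEGER
);
""")
```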
In the second stage of the resource scheduling development logic, the secondary scheduling judgment logic for idle resources in the resource pools is established through an autonomously developed shell + python automation script.
Specifically, the job information of the target task is obtained through the PBS command qstat: task ID, software name, required platform name P1, and required target core number n1.
The job information of the target task is then used to retrieve the software license information database Job_lic_data and check the software license status.
When the license does not meet the requirement of the target task, the task continues to queue. When the license meets the requirement, the job information of the target task is used to retrieve the resource pool database Resource_Pool and obtain the adapted resource pool information R1. The idle CPU node states of R1 and R2 are then retrieved. If the sum of the first core number N1 of the idle CPU nodes of R1 and the second core number N2 of the idle CPU nodes of R2 is less than the target core number n1, the task continues to queue; if the sum of N1 and N2 is greater than or equal to n1, the third stage of the resource scheduling development logic is entered.
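The second-stage judgment can be sketched as below, assuming the qstat output has already been parsed into a job dictionary; the field names are illustrative assumptions:

```python
# Sketch of the second-stage judgment: check the license first, then whether
# the idle cores of the adapted pool R1 plus pool R2 cover the target.

def second_stage(job, licenses, pools):
    lic = licenses[job["software"]]
    if lic["used"] >= lic["total"]:
        return "queue"                      # license does not meet the task
    n1 = job["target_cores"]
    N1 = pools["R1"]["idle_cores"]          # adapted resource pool R1
    N2 = pools["R2"]["idle_cores"]
    if N1 + N2 < n1:
        return "queue"                      # not enough idle cores
    return "third_stage"                    # proceed to secondary scheduling
```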
In the third stage of the resource scheduling development logic, the mapping relationship of the idle resources in the resource pools after secondary scheduling is integrated through the autonomously developed shell + python automation script.
Specifically, the labels of the idle CPU nodes of P2 in R2 are modified to be bound with P1 of R1, and the operation is recorded in the resource pool operation information database Resource_Operation. The PBS command qrun is then called through the script to push the target task to R1 for calculation.
In the fourth stage of the resource scheduling development logic, the mapping relationship of the resource pools is restored through the autonomously developed shell + python automation script.
Specifically, after the target task is calculated, the computing resources are released. According to the task ID, the resource pool operation information database Resource_Operation is retrieved, the label information recorded before the adapted resource pool was modified is found, and the labels are restored.
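The fourth-stage restore step — retrieving the pre-modification labels by task ID from the operation log and restoring them — can be sketched as below; the record field names are illustrative assumptions:

```python
# Sketch of the fourth-stage restore: find the operation-log entries recorded
# for the finished task and restore each node's pre-modification label.

def restore_pool_mapping(task_id, operation_log, labels):
    for entry in operation_log:
        if entry["task_id"] == task_id:
            labels[entry["node"]] = entry["pool_before"]
```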
In an embodiment, idle hardware processing nodes are screened from the second hardware resource platform such that the number of cores of the idle hardware nodes after integration is greater than or equal to the target core number and is also as small as possible. The idle hardware processing nodes thus screened are then temporarily integrated into the first resource pool.
In this embodiment, when flexibly adjusting the hardware resource distribution of the resource pools, the PBS keeps the integration minimal.
Specifically, since the second hardware resource platform P2 has a plurality of idle hardware processing nodes, the fact that the sum of the first core number N1 and the second core number N2 is greater than or equal to the target core number n1 does not necessarily mean that all idle nodes of P2 should be integrated into the first resource pool R1. Minimally controlling the hardware processing nodes involved in the integration helps maintain resource pool stability.
Therefore, after the PBS detects that the sum of N1 and N2 is greater than or equal to n1, it screens from P2 the idle hardware processing nodes that make the core number N3 of the idle hardware nodes after integration greater than or equal to n1 while minimizing the value of N3, and then temporarily integrates the screened idle nodes into the first resource pool R1.
For example: the target core number n1 required by the target task Q is 100; the core number N1 of the idle hardware processing nodes C1 of the first hardware resource platform P1 in the first resource pool R1 is 80; and the core number N2 of the idle hardware processing nodes C2 (C21, C22, C23) of the second hardware resource platform P2 in the second resource pool R2 is 60, where C21 has 10 cores (N21), C22 has 20 cores (N22), and C23 has 30 cores (N23).
C21, C22, and C23 could all be temporarily integrated into R1, so that the core number N3 of the idle hardware processing nodes available to P1 in R1 after integration is 140, which is greater than n1 and meets the requirement of the target task Q. Alternatively, only C21 and C22 could be temporarily integrated into R1, making N3 equal to 110, which is also greater than n1 and meets the requirement. Or only C22 could be temporarily integrated into R1, making N3 equal to 100, which is equal to n1 and likewise meets the requirement.
Among all integration modes, including the three above, the third is the one for which N3 is greater than or equal to n1 with the smallest value of N3. It also involves only C22, keeping the temporary integration minimally controlled. The PBS therefore preferably integrates only C22 into R1.
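The minimal-integration screening illustrated by this example can be sketched as an exhaustive search over subsets of the idle P2 nodes, picking the subset whose integrated core count N3 = N1 + sum(subset) satisfies N3 ≥ n1 with N3 as small as possible. This is practical for the handful of nodes in a pool; the function name is illustrative:

```python
from itertools import combinations

def screen_nodes(n1, N1, p2_nodes):
    """Pick the subset of idle P2 nodes minimizing N3 subject to N3 >= n1.

    p2_nodes: dict mapping node name to its idle core count.
    Returns (subset_of_node_names, N3), or (None, None) if no subset suffices.
    """
    best, best_n3 = None, None
    for r in range(len(p2_nodes) + 1):
        for subset in combinations(p2_nodes, r):
            n3 = N1 + sum(p2_nodes[node] for node in subset)
            if n3 >= n1 and (best_n3 is None or n3 < best_n3):
                best, best_n3 = subset, n3
    return best, best_n3
```

With the numbers of the example (n1 = 100, N1 = 80, C21 = 10, C22 = 20, C23 = 30), the search selects only C22 with N3 = 100, matching the third integration mode.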
Fig. 7 shows a block diagram of a resource scheduling apparatus according to an embodiment of the present application, the apparatus including:
a first obtaining module 210, configured to obtain, for a target task that is calculated by a first hardware resource platform in a first resource pool, a target core number required by the target task, and obtain a first core number of an idle hardware processing node of the first hardware resource platform;
a second obtaining module 220, configured to obtain, if it is detected that the number of target cores is greater than the first number of cores, a second number of cores of idle hardware processing nodes of a second hardware resource platform in a second resource pool;
an integration pushing module 230, configured to, if it is detected that the sum of the first core number and the second core number is greater than or equal to the target core number, temporarily integrate the idle hardware processing nodes of the second hardware resource platform into the first resource pool, push the target task to the first resource pool, and instruct the first hardware resource platform to call the integrated idle hardware processing nodes of the first resource pool to calculate the target task, where the core number of the integrated idle hardware processing nodes is greater than or equal to the target core number.
In an exemplary embodiment of the present application, the apparatus is configured to:
taking out the target task from a task queue, and acquiring the job information of the target task;
and retrieving, based on the job information, a resource pool whose hardware environment is adapted to the target task, and taking the adapted resource pool as the first resource pool.
In an exemplary embodiment of the present application, the apparatus is configured to:
determining a software certificate required by the target task;
if it is detected that the software resource platform has an idle software certificate and that the sum of the first core number and the second core number is greater than or equal to the target core number, allocating the software certificate to the target task, temporarily integrating the idle hardware processing nodes of the second hardware resource platform into the first resource pool, pushing the target task to the first resource pool, and instructing the first hardware resource platform to call the integrated idle hardware processing nodes to calculate the target task.
In an exemplary embodiment of the present application, the apparatus is configured to:
and after the target task is calculated, releasing the software certificate in the software resource platform, and releasing the hardware processing node occupied by the target task in the first resource pool.
In an exemplary embodiment of the present application, the apparatus is configured to:
and after the target task is calculated, releasing the hardware processing nodes occupied by the target task in the first resource pool, and returning the idle hardware processing nodes temporarily integrated to the first resource pool to the second resource pool.
In an exemplary embodiment of the present application, the apparatus is configured to:
temporarily integrating the idle hardware processing nodes of the second hardware resource platform to the first resource pool by temporarily changing the labels of the idle hardware processing nodes of the second hardware resource platform to be bound with the first resource pool;
returning the idle hardware processing node temporarily integrated to the first resource pool to the second resource pool by restoring the label of the idle hardware processing node temporarily integrated to the first resource pool to be bound with the second resource pool.
In an exemplary embodiment of the present application, the apparatus is configured to:
screening out idle hardware processing nodes which enable the number of cores of the idle hardware nodes after integration to be larger than or equal to the target number of cores and enable the number of cores of the idle hardware nodes after integration to be minimum from the second hardware resource platform;
and temporarily integrating the idle hardware processing nodes screened from the second hardware resource platform into the first resource pool.
An electronic device 30 according to an embodiment of the present application is described below with reference to fig. 8. The electronic device 30 shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, the electronic device 30 is in the form of a general purpose computing device. The components of the electronic device 30 may include, but are not limited to: the at least one processing unit 310, the at least one memory unit 320, and a bus 330 that couples various system components including the memory unit 320 and the processing unit 310.
Wherein the storage unit stores program code executable by the processing unit 310 to cause the processing unit 310 to perform steps according to various exemplary embodiments of the present invention described in the description part of the above exemplary methods of the present specification. For example, the processing unit 310 may perform the various steps as shown in fig. 3.
The storage unit 320 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 3201 and/or a cache memory unit 3202, and may further include a read only memory unit (ROM) 3203.
The storage unit 320 may also include a program/utility 3204 having a set (at least one) of program modules 3205, such program modules 3205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 330 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 30 may also communicate with one or more external devices 400 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 30, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 30 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 350. An input/output (I/O) interface 350 is connected to the display unit 340. Also, the electronic device 30 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 360. As shown, the network adapter 360 communicates with the other modules of the electronic device 30 via the bus 330. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 30, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, and may also be implemented by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method described in the above method embodiment section.
According to an embodiment of the present application, there is also provided a program product for implementing the method in the above method embodiment, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java or C++, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods herein are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (10)

1. A method for scheduling resources, the method comprising:
aiming at a target task which is calculated by a first hardware resource platform in a first resource pool, acquiring the number of target cores required by the target task, and acquiring the number of first cores of idle hardware processing nodes of the first hardware resource platform;
if the target core number is larger than the first core number, acquiring a second core number of idle hardware processing nodes of a second hardware resource platform in a second resource pool;
if the sum of the first core number and the second core number is detected to be larger than or equal to the target core number, temporarily integrating the idle hardware processing nodes of the second hardware resource platform to the first resource pool, pushing the target task to the first resource pool, and instructing the first hardware resource platform to call the integrated idle hardware processing nodes of the first resource pool to calculate the target task, wherein the core number of the integrated idle hardware processing nodes is larger than or equal to the target core number.
2. The method of claim 1, further comprising:
taking out the target task from a task queue, and acquiring the job information of the target task;
and retrieving, based on the job information, a resource pool whose hardware environment is adapted to the target task, and taking the adapted resource pool as the first resource pool.
3. The method of claim 1, further comprising:
determining a software certificate required by the target task;
if it is detected that the software resource platform has an idle software certificate and that the sum of the first core number and the second core number is greater than or equal to the target core number, allocating the software certificate to the target task, temporarily integrating the idle hardware processing nodes of the second hardware resource platform into the first resource pool, pushing the target task to the first resource pool, and instructing the first hardware resource platform to call the integrated idle hardware processing nodes to calculate the target task.
4. The method of claim 3, further comprising:
and after the target task is calculated, releasing the software certificate in the software resource platform, and releasing the hardware processing node occupied by the target task in the first resource pool.
5. The method of claim 1, further comprising:
and after the target task is calculated, releasing the hardware processing nodes occupied by the target task in the first resource pool, and returning the idle hardware processing nodes temporarily integrated to the first resource pool to the second resource pool.
6. The method of claim 5, wherein temporarily integrating the idle hardware processing node of the second hardware resource platform into the first resource pool comprises: temporarily integrating idle hardware processing nodes of the second hardware resource platform into the first resource pool by temporarily changing the labels of the idle hardware processing nodes of the second hardware resource platform into binding with the first resource pool;
returning a spare hardware processing node temporarily integrated into the first resource pool to the second resource pool, comprising: returning the idle hardware processing node temporarily integrated to the first resource pool to the second resource pool by restoring the label of the idle hardware processing node temporarily integrated to the first resource pool to be bound with the second resource pool.
7. The method of claim 1, wherein temporarily integrating idle hardware processing nodes in the second hardware resource platform into the first resource pool comprises:
screening out idle hardware processing nodes which enable the number of cores of the idle hardware nodes after integration to be larger than or equal to the target number of cores and enable the number of cores of the idle hardware nodes after integration to be minimum from the second hardware resource platform;
and temporarily integrating the idle hardware processing nodes screened from the second hardware resource platform into the first resource pool.
8. An apparatus for scheduling resources, the apparatus comprising:
the first acquisition module is configured to acquire the number of target cores required by a target task and acquire the number of first cores of idle hardware processing nodes of a first hardware resource platform aiming at the target task calculated by the first hardware resource platform in a first resource pool;
a second obtaining module, configured to obtain, if it is detected that the number of target cores is greater than the first number of cores, a second number of cores of idle hardware processing nodes of a second hardware resource platform in a second resource pool;
and the integration pushing module is configured to temporarily integrate the idle hardware processing nodes of the second hardware resource platform into the first resource pool and push the target task to the first resource pool if the sum of the first core number and the second core number is detected to be greater than or equal to the target core number, and instruct the first hardware resource platform to call the integrated idle hardware processing nodes of the first resource pool to calculate the target task, wherein the core number of the integrated idle hardware processing nodes is greater than or equal to the target core number.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to carry out the method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1 to 7.
CN202210510406.9A 2022-05-11 2022-05-11 Resource scheduling method, device, electronic equipment and storage medium Pending CN115220908A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210510406.9A CN115220908A (en) 2022-05-11 2022-05-11 Resource scheduling method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210510406.9A CN115220908A (en) 2022-05-11 2022-05-11 Resource scheduling method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115220908A true CN115220908A (en) 2022-10-21

Family

ID=83607048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210510406.9A Pending CN115220908A (en) 2022-05-11 2022-05-11 Resource scheduling method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115220908A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341886A (en) * 2023-05-31 2023-06-27 长鑫存储技术有限公司 Method, system, device, equipment and medium for processing photomask resources
CN116341886B (en) * 2023-05-31 2023-11-14 长鑫存储技术有限公司 Method, system, device, equipment and medium for processing photomask resources

Similar Documents

Publication Publication Date Title
CN108537543B (en) Parallel processing method, device, equipment and storage medium for blockchain data
CN112015713B (en) Database task processing method and device, electronic equipment and readable medium
CN111768006B (en) Training method, device, equipment and storage medium for artificial intelligent model
US8082546B2 (en) Job scheduling to maximize use of reusable resources and minimize resource deallocation
CN111338785B (en) Resource scheduling method and device, electronic equipment and storage medium
US9542226B2 (en) Operating programs on a computer cluster
CN109471711B (en) Task processing method and device
CN114168302A (en) Task scheduling method, device, equipment and storage medium
CN105824705B (en) A kind of method for allocating tasks and electronic equipment
CN114546587A (en) Capacity expansion and reduction method of online image recognition service and related device
CN113157411A (en) Reliable configurable task system and device based on Celery
CN111625339A (en) Cluster resource scheduling method, device, medium and computing equipment
CN115220908A (en) Resource scheduling method, device, electronic equipment and storage medium
CN111831408A (en) Asynchronous task processing method and device, electronic equipment and medium
CN114237894A (en) Container scheduling method, device, equipment and readable storage medium
CN110928659B (en) Numerical value pool system remote multi-platform access method with self-adaptive function
CN115220907A (en) Resource scheduling method and device, electronic equipment and storage medium
CN113986097B (en) Task scheduling method and device and electronic equipment
US20090168092A1 (en) Job management and scheduling method for network system
CN116932168A (en) Heterogeneous core scheduling method and device, storage medium and electronic equipment
CN107632893B (en) Message queue processing method and device
CN114691376A (en) Thread execution method and device, electronic equipment and storage medium
CN117093335A (en) Task scheduling method and device for distributed storage system
CN113377515A (en) Task pre-scheduling method for Kubernetes resources
CN116980423B (en) Model scheduling method, device, computing system, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination