
CN114327894A - Resource allocation method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114327894A
Authority
CN
China
Prior art keywords
task
resources
information
target
target task
Prior art date
Legal status
Pending
Application number
CN202111636358.XA
Other languages
Chinese (zh)
Inventor
解培
李银凤
Current Assignee
China Everbright Bank Co Ltd
Original Assignee
China Everbright Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by China Everbright Bank Co Ltd
Priority to CN202111636358.XA
Publication of CN114327894A
Pending (current legal status)

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a resource allocation method, a resource allocation apparatus, an electronic device and a storage medium, and belongs to the field of computer technology. The method comprises: acquiring a target task, sent by a user, that includes task information; and, if the target task is identified as an emergency-start task according to the urgency information in the task information, allocating resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so as to execute the target task. This technical solution allows an emergency task to be executed immediately while the resources of the cluster system are fully and reasonably utilized.

Description

Resource allocation method, device, electronic equipment and storage medium
Technical Field
The embodiments of the invention relate to the field of computer technology, and in particular to a resource allocation method, a resource allocation apparatus, an electronic device and a storage medium.
Background
At present, a cluster system that processes deep learning tasks generally handles the tasks submitted by users on a first-in, first-out (FIFO) queue basis. This approach cannot meet users' diverse requirements; for example, an important model acquired in real time cannot be trained immediately. Improvement is therefore urgently needed.
Disclosure of Invention
The invention provides a resource allocation method, a resource allocation apparatus, an electronic device and a storage medium that allow an emergency task to be executed immediately while ensuring that the resources in the cluster system are fully and reasonably utilized.
In a first aspect, an embodiment of the present invention provides a resource allocation method, where the method includes:
acquiring a target task, sent by a user, that includes task information; and
if the target task is identified as an emergency-start task according to the urgency information in the task information, allocating resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so as to execute the target task.
In a second aspect, an embodiment of the present invention further provides a resource allocation apparatus, where the apparatus includes:
a target task acquisition module, configured to acquire a target task, sent by a user, that includes task information; and
a resource allocation module, configured to, if the target task is identified as an emergency-start task according to the urgency information in the task information, allocate resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so as to execute the target task.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the resource allocation method provided by any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the resource allocation method provided in any embodiment of the present invention.
According to the technical solution of the embodiments of the invention, a target task that includes task information is obtained from a user; if the target task is identified as an emergency-start task according to the urgency information in the task information, resources are allocated to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so that the target task is executed. This allows the emergency task to be executed immediately while the resources in the cluster system are fully and reasonably utilized.
Drawings
Fig. 1 is a flowchart of a resource allocation method according to the first embodiment of the present invention;
Fig. 2 is a flowchart of a resource allocation method according to the second embodiment of the present invention;
Fig. 3 is a flowchart of a resource allocation method according to the third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a resource allocation apparatus according to the fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to the fifth embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and do not limit it. It should further be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Example One
Fig. 1 is a flowchart of a resource allocation method according to the first embodiment of the present invention. This embodiment is applicable to resource allocation in a cluster system. The method may be executed by a resource allocation apparatus, which may be implemented in software and/or hardware and integrated into an electronic device that performs resource allocation, such as a server device (a cluster scheduler).
As shown in fig. 1, the method may specifically include:
S110: Acquire a target task, sent by a user, that includes task information.
In this embodiment, the target task is a task to be processed and may include task information and the like. The task information may include, but is not limited to, a specified priority, a task type, a task mode, a result path, a number of iteration rounds, required resources, and urgency information.
The specified priority is a priority preset by the user for processing the task and indicates the order in which tasks are executed. It may be expressed as numbers, letters, or a combination of numbers and letters; for example, 0 represents the lowest priority and 5 the highest. A task with the highest priority cannot be interrupted by other tasks, whereas tasks with priorities 0 to 4 can be preempted (interrupted), and the lower the priority, the more easily the task is preempted.
The task type may be, for example, TensorFlow or PyTorch.
The task mode may be a training task or an inference task, etc.
The result path refers to a storage path of the task training result, for example, a storage path of the model parameters after each iteration.
The number of iteration rounds is the total number of training iterations set by the user.
The required resources are the resources needed to execute the task and may include, but are not limited to, the specification of the required computing resources (e.g., number of CPU cores, memory size in GB, disk size in GB), the quantity, the identifier of the project the task belongs to (such as a project ID), the type (e.g., virtual machine, container, or Hadoop computing task), other attributes (e.g., the virtual machine's operating system, the virtualization technology, the required software version), the initiator, and the like. For example, the required resource may be a virtual machine or container with a given number of cores and a given amount of disk space.
The urgency information distinguishes emergency-start tasks from normal-start tasks. An emergency-start task must be started immediately after it jumps the queue; if no resources are free, it preempts resources to guarantee an immediate start. A normal-start task is queued under the normal priority policy and scheduled in order when idle resources become available.
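The task-information fields above can be pictured as a record. The sketch below is a minimal Python illustration, not the patent's data model: the field names, the `Resources` type (only three of the many resource attributes listed), and the helper `preemption_order` are hypothetical. It encodes the priority rule stated above: priority 5 is never interrupted, and among tasks at 0 to 4 the lower priority is preempted first.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

MAX_PRIORITY = 5  # highest specified priority; such tasks cannot be interrupted


class Urgency(Enum):
    """Urgency information carried in the task information."""
    NORMAL_START = "normal"        # queued under the normal priority policy
    EMERGENCY_START = "emergency"  # jumps the queue and may preempt resources


@dataclass
class Resources:
    """Hypothetical subset of the required-resource specification."""
    cpu_cores: int = 0
    memory_gb: int = 0
    gpus: int = 0


@dataclass
class TaskInfo:
    """Hypothetical record of the task information fields listed above."""
    task_id: str
    priority: int          # specified priority, 0 (lowest) to 5 (highest)
    task_type: str         # e.g. "tensorflow" or "pytorch"
    task_mode: str         # e.g. "training" or "inference"
    result_path: str       # where per-iteration model parameters are stored
    iteration_rounds: int  # total number of training iterations set by the user
    required: Resources = field(default_factory=Resources)
    urgency: Urgency = Urgency.NORMAL_START

    def is_preemptible(self) -> bool:
        """Priorities 0-4 can be interrupted; priority 5 cannot."""
        return self.priority < MAX_PRIORITY


def preemption_order(tasks: List[TaskInfo]) -> List[TaskInfo]:
    """Preemptible tasks only, lowest priority (easiest to interrupt) first."""
    return sorted((t for t in tasks if t.is_preemptible()), key=lambda t: t.priority)
```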
Specifically, the user submits a target task including task information to the cluster scheduler, which parses the target task to obtain its task information; for example, a training task including task information submitted by the user may be obtained.
S120: If the target task is identified as an emergency-start task according to the urgency information in the task information, allocate resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so as to execute the target task.
The currently remaining resources in the cluster system are its currently idle resources. The tasks currently being processed may include training tasks and inference tasks. For a training task, the state information may include at least one of the current iteration round number, the execution time per round, the remaining time of the current round, and the resource occupation; for an inference task, the state information may include the resource occupation, the remaining execution time, and the like.
Optionally, if the target task is identified as an emergency-start task according to the urgency information in the task information, the required resources in the task information are compared with the currently remaining resources in the cluster system. If the currently remaining resources satisfy the required resources, resources are allocated to the target task directly from the currently remaining resources so as to execute it.
Further, if the currently remaining resources do not satisfy the required resources, resources are allocated to the target task according to the currently remaining resources in combination with the state information of the tasks currently being processed in the cluster system, so as to execute the target task.
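The two branches of S120 can be sketched as follows, assuming resources are compared dimension by dimension with a hypothetical `Resources` vector of CPU cores, memory and GPUs. This illustrates only the decision; what happens on the shortfall path is the subject of the second embodiment, so the sketch merely reports it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Resources:
    """Hypothetical multi-dimensional resource vector."""
    cpu_cores: int = 0
    memory_gb: int = 0
    gpus: int = 0

    def covers(self, other: "Resources") -> bool:
        """True if every dimension of self is at least that of other."""
        return (self.cpu_cores >= other.cpu_cores
                and self.memory_gb >= other.memory_gb
                and self.gpus >= other.gpus)

    def minus(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_cores - other.cpu_cores,
                         self.memory_gb - other.memory_gb,
                         self.gpus - other.gpus)


def allocate_emergency(required: Resources, remaining: Resources
                       ) -> Tuple[Optional[Resources], Resources, bool]:
    """Return (allocated, new_remaining, needs_running_task_state).

    If idle resources suffice, allocate directly from them. Otherwise
    allocate nothing here and signal that the state of the running tasks
    must be consulted (e.g. to suspend some of them).
    """
    if remaining.covers(required):
        return required, remaining.minus(required), False
    return None, remaining, True
```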
With the technical solution of this embodiment, a target task identified from its urgency information as an emergency-start task is allocated resources based on its required resources, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed, so that it can be executed immediately while the resources in the cluster system are fully and reasonably utilized.
On the basis of the above technical solution, as an optional implementation of this embodiment, after resources are allocated to the target task, a monitoring process may further be used to monitor the execution state of the target task while it is being executed, so as to obtain the state information of the target task.
Specifically, during task execution a monitoring process monitors the execution state of the target task. For example, it parses the latest saved round number of the target task from the result path in the task information, estimates the execution time of each iteration round and the remaining time of the current round from timestamps, and determines the resource occupation of the target task, thereby obtaining its state information.
It can be understood that monitoring the execution state of the target task in real time through the monitoring process allows the target task to be tracked accurately and its state information to be known, which lays a foundation for flexible resource allocation.
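A minimal sketch of the monitoring step, assuming (hypothetically) that the training job writes one file per saved round named `round_<n>.ckpt` under its result path and that the monitor knows each file's save timestamp. The time per round is estimated as the mean gap between the first and latest checkpoints, and the remaining time of the current round as that mean minus the time elapsed since the latest save. Neither the naming convention nor this estimator comes from the patent.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical naming convention for files saved under the result path.
CKPT_PATTERN = re.compile(r"round_(\d+)\.ckpt$")


@dataclass
class TrainingState:
    current_round: int
    seconds_per_round: float
    current_round_remaining: float
    rounds_left: int


def estimate_state(checkpoints: Iterable[Tuple[str, float]],
                   now: float, total_rounds: int) -> TrainingState:
    """Derive state information from (file name, save timestamp) pairs."""
    saved = []
    for name, ts in checkpoints:
        match = CKPT_PATTERN.search(name)
        if match:  # ignore logs and other files in the result path
            saved.append((int(match.group(1)), ts))
    if not saved:
        return TrainingState(0, 0.0, 0.0, total_rounds)
    saved.sort()
    first_round, first_ts = saved[0]
    latest_round, latest_ts = saved[-1]
    per_round = 0.0
    if latest_round > first_round:
        # Mean duration of one iteration round between the two checkpoints.
        per_round = (latest_ts - first_ts) / (latest_round - first_round)
    remaining = max(per_round - (now - latest_ts), 0.0)
    return TrainingState(latest_round, per_round, remaining,
                         total_rounds - latest_round)
```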
Example Two
Fig. 2 is a flowchart of a resource allocation method according to the second embodiment of the present invention. This embodiment further optimizes the above embodiment and provides an alternative implementation.
As shown in fig. 2, the method may specifically include:
S210: Acquire a target task, sent by a user, that includes task information.
S220: If the target task is identified as an emergency-start task according to the urgency information in the task information, allocate resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so as to execute the target task.
In this embodiment, when the target task is identified as an emergency-start task according to the urgency information in the task information and resources are allocated to it according to the required resources, the currently remaining resources and the state information of the tasks currently being processed, the following applies: if the currently remaining resources are found not to satisfy the required resources, the resources to be released are determined from the currently remaining resources and the required resources. The resources to be released are the additional resources still needed, beyond the currently remaining resources, to execute the target task. For example, the required resources minus the currently remaining resources may be taken as the resources to be released.
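Read per resource dimension, the amount to release is the required amount minus what is currently idle, and never negative. A minimal sketch with a hypothetical three-dimensional `Resources` vector:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector (CPU cores, memory in GB, GPUs)."""
    cpu_cores: int = 0
    memory_gb: int = 0
    gpus: int = 0


def resources_to_release(required: Resources, remaining: Resources) -> Resources:
    """Shortfall to free: required minus remaining, floored at 0 per dimension."""
    return Resources(
        max(required.cpu_cores - remaining.cpu_cores, 0),
        max(required.memory_gb - remaining.memory_gb, 0),
        max(required.gpus - remaining.gpus, 0),
    )
```

Flooring each dimension at zero matters: a surplus of CPU cores does not offset a shortage of GPUs.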
After the resources to be released are determined, a suspendable task is selected from the tasks currently being processed according to the resources to be released and the state information of those tasks. A suspendable task is a task that can be interrupted during execution; a task currently being processed is one that is being executed.
Optionally, the suspendable task is selected from the tasks currently being processed according to the resources to be released, the specified priorities of those tasks, and their state information.
For example, the tasks currently being processed whose priority is lower than that of the target task may be selected as candidate suspension tasks, and the suspendable task is then determined from the state information of the candidate suspension tasks and the resources to be released.
In one implementation, based on the resource occupation in the state information of the candidate suspension tasks, the candidate whose occupied resources are closest to the resources to be released is selected as the suspendable task.
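The two-stage selection above (restrict to lower-priority tasks, then pick the closest fit) can be sketched as follows. Assumptions, all hypothetical: resources are reduced to a single GPU count, and "closest" is interpreted as the smallest occupation that still covers the shortfall, so that suspending one task is enough.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningTask:
    task_id: str
    priority: int       # specified priority, 0..5
    occupied_gpus: int  # resource occupation from the state information


def candidate_suspension_tasks(running: List[RunningTask],
                               target_priority: int) -> List[RunningTask]:
    """Running tasks whose priority is lower than the target task's."""
    return [t for t in running if t.priority < target_priority]


def closest_candidate(candidates: List[RunningTask],
                      gpus_to_release: int) -> Optional[RunningTask]:
    """Candidate whose occupation is closest to the shortfall.

    Assumption: only candidates that free at least the shortfall qualify,
    so the result is the smallest sufficient occupation (None if none).
    """
    sufficient = [t for t in candidates if t.occupied_gpus >= gpus_to_release]
    if not sufficient:
        return None
    return min(sufficient, key=lambda t: t.occupied_gpus - gpus_to_release)
```

Choosing the tightest sufficient fit avoids suspending a large job when a smaller one frees enough.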
In another implementation, the suspendable task is determined from the state information of the candidate suspension tasks and the resources to be released as follows: according to the current iteration round number in the state information, at least one candidate that satisfies the resources to be released is selected as the suspendable task from among the candidates whose current iteration round number is less than a set round threshold. The round threshold is set by a person skilled in the art according to actual conditions.
In another implementation, according to the execution time per round in the state information, at least one candidate that satisfies the resources to be released is selected as the suspendable task from among the candidates whose execution time per round is greater than a first set duration. The first set duration is set by a person skilled in the art according to actual conditions.
In another implementation, according to the remaining time of the current round in the state information, at least one candidate that satisfies the resources to be released is selected as the suspendable task from among the candidates whose current-round remaining time is less than a second set duration. The second set duration is set by a person skilled in the art according to actual conditions.
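The three threshold-based variants differ only in the filter applied to the candidates. In the sketch below the filters are interchangeable predicates, and the step "select at least one candidate that satisfies the resources to be released" is implemented, as an assumption, by greedily taking the largest occupiers until the shortfall (a single GPU count here) is covered. All names and the greedy rule are illustrative, not from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CandidateState:
    task_id: str
    current_round: int
    seconds_per_round: float
    round_remaining_s: float
    occupied_gpus: int


Predicate = Callable[[CandidateState], bool]


# The three filters described above; thresholds are deployment-specific.
def few_rounds_done(round_threshold: int) -> Predicate:
    return lambda c: c.current_round < round_threshold


def slow_rounds(first_duration_s: float) -> Predicate:
    return lambda c: c.seconds_per_round > first_duration_s


def round_nearly_done(second_duration_s: float) -> Predicate:
    return lambda c: c.round_remaining_s < second_duration_s


def select_suspendable(candidates: List[CandidateState], keep: Predicate,
                       gpus_to_release: int) -> Optional[List[CandidateState]]:
    """Filter the candidates, then take the largest occupiers until the
    shortfall is covered. Returns None if the filtered set cannot cover it."""
    chosen, freed = [], 0
    ordered = sorted(filter(keep, candidates),
                     key=lambda c: c.occupied_gpus, reverse=True)
    for c in ordered:
        if freed >= gpus_to_release:
            break
        chosen.append(c)
        freed += c.occupied_gpus
    return chosen if freed >= gpus_to_release else None
```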
For example, the suspendable task may also be determined by assigning weights to any two, any three, or all four of the following items in the candidates' state information: the current iteration round number, the execution time per round, the remaining time of the current round, and the resource occupation. The suspendable task is then selected from the candidates according to the weighted sum.
For example, a score combining various factors may be calculated according to formula (1), and the suspendable task selected according to the score, e.g., by sorting the candidates by score and taking the one with the highest or lowest score as the suspendable task.
$$\mathrm{score} = w_0 f(x) + w_1 g(y) + w_2 h(z) + w_3 q(a) \qquad (1)$$
where score is the evaluation score; w0, w1, w2 and w3 are weighting coefficients; f(x) is a priority evaluation function and x is the priority state vector; g(y) is a task state evaluation function and y is the task state vector; h(z) is a queue state evaluation function and z is the queue state vector; q(a) is a cluster state evaluation function and a is the cluster state vector.
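Formula (1) leaves the evaluation functions f, g, h and q, and the contents of the state vectors, unspecified. The sketch below keeps them as caller-supplied functions; `mean` is only a hypothetical stand-in, and whether the highest or the lowest score wins is a parameter, as in the text.

```python
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]
Evaluator = Callable[[Vector], float]


def score(weights: Sequence[float],
          f: Evaluator, x: Vector, g: Evaluator, y: Vector,
          h: Evaluator, z: Vector, q: Evaluator, a: Vector) -> float:
    """Formula (1): score = w0*f(x) + w1*g(y) + w2*h(z) + w3*q(a)."""
    w0, w1, w2, w3 = weights
    return w0 * f(x) + w1 * g(y) + w2 * h(z) + w3 * q(a)


def mean(v: Vector) -> float:
    """Hypothetical stand-in evaluation function."""
    return sum(v) / len(v) if v else 0.0


def pick_by_score(candidates: Dict[str, Tuple[Vector, Vector, Vector, Vector]],
                  weights: Sequence[float],
                  evaluators: Tuple[Evaluator, Evaluator, Evaluator, Evaluator],
                  highest: bool = True) -> str:
    """candidates maps task_id -> (x, y, z, a); returns the winning task_id."""
    f, g, h, q = evaluators
    scored = {tid: score(weights, f, x, g, y, h, z, q, a)
              for tid, (x, y, z, a) in candidates.items()}
    return (max if highest else min)(scored, key=scored.get)
```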
After the suspendable task is determined, its execution is stopped so as to release the resources it occupies, and resources are allocated to the target task based on the released resources.
Specifically, once the suspendable task is stopped and the resources it occupied are released, resources are allocated to the target task from the released resources together with the currently remaining resources.
Optionally, as an optional implementation of this embodiment, the model parameters of the suspendable task may also be stored before its execution is stopped. Specifically, if the suspendable task is a deep learning model task, its model parameters are stored so that, after the suspension, the task can later continue from where it stopped rather than restarting from the beginning, avoiding wasted time and resources.
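The suspend-and-reallocate sequence, including saving parameters before stopping, can be sketched with an in-memory model of the cluster. Everything here is hypothetical: `params` stands in for a deep learning model's parameters, the `checkpoints` dict stands in for the result path, and resources are a single GPU count. A real system would write the checkpoint with its framework's own save routine.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Job:
    task_id: str
    gpus: int
    is_training: bool
    params: Dict[str, float] = field(default_factory=dict)  # stand-in for model weights


@dataclass
class Cluster:
    idle_gpus: int
    running: Dict[str, Job] = field(default_factory=dict)
    # Stand-in for the result path: task_id -> saved parameters.
    checkpoints: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def suspend(self, task_id: str) -> None:
        """Save parameters (training jobs only), then stop the job and release its GPUs."""
        job = self.running[task_id]
        if job.is_training:
            self.checkpoints[task_id] = dict(job.params)  # save before stopping
        del self.running[task_id]                         # stop execution
        self.idle_gpus += job.gpus                        # release occupied resources

    def start(self, job: Job) -> bool:
        """Allocate from idle GPUs (released plus previously remaining)."""
        if job.gpus > self.idle_gpus:
            return False
        self.idle_gpus -= job.gpus
        self.running[job.task_id] = job
        return True
```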
With the technical solution of this embodiment, an emergency-start task is allocated resources based on its required resources, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed; when the remaining resources fall short, suitable running tasks are suspended to release resources. The emergency task can thus be executed immediately while the resources in the cluster system are fully and reasonably utilized.
Example Three
Fig. 3 is a flowchart of a resource allocation method according to the third embodiment of the present invention. This embodiment further optimizes the above embodiments and provides an alternative implementation.
S310: Acquire a target task, sent by a user, that includes task information.
S320: If the target task is identified as a normal-start task according to the urgency information in the task information, add the target task to the task queue and allocate resources to the pending tasks in the task queue according to their policy priorities.
The policy priority of a pending task in the task queue is determined according to at least one of the task waiting time, the required resources, and the number of remaining iteration rounds.
For example, the policy priority may be determined from the task waiting time: the pending tasks in the task queue are sorted by waiting time and the policy priority is assigned according to the result, e.g., the longer the waiting time, the higher the priority.
For example, the policy priority may also be determined from the required resources: the pending tasks are sorted by the size of their required resources and the policy priority is assigned according to the result, e.g., the smaller the required resources, the higher the priority.
For example, the policy priority may also be determined from the number of remaining iteration rounds: the pending tasks are sorted by remaining rounds and the policy priority is assigned according to the result, e.g., the fewer the remaining rounds, the higher the priority.
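Each single-factor rule above is a sort with a different key and direction. A minimal sketch, assuming a hypothetical `PendingTask` record with the required resources reduced to a GPU count; the head of each returned list has the highest policy priority.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingTask:
    task_id: str
    wait_s: float       # time spent waiting in the queue
    required_gpus: int  # required resources (one dimension, for brevity)
    rounds_left: int    # remaining iteration rounds


def by_waiting_time(queue: List[PendingTask]) -> List[PendingTask]:
    """Longer wait -> higher policy priority."""
    return sorted(queue, key=lambda t: t.wait_s, reverse=True)


def by_required_resources(queue: List[PendingTask]) -> List[PendingTask]:
    """Smaller resource demand -> higher policy priority."""
    return sorted(queue, key=lambda t: t.required_gpus)


def by_rounds_left(queue: List[PendingTask]) -> List[PendingTask]:
    """Fewer remaining iteration rounds -> higher policy priority."""
    return sorted(queue, key=lambda t: t.rounds_left)
```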
Illustratively, the policy priority of a pending task may be determined from any two of the task waiting time, the required resources, and the number of remaining iteration rounds. For example, when it is determined from the task waiting time and the number of remaining iteration rounds, the two are weighted and summed, and the policy priority is determined from the weighted result.
For example, the policy priority may also be determined from all three of the task waiting time, the required resources, and the number of remaining iteration rounds: the three are weighted, and the weighted result determines the policy priority of the pending tasks in the task queue.
For example, a score combining multiple factors may also be calculated with formula (1) above, and the policy priority of the pending tasks determined from the score; e.g., the pending task with the highest (or lowest) score is assigned the highest policy priority, that is, it becomes the next task to start.
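The weighted variants need the factors on a common scale before they can be summed. The sketch below assumes min-max normalization and sign choices that match the single-factor examples in the description (longer waits raise the score; larger resource needs and more remaining rounds lower it). Neither the normalization nor the default weights are specified in the patent. The head of the returned list is the next task to start.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PendingTask:
    task_id: str
    wait_s: float
    required_gpus: int
    rounds_left: int


def _normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale to [0, 1] so factors in different units can be summed."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def policy_order(queue: List[PendingTask], w_wait: float = 1.0,
                 w_res: float = 1.0, w_rounds: float = 1.0) -> List[PendingTask]:
    """Sort the queue by a weighted sum of normalized factors, best first."""
    if not queue:
        return []
    wait = _normalize([t.wait_s for t in queue])
    res = _normalize([t.required_gpus for t in queue])
    rounds = _normalize([t.rounds_left for t in queue])
    scores = [w_wait * wv - w_res * rv - w_rounds * nv
              for wv, rv, nv in zip(wait, res, rounds)]
    order = sorted(range(len(queue)), key=lambda i: scores[i], reverse=True)
    return [queue[i] for i in order]
```

Setting a weight to zero recovers the two-factor variants described above.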
In this embodiment, when the urgency information identifies the target task as a normal-start task, the target task is added to the task queue, and resources are allocated to the pending tasks in the queue in order of their policy priorities.
S330: If the target task is identified as an emergency-start task according to the urgency information in the task information, allocate resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so as to execute the target task.
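Putting S310 to S330 together, a dispatcher routes each submitted task by its urgency information. In this hypothetical sketch the queue is assumed to be kept in policy-priority order already, resources are a single GPU count, and an emergency task that does not fit is only flagged for the preemption path of the second embodiment rather than handled here.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Deque, List


class Urgency(Enum):
    NORMAL_START = "normal"
    EMERGENCY_START = "emergency"


@dataclass
class Task:
    task_id: str
    urgency: Urgency
    required_gpus: int


class Scheduler:
    """Minimal dispatcher for S310-S330 (hypothetical, single GPU dimension)."""

    def __init__(self, idle_gpus: int):
        self.idle_gpus = idle_gpus
        self.queue: Deque[Task] = deque()  # normal-start tasks, in policy-priority order
        self.running: List[str] = []

    def submit(self, task: Task) -> str:
        if task.urgency is Urgency.EMERGENCY_START:
            return self._start_emergency(task)  # S330
        self.queue.append(task)                 # S320: join the task queue
        self._drain_queue()
        return "queued" if task in self.queue else "started"

    def _start_emergency(self, task: Task) -> str:
        if task.required_gpus <= self.idle_gpus:
            self._run(task)
            return "started"
        return "needs_preemption"  # would suspend running tasks (second embodiment)

    def _drain_queue(self) -> None:
        # Start queued tasks in order while the head of the queue fits.
        while self.queue and self.queue[0].required_gpus <= self.idle_gpus:
            self._run(self.queue.popleft())

    def _run(self, task: Task) -> None:
        self.idle_gpus -= task.required_gpus
        self.running.append(task.task_id)
```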
With the technical solution of this embodiment, a target task including task information is obtained from a user. If its urgency information identifies it as a normal-start task, it is added to the task queue and resources are allocated to the pending tasks in order of their policy priorities; if it is identified as an emergency-start task, resources are allocated to it according to the required resources, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed, so as to execute it. The emergency task can thus be executed immediately while the resources in the cluster system are fully and reasonably utilized, and pending tasks still receive a reasonable share of resources under normal conditions.
Example Four
Fig. 4 is a schematic structural diagram of a resource allocation apparatus according to the fourth embodiment of the present invention. This embodiment is applicable to resource allocation in a cluster system. The apparatus may be implemented in software and/or hardware and integrated into an electronic device that performs resource allocation, such as a server device.
As shown in Fig. 4, the apparatus may specifically include a target task acquisition module 410 and a resource allocation module 420, where:
the target task acquisition module 410 is configured to acquire a target task, sent by a user, that includes task information; and
the resource allocation module 420 is configured to, if the target task is identified as an emergency-start task according to the urgency information in the task information, allocate resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so as to execute the target task.
With the technical solution of this embodiment, a target task identified from its urgency information as an emergency-start task is allocated resources based on its required resources, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed, so that it can be executed immediately while the resources in the cluster system are fully and reasonably utilized.
Further, the resource allocation module 420 includes:
a resource to be released determining unit, configured to determine, if it is identified that the current remaining resource does not satisfy the demand resource, a resource to be released according to the current remaining resource and the demand resource;
the task selection unit capable of being suspended is used for selecting a task capable of being suspended from the tasks in the current processing according to the resources required to be released and the state information of the tasks in the current processing;
a resource releasing unit, configured to stop an execution operation of the suspendable task to release a resource occupied by the suspendable task;
and the resource allocation unit is used for allocating resources for the target task based on the released resources.
Further, the suspendable task selecting unit is specifically configured to:
and selecting the task which can be suspended from the tasks in the current processing according to the resources required to be released, the assigned priority of the task in the current processing and the state information of the task in the current processing.
Further, the current task includes a training task, and the state information includes at least one of a current iteration turn number, an execution time of each turn, a current turn remaining time, and a resource occupation situation.
Further, the resource allocation module 420 further includes:
and the parameter storage unit is used for storing the model parameters of the task which can be suspended.
Further, the resource allocation module 420 is further configured to:
and if the target task is identified to be a normal starting task according to the urgency information in the task information, adding the target task into the task queue, and allocating resources to the tasks to be processed in the task queue according to the strategy priority of the tasks to be processed in the task queue.
Further, the policy priority of a pending task in the task queue is determined according to at least one of: the task's waiting time, its required resources, and its number of remaining iteration rounds.
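The specification lists the inputs to the policy priority but not a formula. The sketch below assumes a linear score: a longer waiting time raises priority, while larger resource requests and more remaining iteration rounds lower it. The weights `W_WAIT`, `W_RES`, `W_ROUNDS` and the names `PendingTask`, `policy_priority`, and `schedule` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PendingTask:
    name: str
    waiting_s: float       # task waiting time, seconds
    required: int          # required resources, e.g. GPUs (assumption)
    remaining_rounds: int  # number of remaining iteration rounds


# Hypothetical weights; the specification does not give a formula.
W_WAIT, W_RES, W_ROUNDS = 1.0, 10.0, 0.5


def policy_priority(t: PendingTask) -> float:
    # Longer waits raise priority (limits starvation); larger requests and
    # more remaining rounds lower it (short, small jobs go first).
    return W_WAIT * t.waiting_s - W_RES * t.required - W_ROUNDS * t.remaining_rounds


def schedule(queue: List[PendingTask], free: int) -> Tuple[List[str], List[PendingTask]]:
    """Start queued tasks in descending policy priority while they fit;
    return the started task names and the tasks that remain queued."""
    started: List[str] = []
    still_waiting: List[PendingTask] = []
    for t in sorted(queue, key=policy_priority, reverse=True):
        if t.required <= free:
            free -= t.required
            started.append(t.name)
        else:
            still_waiting.append(t)  # does not fit yet; smaller tasks may backfill
    return started, still_waiting
```

For instance, with 6 free GPUs, a high-scoring task that needs 8 GPUs is skipped while smaller tasks behind it start. This backfilling keeps resources busy, but a large task can be repeatedly overtaken; reserving resources for the highest-priority waiting task is one common mitigation.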
Further, the apparatus also includes:
a state monitoring module, configured to monitor the execution state of the target task with a monitoring process while the target task is being executed, so as to obtain the state information of the target task.
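The monitoring described here can be sketched with a background thread that periodically samples the target task's state. This is a stand-in: the specification refers to a monitoring process, which in practice might be a separate OS process reading metrics from the training job, whereas the sketch uses a thread in the same Python process for simplicity. The recorded fields (iteration round and round time) mirror part of the state information listed above; `TaskState`, `monitor`, `run_training`, and `monitored_run` are hypothetical names.

```python
import threading
import time
from typing import Dict, List


class TaskState:
    """State information the target task publishes and the monitor reads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current_round = 0
        self.last_round_time = 0.0

    def update(self, current_round: int, round_time: float) -> None:
        with self._lock:
            self.current_round = current_round
            self.last_round_time = round_time

    def snapshot(self) -> Dict[str, float]:
        with self._lock:
            return {"round": self.current_round, "round_time": self.last_round_time}


def monitor(state: TaskState, stop: threading.Event, interval: float,
            samples: List[Dict[str, float]]) -> None:
    # Sample every `interval` seconds until stopped, then take a final sample.
    while not stop.wait(interval):
        samples.append(state.snapshot())
    samples.append(state.snapshot())


def run_training(state: TaskState, rounds: int, step_s: float) -> None:
    # Stand-in for the target task: each iteration round just sleeps.
    for r in range(1, rounds + 1):
        start = time.monotonic()
        time.sleep(step_s)
        state.update(r, time.monotonic() - start)


def monitored_run(rounds: int, step_s: float, interval: float) -> List[Dict[str, float]]:
    """Execute the stand-in task while a monitor thread records its state."""
    state, stop = TaskState(), threading.Event()
    samples: List[Dict[str, float]] = []
    watcher = threading.Thread(target=monitor, args=(state, stop, interval, samples))
    watcher.start()
    try:
        run_training(state, rounds, step_s)
    finally:
        stop.set()
        watcher.join()
    return samples
```

Using `Event.wait(interval)` as the polling delay lets the monitor stop promptly once the task finishes, and the final sample ensures the last reported round is recorded.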
The resource allocation apparatus can execute the resource allocation method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to that method.
EXAMPLE five
Fig. 5 is a schematic structural diagram of an electronic device according to the fifth embodiment of the present invention, showing a block diagram of an exemplary device suitable for implementing embodiments of the present invention. The device shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 5, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of embodiments described herein.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
By running programs stored in system memory 28, processing unit 16 performs various functional applications and data processing, for example implementing the resource allocation method provided by embodiments of the present invention.
EXAMPLE six
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program (also referred to as computer-executable instructions) which, when executed by a processor, performs the resource allocation method provided by embodiments of the present invention. The method includes: acquiring a target task that is sent by a user and contains task information; and, if the urgency information in the task information identifies the target task as an emergency-start task, allocating resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so as to execute the target task.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of embodiments of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It should be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in detail through the above embodiments, it is not limited to them and may include many other equivalent embodiments without departing from its spirit; the scope of the present invention is determined by the appended claims.

Claims (11)

1. A method for resource allocation, comprising:
acquiring a target task that is sent by a user and contains task information;
and if the target task is identified as an emergency-start task according to the urgency information in the task information, allocating resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so as to execute the target task.
2. The method of claim 1, wherein allocating resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system comprises:
if it is identified that the currently remaining resources do not satisfy the required resources, determining the resources to be released according to the currently remaining resources and the required resources;
selecting a suspendable task from the tasks currently being processed according to the resources to be released and the state information of the tasks currently being processed;
stopping execution of the suspendable task so as to release the resources occupied by the suspendable task;
and allocating resources to the target task based on the released resources.
3. The method of claim 2, wherein selecting a suspendable task from the tasks currently being processed according to the resources to be released and the state information of the tasks currently being processed comprises:
selecting the suspendable task from the tasks currently being processed according to the resources to be released, the assigned priorities of the tasks currently being processed, and the state information of the tasks currently being processed.
4. The method of any one of claims 1-3, wherein the tasks currently being processed comprise a training task, and the state information comprises at least one of: a current iteration round number, an execution time per round, a remaining time of the current round, and a resource occupancy.
5. The method of claim 4, further comprising, before stopping execution of the suspendable task:
storing model parameters of the suspendable task.
6. The method of claim 1, further comprising, after acquiring the target task that is sent by the user and contains task information:
if the target task is identified as a normal-start task according to the urgency information in the task information, adding the target task to a task queue, and allocating resources to the pending tasks in the task queue according to the policy priorities of the pending tasks.
7. The method of claim 6, wherein the policy priority of a pending task in the task queue is determined according to at least one of a task waiting time, required resources, and a number of remaining iteration rounds.
8. The method of claim 1, further comprising, after allocating resources to the target task:
monitoring, by a monitoring process, the execution state of the target task while the target task is being executed, so as to obtain state information of the target task.
9. A resource allocation apparatus, comprising:
a target task acquisition module, configured to acquire a target task that is sent by a user and contains task information;
and a resource allocation module, configured to, if the target task is identified as an emergency-start task according to the urgency information in the task information, allocate resources to the target task according to the required resources in the task information, the currently remaining resources in the cluster system, and the state information of the tasks currently being processed in the cluster system, so as to execute the target task.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the resource allocation method of any one of claims 1-8.
11. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the resource allocation method of any one of claims 1-8.
CN202111636358.XA (priority date 2021-12-29, filing date 2021-12-29): Resource allocation method, device, electronic equipment and storage medium. Status: Pending; published as CN114327894A.

Priority Applications (1)

CN202111636358.XA (published as CN114327894A); priority date 2021-12-29; filing date 2021-12-29; title: Resource allocation method, device, electronic equipment and storage medium

Publications (1)

CN114327894A, published 2022-04-12

Family ID: 81017712

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination