[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN107045456A - A kind of resource allocation methods and explorer - Google Patents

A kind of resource allocation methods and explorer Download PDF

Info

Publication number
CN107045456A
CN107045456A CN201610080980.XA CN201610080980A CN107045456A CN 107045456 A CN107045456 A CN 107045456A CN 201610080980 A CN201610080980 A CN 201610080980A CN 107045456 A CN107045456 A CN 107045456A
Authority
CN
China
Prior art keywords
tasks
allocation
task
resource
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610080980.XA
Other languages
Chinese (zh)
Other versions
CN107045456B (en
Inventor
辛现银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201610080980.XA priority Critical patent/CN107045456B/en
Priority to PCT/CN2016/112186 priority patent/WO2017133351A1/en
Publication of CN107045456A publication Critical patent/CN107045456A/en
Application granted granted Critical
Publication of CN107045456B publication Critical patent/CN107045456B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Multi Processors (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

本发明实施例提供一种资源分配方法及资源管理器,用于提高资源利用率,和/或,用于提升用户作业的执行效率。该方法包括:接收客户端设备提交的作业,并将该作业分解为多个任务,其中,该多个任务中的每个任务均配置有相对应的资源需求量;估计每个任务的运行时间;根据每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形,该第一分配位形用于指示该多个任务在多个计算节点中的可运行计算节点上的分布情况,该调度策略包括资源利用率优先策略和效率优先策略中的至少一种;将该多个任务按照第一分配位形分配到该多个任务的可运行计算节点上。本发明适用于高性能集群领域。

Embodiments of the present invention provide a resource allocation method and a resource manager, which are used to improve resource utilization, and/or, to improve execution efficiency of user jobs. The method includes: receiving a job submitted by a client device, and decomposing the job into a plurality of tasks, wherein each task in the plurality of tasks is configured with a corresponding resource requirement; and estimating the running time of each task ; According to the resource demand and running time corresponding to each task, combined with the preset scheduling strategy, determine the first allocation configuration of the multiple tasks, and the first allocation configuration is used to indicate that the multiple tasks are in multiple computing The distribution of the operable computing nodes in the nodes, the scheduling strategy includes at least one of the resource utilization priority strategy and the efficiency priority strategy; the plurality of tasks can be assigned to the plurality of tasks according to the first allocation configuration run on a compute node. The invention is applicable to the field of high-performance clusters.

Description

一种资源分配方法及资源管理器Resource allocation method and resource manager

技术领域technical field

本发明涉及高性能集群领域,尤其涉及一种资源分配方法及资源管理器。The invention relates to the field of high-performance clusters, in particular to a resource allocation method and a resource manager.

背景技术Background technique

互联网的高速发展产生了大量的用户数据,分布式处理则是处理大规模数据集的标准手段。它的典型模式是将一个用户作业(英文:Job)分解为一系列可分布式运行的任务(英文:Task),并通过调度器(英文:Scheduler)将这些任务调度到合适的节点(英文:node)上进行运算。任务运行完成之后,将任务的运行结果做归集、整理,形成作业最终的结果输出。The rapid development of the Internet has produced a large amount of user data, and distributed processing is a standard method for processing large-scale data sets. Its typical mode is to decompose a user job (English: Job) into a series of tasks (English: Task) that can be distributed and run, and schedule these tasks to appropriate nodes (English: Scheduler) through the scheduler (English: Scheduler) node) to operate on. After the task operation is completed, the operation results of the task are collected and organized to form the final result output of the job.

调度器是集群资源与用户作业的耦合点。调度策略的好坏直接影响了整个集群的资源利用率和用户作业的执行效率。目前广泛应用的Hadoop系统的调度策略如图1所示。其中,Hadoop将有资源需求的Task按照一定的策略,如主资源公平(英文全称:dominantresource fairness,英文缩写:DRF)策略)排队,而各个节点通过心跳上报本节点上的资源量,并触发分配机制。若该节点上的资源量满足第一个Task的需求,调度器便将该Task安放在该节点上。然而,该调度策略仅考虑到了资源的公平性,比较单一,并不能根据不同场景需要灵活地选择资源利用率优先策略和效率优先策略来进行资源分配,从而无法使得集群资源的利用率较高,和/或,用户作业的执行效率较高。The scheduler is the coupling point between cluster resources and user jobs. The quality of the scheduling strategy directly affects the resource utilization of the entire cluster and the execution efficiency of user jobs. The scheduling strategy of the widely used Hadoop system is shown in Figure 1. Among them, Hadoop queues the tasks with resource requirements according to a certain strategy, such as the dominant resource fairness (English full name: dominantresource fairness, English abbreviation: DRF) strategy), and each node reports the amount of resources on the node through the heartbeat, and triggers the allocation mechanism. If the amount of resources on the node meets the requirements of the first Task, the scheduler places the Task on the node. However, this scheduling strategy only considers the fairness of resources, which is relatively simple, and cannot flexibly select the resource utilization priority strategy and efficiency priority strategy for resource allocation according to different scenarios, so that it cannot make the utilization of cluster resources higher. And/or, the execution efficiency of the user job is high.

发明内容Contents of the invention

本发明实施例提供一种资源分配方法及资源管理器,用于灵活选择资源利用率优先策略和效率优先策略来进行资源分配,从而提高资源利用率,和/或,提升用户作业的执行效率。Embodiments of the present invention provide a resource allocation method and a resource manager, which are used to flexibly select a resource utilization priority strategy and an efficiency priority strategy for resource allocation, thereby improving resource utilization, and/or improving user job execution efficiency.

为达到上述目的,本发明实施例提供如下技术方案:In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

第一方面,提供一种分布式计算系统中的资源分配方法,该分布式计算系统包括多个计算节点,该方法包括:接收客户端设备提交的作业,并将该作业分解为多个任务,其中,该多个任务中的每个任务均配置有相对应的资源需求量;估计每个任务的运行时间;根据每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形,该第一分配位形用于指示该多个任务在多个计算节点中的可运行计算节点上的分布情况,该调度策略包括资源利用率优先策略和效率优先策略中的至少一种;将该多个任务按照第一分配位形分配到该多个任务的可运行计算节点上。In a first aspect, a resource allocation method in a distributed computing system is provided, the distributed computing system includes multiple computing nodes, the method includes: receiving a job submitted by a client device, and decomposing the job into multiple tasks, Wherein, each task in the plurality of tasks is configured with a corresponding resource demand; the running time of each task is estimated; according to the resource demand and running time corresponding to each task, combined with a preset scheduling strategy, determine The first allocation configuration of the plurality of tasks, the first allocation configuration is used to indicate the distribution of the plurality of tasks on the runnable computing nodes among the plurality of computing nodes, the scheduling strategy includes resource utilization priority strategy and At least one of the efficiency priority strategies: assigning the plurality of tasks to computing nodes that can run the plurality of tasks according to the first allocation configuration.

基于本发明实施例提供的资源分配方法,该资源分配方法中,在接收客户端设备提交的作业,并将该作业分解为多个有相应的资源需求量配置的任务之后,还估计每个任务的运行时间,并根据每个任务的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形,然后将该多个任务按照该第一分配位形分配到该多个任务的可运行计算节点上。其中,该第一分配位形用于指示该多个任务在该多个任务的可运行计算节点上的分布情况,调度策略包括资源利用率优先策略和效率优先策略中的至少一种。也就是说,该方案考虑了每个任务的运行时间的因素,并且将空间需求(即任务的资源需求量)和时间需求(即任务的时间)固定的Task调度到对应的节点上时,能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,使得最终采用资源利用率较高和/或效率较高的分配位形。一方面,由于可以采用资源利用率较高的分配位形,即可以通过调度策略使得那些节点资源利用率较高的Task组合被调度到节点上,因此该分配方案能有效减轻现有技术中资源碎片的问题,从而提升集群的资源利用率。另一方面,由于可以采用效率较高的分配位形,即可以通过调度策略使得那些作业执行时间最短的Task组合被调度到节点上,因此与现有技术相比,该分配方案能显著缩短作业执行时间,提升作业执行效率。综上,本发明实施例提供的资源分配方法能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,从而可以提高资源利用率,和/或,可以提升用户作业的执行效率。Based on the resource allocation method provided by the embodiment of the present invention, in the resource allocation method, after receiving the job submitted by the client device and decomposing the job into a plurality of tasks with corresponding resource requirement configurations, each task is further estimated The running time of each task, and according to the resource requirements and running time of each task, combined with the preset scheduling strategy, determine the first allocation configuration of the multiple tasks, and then allocate the multiple tasks according to the first allocation configuration to the runnable compute nodes for the multiple tasks. Wherein, the first allocation configuration is used to indicate the distribution of the multiple tasks on the computing nodes that can run the multiple tasks, and the scheduling strategy includes at least one of resource utilization priority strategy and efficiency priority strategy. That is to say, the scheme takes into account the running time of each task, and when scheduling tasks with fixed space requirements (that is, the resource requirements of the task) and time requirements (that is, the time of the task) to the corresponding nodes, it can According to the corresponding scheduling policy, the resource utilization priority strategy and the efficiency priority strategy are flexibly selected for resource allocation, so that an allocation configuration with higher resource utilization rate and/or higher efficiency is finally adopted. On the one hand, since the allocation configuration with high resource utilization can be adopted, that is, the task combination with high resource utilization of those nodes can be scheduled to the nodes through the scheduling strategy, so this allocation scheme can effectively reduce the resource utilization in the prior art. The problem of fragmentation, thereby improving the resource utilization of the cluster. On the other hand, since a more efficient allocation configuration can be adopted, that is, the task combination with the shortest execution time of the job can be scheduled to the node through the scheduling strategy, so compared with the existing technology, this allocation scheme can significantly shorten the job Execution time, improve job execution efficiency. To sum up, the resource allocation method provided by the embodiment of the present invention can flexibly select the resource utilization rate priority policy and the efficiency priority policy for resource allocation according to the corresponding scheduling policy, thereby improving the resource utilization rate, and/or, improving the efficiency of user jobs. effectiveness.

结合第一方面,在第一方面第一种可能的实现方式中,若调度策略为资源利用率优先策略,则第一分配位形具体为使得该多个任务的可运行计算节点中的每个计算节点的单节点资源利用率最大的分配位形。With reference to the first aspect, in the first possible implementation manner of the first aspect, if the scheduling policy is a resource utilization priority policy, the first allocation configuration is specifically such that each of the multiple task-running computing nodes The allocation configuration that maximizes the single-node resource utilization of computing nodes.

结合第一方面,在第一方面第二种可能的实现方式中,若调度策略为效率优先策略,则第一分配位形具体为使得作业的整体执行速度最快的分配位形。With reference to the first aspect, in the second possible implementation manner of the first aspect, if the scheduling strategy is an efficiency-first strategy, the first allocation configuration is specifically the allocation configuration that enables the fastest overall job execution speed.

结合第一方面或第一方面第一种可能的实现方式或第一方面第二种可能的实现方式,在第一方面第三种可能的实现方式中,所述估计每个任务的运行时间,具体可以包括:针对每个任务,均按照下面针对第一任务的操作进行处理:将第一任务的硬信息与样本库中的历史任务的硬信息进行匹配;若匹配成功,根据与第一任务的硬信息匹配的历史任务的历史运行时间估计第一任务的运行时间。With reference to the first aspect or the first possible implementation of the first aspect or the second possible implementation of the first aspect, in the third possible implementation of the first aspect, the estimation of the running time of each task, Specifically, it may include: for each task, process according to the following operations for the first task: match the hard information of the first task with the hard information of the historical tasks in the sample library; The hard information matches the historical running times of historical tasks to estimate the running time of the first task.

具体的,本发明实施例中的硬信息具体可以包括作业类型、执行用户等信息。Specifically, the hard information in this embodiment of the present invention may specifically include job type, execution user and other information.

需要说明的是,本发明实施例仅是示例性的给出一种估计任务运行时间的具体实现,当然,还可以通过其它方式估计任务的运行时间,比如,通过预运行任务的方式。即,通过预先运行一小段作业实例获得准确的完整运行时间的估计。另外,同一作业的后续任务的运行时间参考已运行任务的运行时间也会获得更为准确的估计。本发明实施例对估计任务运行时间的具体实现方式不做限定。It should be noted that the embodiment of the present invention is only an example of a specific implementation of estimating the running time of a task. Of course, the running time of a task may also be estimated in other ways, for example, by pre-running the task. That is, by pre-running a small number of job instances to obtain an accurate estimate of the full runtime. Also, the run times of subsequent tasks of the same job will be more accurately estimated by reference to the run times of tasks that have already run. The embodiment of the present invention does not limit the specific implementation manner of estimating the running time of the task.

结合第一方面至第一方面第三种可能的实现方式中的任意一种可能的实现方式,在第一方面第四种可能的实现方式中,在根据每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形之前,还包括:将该多个任务按照资源的种类进行分类,获得至少一类任务;In combination with any possible implementation of the first aspect to the third possible implementation of the first aspect, in the fourth possible implementation of the first aspect, according to the resource demand and running Time, before determining the first allocation configuration of the plurality of tasks in combination with the preset scheduling strategy, further includes: classifying the plurality of tasks according to resource types to obtain at least one type of task;

所述根据所述每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定所述多个任务的第一分配位形,具体包括:针对该至少一类任务中的每类任务,均按照下面针对第一类任务的操作进行处理:根据第一类任务中每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定第一类任务的子分配位形,该子分配位形用于指示该第一类任务在多个计算节点中的可运行计算节点上的分布情况;将该至少一类任务中的每类任务的子分配位形的组合确定为该多个任务的第一分配位形。The determining the first allocation configuration of the plurality of tasks according to the resource demand and running time corresponding to each task in combination with a preset scheduling strategy specifically includes: for each type of the at least one type of task Tasks are processed according to the following operations for the first type of tasks: according to the resource demand and running time corresponding to each task in the first type of tasks, combined with the preset scheduling strategy, determine the sub-allocation configuration of the first type of tasks , the sub-allocation configuration is used to indicate the distribution of the first type of tasks on the runnable computing nodes among the plurality of computing nodes; the combination of sub-allocation configurations of each type of task in the at least one type of tasks is determined as A first allocation configuration of the plurality of tasks.

由于本发明实施例提供的资源分配方法可以首先将多个任务按照资源的种类进行分类,进而对于每一类任务分别进行资源分配,也就是说可以同时考虑对于异构集群和特殊资源需求作业的资源分配,因而具有更广泛的普适性和更好的综合表现。Since the resource allocation method provided by the embodiment of the present invention can first classify multiple tasks according to the types of resources, and then perform resource allocation for each type of task separately, that is to say, it can simultaneously consider the requirements for heterogeneous clusters and jobs with special resource requirements. Therefore, it has wider applicability and better comprehensive performance.

可选的,考虑到运行时间的估计会与实际情况通常会产生一些偏差。如果对这些偏差不做控制,则随着时间的延长,作业资源的预分配结果与理想结果可能相差愈来愈大。因此,本发明实施例提供的资源分配方法中,还可以引入变异机制(即重新分配)。即:Optionally, take into account that the running time estimates will usually have some deviation from the actual situation. If these deviations are not controlled, as time goes on, the pre-allocation results of job resources may be more and more different from the ideal results. Therefore, in the resource allocation method provided by the embodiment of the present invention, a mutation mechanism (that is, reallocation) may also be introduced. which is:

结合第一方面至第一方面第四种可能的实现方式中的任意一种可能的实现方式,在第一方面第五种可能的实现方式中,在所述将该多个任务按照第一分配位形分配到该多个任务的可运行计算节点上之后,还包括:根据第一分配位形,确定所有处于等待状态的任务运行在所分配的节点上时的第一整体分配目标函数值;根据所有处于等待状态的任务对应的资源需求量和运行时间,结合预设的调度策略,确定所有处于等待状态的任务的第二分配位形,该第二分配位形用于指示该所有处于等待状态的任务在该所有处于等待状态的任务的可运行计算节点上的分布情况;根据第二分配位形,确定该所有处于等待状态的任务运行在所分配的节点上时的第二整体分配目标函数值;若第二整体分配目标函数值大于第一整体分配目标函数值,将该所有处于等待状态的任务按照第二分配位形分配到所有处于等待状态的任务的可运行计算节点上。With reference to any one of the possible implementation manners of the first aspect to the fourth possible implementation manner of the first aspect, in the fifth possible implementation manner of the first aspect, the multiple tasks are allocated according to the first After the configuration is allocated to the operational computing nodes of the plurality of tasks, it also includes: according to the first allocation configuration, determining the first overall allocation objective function value when all tasks in the waiting state are running on the allocated nodes; According to the resource requirements and running time corresponding to all tasks in the waiting state, combined with the preset scheduling policy, determine the second allocation configuration of all tasks in the waiting state, and the second allocation configuration is used to indicate that all tasks in the waiting state The distribution of tasks in the waiting state on the runnable computing nodes of all the tasks in the waiting state; according to the second allocation configuration, determine the second overall allocation target when all the tasks in the waiting state run on the assigned nodes function value; if the second overall allocation objective function value is greater than the first overall allocation objective function value, all the tasks in the waiting state are allocated to the runnable computing nodes of all the tasks in the waiting state according to the second allocation configuration.

通过上述变异机制,可以使得作业资源的预分配结果向更好的方向进化。Through the above mutation mechanism, the pre-allocation results of job resources can be made to evolve in a better direction.

第二方面,提供一种资源管理器,该资源管理器包括:接收单元、分解单元、估计单元、确定单元和分配单元:接收单元,用于接收客户端设备提交的作业;分解单元,用于将该作业分解为多个任务,其中,该多个任务中的每个任务均配置有相对应的资源需求量;估计单元,用于估计每个任务的运行时间;确定单元,用于根据每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形,该第一分配位形用于指示该多个任务在多个计算节点中的可运行计算节点上的分布情况,该调度策略包括资源利用率优先策略和效率优先策略中的至少一种;分配单元,用于将该多个任务按照第一分配位形分配到该多个任务的可运行计算节点上。In a second aspect, a resource manager is provided, and the resource manager includes: a receiving unit, a decomposing unit, an estimating unit, a determining unit, and an allocating unit: the receiving unit is used to receive a job submitted by a client device; the decomposing unit is used to The job is decomposed into multiple tasks, wherein each task in the multiple tasks is configured with a corresponding resource requirement; an estimation unit is used to estimate the running time of each task; a determination unit is used to The resource demand and running time corresponding to each task, combined with the preset scheduling strategy, determine the first allocation configuration of the multiple tasks, and the first allocation configuration is used to indicate the allocation of the multiple tasks in the multiple computing nodes. The distribution situation on the operable computing node, the scheduling strategy includes at least one of the resource utilization priority strategy and the efficiency priority strategy; an allocation unit is used to allocate the multiple tasks to the multiple tasks according to the first allocation configuration on a runnable compute node.

基于本发明实施例提供的资源管理器,该资源管理器在接收客户端设备提交的作业,并将该作业分解为多个有相应的资源需求量配置的任务之后,还估计每个任务的运行时间,并根据每个任务的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形,然后将该多个任务按照该第一分配位形分配到该多个任务的可运行计算节点上。其中,该第一分配位形用于指示该多个任务在该多个任务的可运行计算节点上的分布情况,调度策略包括资源利用率优先策略和效率优先策略中的至少一种。也就是说,该资源管理器在进行资源分配时考虑了每个任务的运行时间的因素,并且将空间需求(即任务的资源需求量)和时间需求(即任务的时间)固定的Task调度到对应的节点上时,能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,使得最终采用资源利用率较高和/或效率较高的分配位形。一方面,由于可以采用资源利用率较高的分配位形,即可以通过调度策略使得那些节点资源利用率较高的Task组合被调度到节点上,因此该资源管理器能有效减轻现有技术中资源碎片的问题,从而提升集群的资源利用率。另一方面,由于可以采用效率较高的分配位形,即可以通过调度策略使得那些作业执行时间最短的Task组合被调度到节点上,因此与现有技术相比,该资源管理器能显著缩短作业执行时间,提升作业执行效率。综上,本发明实施例提供的资源管理器能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,从而可以提高资源利用率,和/或,可以提升用户作业的执行效率。Based on the resource manager provided by the embodiment of the present invention, after the resource manager receives the job submitted by the client device and decomposes the job into a plurality of tasks with corresponding resource requirement configurations, it also estimates the operation of each task. Time, and according to the resource demand and running time of each task, combined with the preset scheduling strategy, determine the first allocation configuration of the multiple tasks, and then allocate the multiple tasks to the Compute nodes that can run multiple tasks. Wherein, the first allocation configuration is used to indicate the distribution of the multiple tasks on the computing nodes that can run the multiple tasks, and the scheduling strategy includes at least one of resource utilization priority strategy and efficiency priority strategy. That is to say, the resource manager considers the running time factor of each task when performing resource allocation, and schedules Tasks with fixed space requirements (that is, the resource requirements of tasks) and time requirements (that is, the time of tasks) to When on the corresponding node, the resource utilization priority strategy and the efficiency priority strategy can be flexibly selected according to the corresponding scheduling strategy for resource allocation, so that the allocation configuration with higher resource utilization rate and/or higher efficiency is finally adopted. On the one hand, since the allocation configuration with high resource utilization can be adopted, that is, the task combination with high resource utilization of those nodes can be scheduled to the nodes through the scheduling strategy, so the resource manager can effectively alleviate the problems in the prior art. The problem of resource fragmentation, thereby improving the resource utilization of the cluster. On the other hand, since a more efficient allocation configuration can be adopted, that is, the task combination with the shortest execution time of the job can be scheduled to the node through the scheduling strategy, so compared with the existing technology, the resource manager can significantly shorten Job execution time, improving job execution efficiency. To sum up, the resource manager provided by the embodiment of the present invention can flexibly select the resource utilization rate priority policy and the efficiency priority policy to allocate resources according to the corresponding scheduling policy, so as to improve the resource utilization rate, and/or improve the efficiency of user jobs. effectiveness.

结合第二方面,在第二方面第一种可能的实现方式中,若调度策略为资源利用率优先策略,则第一分配位形具体为使得该多个任务的可运行计算节点中的每个计算节点的单节点资源利用率最大的分配位形。With reference to the second aspect, in the first possible implementation of the second aspect, if the scheduling strategy is a resource utilization priority strategy, then the first allocation configuration is specifically such that each of the multiple task-running computing nodes The allocation configuration that maximizes the single-node resource utilization of computing nodes.

结合第二方面,在第二方面第二种可能的实现方式中,若调度策略为效率优先策略,则第一分配位形具体为使得作业的整体执行速度最快的分配位形。With reference to the second aspect, in a second possible implementation manner of the second aspect, if the scheduling strategy is an efficiency-first strategy, the first allocation configuration is specifically the allocation configuration that enables the fastest overall job execution speed.

结合第二方面或第二方面第一种可能的实现方式或第二方面第二种可能的实现方式,在第二方面第三种可能的实现方式中,估计单元具体用于:针对每个任务,均按照下面针对第一任务的操作进行处理:将第一任务的硬信息与样本库中的历史任务的硬信息进行匹配;若匹配成功,根据与第一任务的硬信息匹配的历史任务的历史运行时间估计第一任务的运行时间。In combination with the second aspect or the first possible implementation of the second aspect or the second possible implementation of the second aspect, in the third possible implementation of the second aspect, the estimation unit is specifically configured to: for each task , are processed according to the following operations for the first task: match the hard information of the first task with the hard information of the historical tasks in the sample library; if the matching is successful, according to the The historical running time estimates the running time of the first task.

具体的,本发明实施例中的硬信息具体可以包括作业类型、执行用户等信息。Specifically, the hard information in this embodiment of the present invention may specifically include job type, execution user and other information.

需要说明的是,本发明实施例仅是示例性的给出一种估计单元估计任务运行时间的具体实现,当然,估计单元还可以通过其它方式估计任务的运行时间,比如,通过预运行任务的方式。即,通过预先运行一小段作业实例获得准确的完整运行时间的估计。另外,同一作业的后续任务的运行时间参考已运行任务的运行时间也会获得更为准确的估计。本发明实施例对估计单元估计任务运行时间的具体实现方式不做限定。It should be noted that this embodiment of the present invention is only an example of a specific implementation of estimating the running time of a task by the estimating unit. Of course, the estimating unit can also estimate the running time of the task in other ways, for example, by pre-running the task Way. That is, by pre-running a small number of job instances to obtain an accurate estimate of the full runtime. Also, the run times of subsequent tasks of the same job will be more accurately estimated by reference to the run times of tasks that have already run. The embodiment of the present invention does not limit the specific implementation manner of estimating the running time of the task by the estimating unit.

结合第二方面至第二方面第三种可能的实现方式中的任意一种可能的实现方式,在第二方面第四种可能的实现方式中,资源管理器还包括分类单元;在确定单元根据每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形之前,分类单元,用于将该多个任务按照资源的种类进行分类,获得至少一类任务;With reference to any one of the possible implementation manners from the second aspect to the third possible implementation manner of the second aspect, in the fourth possible implementation manner of the second aspect, the resource manager further includes a classification unit; The resource demand and running time corresponding to each task, combined with the preset scheduling strategy, before determining the first allocation configuration of the multiple tasks, the classification unit is used to classify the multiple tasks according to the type of resources to obtain at least one type of task;

确定单元具体用于:针对该至少一类任务中的每类任务,均按照下面针对第一类任务的操作进行处理:根据第一类任务中每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定第一类任务的子分配位形,该子分配位形用于指示该第一类任务在多个计算节点中的可运行计算节点上的分布情况;将该至少一类任务中的每类任务的子分配位形的组合确定为该多个任务的第一分配位形。The determination unit is specifically configured to: for each type of task in the at least one type of task, perform processing according to the following operation for the first type of task: according to the resource demand and running time corresponding to each task in the first type of task, combined with The preset scheduling policy determines the sub-allocation configuration of the first type of task, and the sub-allocation configuration is used to indicate the distribution of the first type of task on the runnable computing nodes among the plurality of computing nodes; the at least one A combination of sub-allocation configurations of each type of task in the type of tasks is determined as a first allocation configuration of the plurality of tasks.

由于本发明实施例提供的资源管理器可以首先将多个任务按照资源的种类进行分类,进而对于每一类任务分别进行资源分配,也就是说可以同时考虑对于异构集群和特殊资源需求作业的资源分配,因而具有更广泛的普适性和更好的综合表现。Since the resource manager provided by the embodiment of the present invention can first classify multiple tasks according to the types of resources, and then perform resource allocation for each type of task separately, that is to say, it can simultaneously consider the requirements for heterogeneous clusters and jobs with special resource requirements. Therefore, it has wider applicability and better comprehensive performance.

可选的,考虑到运行时间的估计会与实际情况通常会产生一些偏差。如果对这些偏差不做控制,则随着时间的延长,作业资源的预分配结果与理想结果可能相差愈来愈大。因此,本发明实施例提供的资源管理器在进行资源分配时,还可以引入变异机制(即重新分配)。即:Optionally, take into account that the running time estimates will usually have some deviation from the actual situation. If these deviations are not controlled, as time goes on, the pre-allocation results of job resources may be more and more different from the ideal results. Therefore, when the resource manager provided by the embodiment of the present invention allocates resources, it can also introduce a mutation mechanism (that is, reallocate). which is:

结合第二方面至第二方面第四种可能的实现方式中的任意一种可能的实现方式,在第二方面第五种可能的实现方式中,在分配单元将该多个任务按照第一分配位形分配到该多个任务的可运行计算节点上之后,确定单元,还用于:根据第一分配位形,确定所有处于等待状态的任务运行在所分配的节点上时的第一整体分配目标函数值;根据所有处于等待状态的任务对应的资源需求量和运行时间,结合预设的调度策略,确定所有处于等待状态的任务的第二分配位形,该第二分配位形用于指示该所有处于等待状态的任务在该所有处于等待状态的任务的可运行计算节点上的分布情况;根据第二分配位形,确定该所有处于等待状态的任务运行在所分配的节点上时的第二整体分配目标函数值;分配单元,还用于若第二整体分配目标函数值大于第一整体分配目标函数值,将该所有处于等待状态的任务按照第二分配位形分配到所有处于等待状态的任务的可运行计算节点上。With reference to any one of the possible implementation manners of the second aspect to the fourth possible implementation manner of the second aspect, in the fifth possible implementation manner of the second aspect, the allocation unit assigns the plurality of tasks according to the first allocation After the configuration is allocated to the operational computing nodes of the plurality of tasks, the determining unit is also used to: determine the first overall allocation when all tasks in the waiting state run on the allocated nodes according to the first allocation configuration Objective function value; according to the resource requirements and running time corresponding to all tasks in the waiting state, combined with the preset scheduling strategy, determine the second allocation configuration of all tasks in the waiting state, and the second allocation configuration is used to indicate The distribution of all the tasks in the waiting state on the runnable computing nodes of the tasks in the waiting state; according to the second allocation configuration, determine the first time when all the tasks in the waiting state are running on the allocated nodes Two overall allocation objective function values; the allocation unit is also used to assign all tasks in the waiting state to all waiting states according to the second allocation configuration if the second overall allocation objective function value is greater than the first overall allocation objective function value On the runnable compute node of the task.

通过上述变异机制,可以使得作业资源的预分配结果向更好的方向进化。Through the above mutation mechanism, the pre-allocation results of job resources can be made to evolve in a better direction.

结合第一方面第五种可能的实现方式,在第一方面第六种可能的实现方式中;或者,结合第二方面第五种可能的实现方式,在第二方面第六种可能的实现方式中,所述第一整体分配目标函数值等于所述第一分配位形时,所有处于等待状态的任务运行在所分配的节点上时各个节点的单节点分配目标函数值的和;In combination with the fifth possible implementation of the first aspect, in the sixth possible implementation of the first aspect; or, in combination with the fifth possible implementation of the second aspect, in the sixth possible implementation of the second aspect , when the first overall allocation objective function value is equal to the first allocation configuration, the sum of the single-node allocation objective function values of each node when all tasks in the waiting state run on the allocated nodes;

所述第二整体分配目标函数值等于所述第二分配位形时,所有处于等待状态的任务运行在所分配的节点上时各个节点的单节点分配目标函数值的和。The second overall allocation objective function value is equal to the sum of the single-node allocation objective function values of each node when all tasks in the waiting state run on the allocated nodes when the second allocation configuration is configured.

可选的,一种可能的实现方式中,上述的单节点分配目标函数具体包括:其中,Sn表示单节点n的分配目标函数值;p表示作业时间优先级因子,p>0;f表示作业公平性因子,f>0;p+f≤1;m表示作业的数量;Se,n表示节点n上的资源利用率得分,rn表示节点n的资源,rt表示任务t的资源需求量;Sp,j表示作业j的执行进度,Tj表示作业j还需要多少时间执行完毕,该值可由历史统计数据得出,T0表示作业j的总体运行时间;Sf,j表示作业j的公平性得分,rj表示作业j的资源需求,rf表示作业i在完全公平情况下的应得资源。Optionally, in a possible implementation manner, the above-mentioned single-node allocation objective function specifically includes: Among them, S n represents the distribution objective function value of single node n; p represents the job time priority factor, p>0; f represents the job fairness factor, f>0;p+f≤1; m represents the number of jobs; S e, n represents the resource utilization score on node n, r n represents the resources of node n, r t represents the resource demand of task t; S p,j represents the execution progress of job j, T j indicates how much time is required for job j to be executed, and this value can be obtained from historical statistical data. T 0 indicates the overall running time of job j; S f, j indicates the fairness score of job j, r j represents the resource requirement of job j, and r f represents the due resource of job i under the condition of complete fairness.

其中,在上述单节点分配目标函数中,考虑了节点的资源利用率、作业的公平性以及作业的执行进度。其中,当f=0,p=0时,完全考虑资源利用率,即资源管理器110按照整体资源利用率最高的原则分配资源;当f=1,p=0时,完全考虑公平,即资源管理器110会在不同的作业间公平地分配资源;当f=1,p=1时,则完全考虑时间优先级,即资源管理器110优先分配资源给那些更快完成的作业。当然,f和p还可以为其它数值,用户可以根据作业的运行需求进行设置,使得分配在最优资源利用率和最优作业执行时间取得平衡,本发明实施例对此不作具体限定。Wherein, in the above-mentioned single-node allocation objective function, the resource utilization rate of the node, the fairness of the job, and the execution progress of the job are considered. Wherein, when f=0, p=0, resource utilization is fully considered, that is, the resource manager 110 allocates resources according to the principle of the highest overall resource utilization; when f=1, p=0, fairness is fully considered, that is, resource The manager 110 will allocate resources among different jobs fairly; when f=1 and p=1, time priority is fully considered, that is, the resource manager 110 gives priority to assigning resources to jobs that finish faster. Of course, f and p can also be other values, and the user can set them according to the operation requirements of the job, so that the allocation can be balanced between the optimal resource utilization rate and the optimal job execution time, which is not specifically limited in the embodiment of the present invention.

第三方面,提供一种资源管理器,该资源管理器包括:处理器、存储器、总线和通信接口;存储器用于存储计算机执行指令,处理器与存储器通过总线连接,当资源管理器运行时,处理器执行存储器存储的计算机执行指令,以使资源管理器执行上述如第一方面或第一方面任意一种可能的实现方式中所示的资源分配方法。In a third aspect, a resource manager is provided, and the resource manager includes: a processor, a memory, a bus, and a communication interface; the memory is used to store computer-executed instructions, the processor and the memory are connected through the bus, and when the resource manager is running, The processor executes the computer-executable instructions stored in the memory, so that the resource manager executes the above resource allocation method as shown in the first aspect or any possible implementation manner of the first aspect.

由于本发明实施例提供的资源管理器可以用于执行上述如第一方面或第一方面任意一种可能的实现方式中所示的资源分配方法,因此,其所能获得的技术效果可以参考上述如第一方面或第一方面任意一种可能的实现方式中所示的资源分配方法的技术效果,此处不再赘述。Since the resource manager provided by the embodiment of the present invention can be used to execute the above-mentioned resource allocation method as shown in the first aspect or any possible implementation of the first aspect, the technical effect it can obtain can refer to the above-mentioned The technical effect of the resource allocation method shown in the first aspect or any possible implementation manner of the first aspect will not be repeated here.

第四方面,提供一种分布式计算机系统,该分布式计算机系统包括多个计算节点和第一方面或第一方面任意一种可能的实现方式中所述的资源管理器;或者,该分布式计算机系统包括多个计算节点和第三方面所述的资源管理器。In a fourth aspect, a distributed computer system is provided, and the distributed computer system includes a plurality of computing nodes and the first aspect or the resource manager described in any possible implementation of the first aspect; or, the distributed The computer system includes multiple computing nodes and the resource manager described in the third aspect.

由于本发明实施例提供的分布式计算机系统包括第一方面或第一方面任意一种可能的实现方式中所述的资源管理器;或者,包括第三方面所述的资源管理器,因此,其所能获得的技术效果可参考上述资源管理器的技术效果,本发明实施例在此不再赘述。Since the distributed computer system provided by the embodiment of the present invention includes the resource manager described in the first aspect or any possible implementation of the first aspect; or includes the resource manager described in the third aspect, therefore, it For the technical effects that can be obtained, reference can be made to the technical effects of the above-mentioned resource manager, and the embodiments of the present invention will not be repeated here.

第五方面,提供一种可读介质,包括计算机执行指令,当资源管理器的处理器执行该计算机执行指令时,该资源管理器执行如上述第一方面或者第一方面的任意一种可选方式中所述的资源分配方法。In a fifth aspect, there is provided a readable medium, including computer-executable instructions. When a processor of the resource manager executes the computer-executable instructions, the resource manager executes any optional The resource allocation method described in Methods.

其中,本发明的这些方面或其他方面在以下实施例的描述中会更加简明易懂。Wherein, these or other aspects of the present invention will be more concise and understandable in the description of the following embodiments.

附图说明Description of drawings

图1为现有的Hadoop系统的调度策略示意图;Fig. 1 is the scheduling strategy schematic diagram of existing Hadoop system;

图2为本发明实施例提供的一种分布式计算系统的逻辑架构图;FIG. 2 is a logical architecture diagram of a distributed computing system provided by an embodiment of the present invention;

图3为本发明实施例提供的一种分布式计算系统的物理架构示意图;FIG. 3 is a schematic diagram of a physical architecture of a distributed computing system provided by an embodiment of the present invention;

图4为本发明实施例提供的资源分配方法的原理示意图;FIG. 4 is a schematic diagram of the principle of a resource allocation method provided by an embodiment of the present invention;

图5为本发明实施例提供的资源分配方法流程示意图一;FIG. 5 is a first schematic flow diagram of a resource allocation method provided by an embodiment of the present invention;

图6为本发明实施例提供的资源分配方法流程示意图二;FIG. 6 is a second schematic flow diagram of a resource allocation method provided by an embodiment of the present invention;

图7为本发明实施例提供的资源分配方法流程示意图三;FIG. 7 is a third schematic flow diagram of a resource allocation method provided by an embodiment of the present invention;

图8为本发明实施例提供的资源分配结果的变异机制示意图;FIG. 8 is a schematic diagram of a variation mechanism of a resource allocation result provided by an embodiment of the present invention;

图9为本发明实施例提供的采用资源利用率优先的原则进行资源分配的结果示意图;FIG. 9 is a schematic diagram of resource allocation results provided by an embodiment of the present invention using the principle of resource utilization priority;

图10为本发明实施例提供的采用公平优先的原则进行资源分配的结果示意图;FIG. 10 is a schematic diagram of the result of resource allocation using the principle of fairness and priority provided by the embodiment of the present invention;

图11为本发明实施例提供的资源管理器的结构示意图一;FIG. 11 is a first structural diagram of a resource manager provided by an embodiment of the present invention;

图12为本发明实施例提供的资源管理器的结构示意图二;FIG. 12 is a second structural diagram of a resource manager provided by an embodiment of the present invention;

图13为本发明实施例提供的资源管理器的结构示意图三。FIG. 13 is a third structural schematic diagram of a resource manager provided by an embodiment of the present invention.

具体实施方式detailed description

需要说明的是,为了便于清楚描述本发明实施例的技术方案,在本发明的实施例中,采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分,本领域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定。It should be noted that, in order to clearly describe the technical solutions of the embodiments of the present invention, in the embodiments of the present invention, words such as "first" and "second" are used for the same or similar items with basically the same function and effect To make a distinction, those skilled in the art can understand that words such as "first" and "second" do not limit the quantity and execution order.

需要说明的是,本文中的“/”表示或的意思,例如,A/B可以表示A或B;本文中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。“多个”是指两个或多于两个。It should be noted that "/" in this article means or, for example, A/B can mean A or B; "and/or" in this article is just an association relationship describing associated objects, indicating that there can be three A relationship, for example, A and/or B, can mean: A exists alone, A and B exist simultaneously, and B exists alone. "A plurality" means two or more than two.

如本申请所使用的,术语“组件”、“模块”、“系统”等等旨在指代计算机相关实体,该计算机相关实体可以是硬件、固件、硬件和软件的结合、软件或者运行中的软件。例如,组件可以是,但不限于是:在处理器上运行的处理、处理器、对象、可执行文件、执行中的线程、程序和/或计算机。作为示例,在计算设备上运行的应用和该计算设备都可以是组件。一个或多个组件可以存在于执行中的过程和/或线程中,并且组件可以位于一个计算机中以及/或者分布在两个或更多个计算机之间。此外,这些组件能够从在其上具有各种数据结构的各种计算机可读介质中执行。这些组件可以通过诸如根据具有一个或多个数据分组(例如,来自一个组件的数据,该组件与本地系统、分布式系统中的另一个组件进行交互和/或以信号的方式通过诸如互联网之类的网络与其它系统进行交互)的信号,以本地和/或远程过程的方式进行通信。As used in this application, the terms "component," "module," "system" and the like are intended to refer to a computer-related entity, which may be hardware, firmware, a combination of hardware and software, software, or an operating system. software. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. As an example, both an application running on a computing device and the computing device can be components. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. These components can be communicated through, for example, according to having one or more packets of data (e.g., data from a component that interacts with another component in a local system, a distributed system, and/or in the form of network to interact with other systems) to communicate with local and/or remote processes.

本申请将围绕可包括多个设备、组件、模块等的系统来呈现各个方面、实施例或特征。应当理解和明白的是,各个系统可以包括另外的设备、组件、模块等,并且/或者可以并不包括结合附图讨论的所有设备、组件、模块等。此外,还可以使用这些方案的组合。The present application presents various aspects, embodiments or features in terms of a system that can include a number of devices, components, modules and the like. It is to be understood and appreciated that the various systems may include additional devices, components, modules, etc. and/or may not include all of the devices, components, modules etc. discussed in connection with the figures. In addition, combinations of these schemes can also be used.

另外,在本发明实施例中,“示例的”一词用于表示作例子、例证或说明。本申请中被描述为“示例”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用示例的一词旨在以具体方式呈现概念。In addition, in the embodiments of the present invention, the word "exemplary" is used as an example, illustration or illustration. Any embodiment or design described herein as "example" is not to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of the word example is intended to present concepts in a concrete manner.

本发明实施例描述的场景是为了更加清楚的说明本发明实施例的技术方案,并不构成对于本发明实施例提供的技术方案的限定,本领域普通技术人员可知,随着新场景的出现,本发明实施例提供的技术方案对于类似的技术问题,同样适用。The scenarios described in the embodiments of the present invention are to illustrate the technical solutions of the embodiments of the present invention more clearly, and do not constitute limitations on the technical solutions provided by the embodiments of the present invention. Those of ordinary skill in the art know that with the emergence of new scenarios, The technical solutions provided by the embodiments of the present invention are also applicable to similar technical problems.

为了下述各实施例的描述清楚简洁,首先给出相关概念的简要介绍:In order to make the description of the following embodiments clear and concise, a brief introduction of related concepts is given first:

Cluster:集群,指多台同构的或者异构的计算机节点通过网络组合起来,配合一定的集群管理系统,形成的能对外提供统一计算或存储服务的设施。Cluster: Cluster refers to the combination of multiple homogeneous or heterogeneous computer nodes through the network, and cooperates with a certain cluster management system to form a facility that can provide unified computing or storage services to the outside world.

Resource:资源,即指分布式集群上可供利用的内存、中央处理器(英文全称:central processing unit,英文缩写:CPU)、网络、磁盘等用于运行作业所必须的硬件。Resource: resources refer to the available memory, central processing unit (full English name: central processing unit, English abbreviation: CPU), network, disk and other hardware necessary for running jobs on the distributed cluster.

Job:作业,指用户通过客户端设备向集群提交的可被运行的一个完整的任务。Job: A job refers to a complete task that can be run submitted by the user to the cluster through the client device.

Task:任务,一个作业被提交到集群上执行时,通常分解为很多任务,每个任务运行在一个特定的集群节点上,并占用一定量的资源。Task: Task. When a job is submitted to the cluster for execution, it is usually decomposed into many tasks. Each task runs on a specific cluster node and occupies a certain amount of resources.

Scheduler:调度器,是用来向作业分配可供任务运行的资源的引擎模块,也是集群管理系统最重要的组成部分。Scheduler: The scheduler is an engine module used to allocate resources available to tasks to run, and is also the most important part of the cluster management system.

本发明实施例的方案可典型地应用于分布式计算系统中,用于实现任务调度以及资源的高效分配。图2示出了一种分布式计算系统的逻辑架构图,根据图2,该分布式计算系统包括由集群资源构成的资源池,资源管理器以及计算框架,集群资源即集群中各个计算节点的运算、存储等硬件资源,资源管理器部署在集群中的一个或多个计算节点上,或者也可以作为一个独立的物理设备,用于统一管理集群资源,并为上层的计算框架提供资源调度能力。一个分布式计算系统可以同时支持多种不同的计算框架,如图2所示的系统,该系统可支持MR(英文全称:map reduce)、Storm、S4(英文全称:simple scalable streamingsystem)以及MPI(英文全称:message passing interface)等计算框架中的一种或多种。资源管理器通过对客户端设备发送的不同计算框架类型的应用程序进行统一的调度,以便提高资源利用率。图3进一步示出了分布式计算系统的物理架构示意图,包括集群、资源管理器和客户端设备,其中,集群中包括多个节点(图3中仅仅示出了三个节点),资源管理器部署在集群中的某个节点上,每个节点均可以与资源管理器通信,客户端设备向资源管理器提交应用程序的资源请求,资源管理器根据特定的资源调度策略将节点的资源分配给应用程序,以使得应用程序根据分配的节点资源在该节点上运行。The solution of the embodiment of the present invention can be typically applied to a distributed computing system to realize task scheduling and efficient allocation of resources. Fig. 2 shows a logical architecture diagram of a distributed computing system. According to Fig. 2, the distributed computing system includes a resource pool composed of cluster resources, a resource manager and a computing framework. Computing, storage and other hardware resources, the resource manager is deployed on one or more computing nodes in the cluster, or it can also be used as an independent physical device to manage cluster resources in a unified manner and provide resource scheduling capabilities for the upper computing framework . A distributed computing system can support a variety of different computing frameworks at the same time. The system shown in Figure 2 can support MR (full name in English: map reduce), Storm, S4 (full name in English: simple scalable streaming system) and MPI ( English full name: one or more of computing frameworks such as message passing interface). The resource manager performs unified scheduling on the application programs of different computing framework types sent by the client device, so as to improve resource utilization. Fig. 3 further shows a schematic diagram of the physical architecture of a distributed computing system, including a cluster, a resource manager and a client device, wherein the cluster includes multiple nodes (only three nodes are shown in Fig. 3), and the resource manager Deployed on a node in the cluster, each node can communicate with the resource manager, the client device submits the resource request of the application to the resource manager, and the resource manager allocates the resources of the node to the resource manager according to a specific resource scheduling strategy application, so that the application runs on the node according to the allocated node resources.

本发明实施例主要对分布式计算系统中的资源管理器进行了优化,使其更合理地为任务分配资源,以提高资源利用率。其中,图4为本发明实施例提供的资源的分配方法的原理示意图。The embodiment of the present invention mainly optimizes the resource manager in the distributed computing system, so that it can allocate resources to tasks more reasonably, so as to improve resource utilization. Wherein, FIG. 4 is a schematic diagram of a principle of a method for allocating resources provided by an embodiment of the present invention.

如图4所示,本发明实施例中,在客户端设备提交作业后,作业被分解为一系列可分布式运行的任务(Task),每个Task都配置有相对应的资源需求量(图2中用横向的宽度表征资源需求量)。当Task经过上述资源管理器中的经验模块(英文全称:ExperiencedExpert,英文缩写:E-Expert)时,E-Expert通过资源管理器中的样本库中的作业历史执行情况估计每个Task的运行时间,进而可以得到空间需求(即Task的资源需求量)和时间需求(即Task的运行时间,图2中用纵向的长度表征Task运行时间)固定的Task。打包模块(英文:Packer)考虑一定的调度策略,将空间需求(即Task的资源需求量)和时间需求(即Task的运行时间)固定的Task调度到对应的节点上。其中,该调度策略包括资源利用率优先策略、或者效率优先策略。也就是说,打包方法能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,使得最终采用资源利用率较高和/或效率较高的分配位形。As shown in FIG. 4, in the embodiment of the present invention, after the client device submits the job, the job is decomposed into a series of tasks (Tasks) that can be run in a distributed manner, and each Task is configured with a corresponding resource requirement (Fig. In 2, the horizontal width is used to represent the resource demand). When the Task passes through the experience module (English full name: ExperiencedExpert, English abbreviation: E-Expert) in the resource manager, E-Expert estimates the running time of each Task through the job history execution status in the sample library in the resource manager , and then you can get a Task with fixed space requirements (ie, the resource requirements of the Task) and time requirements (ie, the running time of the Task, which is represented by the vertical length in Figure 2). The packing module (English: Packer) considers a certain scheduling strategy, and schedules tasks with fixed space requirements (that is, the resource requirements of the Task) and time requirements (that is, the running time of the Task) to corresponding nodes. Wherein, the scheduling policy includes a resource utilization priority policy, or an efficiency priority policy. That is to say, the packaging method can flexibly select a resource utilization priority strategy and an efficiency priority strategy according to a corresponding scheduling strategy for resource allocation, so that an allocation configuration with higher resource utilization and/or higher efficiency is finally adopted.

其中,Task在每个节点形成一个等待队列,每个等待队列的长度都是大致相等的。在Task排队等待过程中,有可能发生变异导致新一轮的作业资源的重新分配。如果新分配位形的整体资源利用率更高或者效率更高,则更新到新的分配位形。Among them, Task forms a waiting queue at each node, and the length of each waiting queue is approximately equal. During the queuing process of Tasks, there may be mutations that lead to a new round of reallocation of job resources. Update to a new allocation configuration if the overall resource utilization or efficiency of the new allocation configuration is higher.

除此之外,每个节点运行一个节点追踪器(英文:node tracker)实例,负责周期性的向E-Expert报告Task的运行情况,并更新经验模块样本库中的统计信息。In addition, each node runs a node tracker (English: node tracker) instance, which is responsible for periodically reporting the running status of the Task to E-Expert, and updating the statistical information in the experience module sample library.

需要说明的是,在图2中,相同的填充用于表征需要相同的资源类型,如CPU或者内存等,本发明实施例对图2中各个填充表征的资源类型不作具体限定。It should be noted that in FIG. 2 , the same padding is used to represent the same resource type, such as CPU or memory, and the embodiment of the present invention does not specifically limit the resource type represented by each padding in FIG. 2 .

由于该方案考虑了每个任务的运行时间的因素,并且打包模块将空间需求(即Task的资源需求量)和时间需求(即Task的运行时间)固定的Task调度到对应的节点上时,能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,使得最终采用资源利用率较高和/或效率较高的分配位形,因此能够减轻现有技术中资源碎片的问题,显著提高集群资源的利用率,和/或,可以缩短作业执行时间,提升作业的执行效率。Since this scheme takes into account the running time factor of each task, and when the packaging module schedules tasks with fixed space requirements (that is, the resource requirements of the Task) and time requirements (that is, the running time of the Task) to the corresponding nodes, it can According to the corresponding scheduling strategy, the resource utilization priority strategy and the efficiency priority strategy are flexibly selected for resource allocation, so that the allocation configuration with higher resource utilization rate and/or higher efficiency is finally adopted, so the problem of resource fragmentation in the prior art can be alleviated. problems, significantly improve the utilization of cluster resources, and/or, shorten job execution time, and improve job execution efficiency.

下面将基于图4所示的资源的分配方法的原理示意图,对本发明实施例中的技术方案进行清楚、完整地描述。The following will clearly and completely describe the technical solution in the embodiment of the present invention based on the principle schematic diagram of the resource allocation method shown in FIG. 4 .

如图5所示,本发明实施例提供一种资源分配方法,包括步骤S501-S504:As shown in Figure 5, an embodiment of the present invention provides a resource allocation method, including steps S501-S504:

S501、资源管理器接收客户端设备提交的作业,并将该作业分解为多个任务,其中,该多个任务中的每个任务均配置有相对应的资源需求量。S501. The resource manager receives a job submitted by a client device, and decomposes the job into multiple tasks, where each task in the multiple tasks is configured with a corresponding resource requirement.

S502、资源管理器估计每个任务的运行时间。S502. The resource manager estimates the running time of each task.

S503、资源管理器根据每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形,该第一分配位形用于指示该多个任务在多个计算节点中的可运行计算节点上的分布情况,该调度策略包括资源利用率优先策略和效率优先策略中的至少一种。S503. The resource manager determines the first allocation configuration of the multiple tasks according to the resource demand and running time corresponding to each task, in combination with the preset scheduling policy, and the first allocation configuration is used to indicate the multiple tasks. For the distribution of the executable computing nodes among the plurality of computing nodes, the scheduling strategy includes at least one of resource utilization priority strategy and efficiency priority strategy.

S504、资源管理器将该多个任务按照第一分配位形分配到该多个任务的可运行计算节点上。S504. The resource manager allocates the multiple tasks to computing nodes that can run the multiple tasks according to the first allocation configuration.

具体的,本发明实施例步骤S501中:Specifically, in step S501 of the embodiment of the present invention:

资源需求量具体可以是CPU资源的需求量,和/或,内存资源的需求量,和/或,网络带宽的需求量,等等,本发明实施例对此不作具体限定。The resource demand may specifically be the demand of CPU resources, and/or the demand of memory resources, and/or the demand of network bandwidth, etc., which are not specifically limited in this embodiment of the present invention.

具体的,本发明实施例步骤S502中:Specifically, in step S502 of the embodiment of the present invention:

一般来说,一个任务的具体执行时间外界是无法得知的。但是,在绝大多数情况下,一类任务会重复执行。如在企业客户场景中,可能需要每天执行重复的数据统计工作。因此,基于历史信息的统计往往能够给出一类任务运行时间的估计。为此,资源管理器中可能需要一个模块维护一个统计信息库,用于记录集群历史作业信息,比如图4中的E-Expert模块。当有新的任务到来时,根据历史统计信息将该任务匹配到某一类,然后根据该类任务的运行历史估计出该任务的运行时间。Generally speaking, the specific execution time of a task cannot be known to the outside world. However, in the vast majority of cases, one type of task is performed repeatedly. For example, in an enterprise customer scenario, it may be necessary to perform repetitive data statistics work every day. Therefore, statistics based on historical information can often give an estimate of the running time of a class of tasks. For this reason, a module in the resource manager may be required to maintain a statistical information library for recording historical job information of the cluster, such as the E-Expert module in Figure 4. When a new task arrives, the task is matched to a certain category according to the historical statistical information, and then the running time of the task is estimated according to the running history of this type of task.

E-Expert模块通常将信息分为硬信息与软信息两类。The E-Expert module usually divides information into two categories: hard information and soft information.

硬信息包括作业类型、执行用户等。不同作业类型的任务显然不属于同一类。而同一用户运行的作业有很大可能是同一类,甚至是重复执行的作业。硬信息由下述的样本库维护。Hard information includes job type, executing user, etc. Tasks of different job types obviously do not belong to the same class. However, the jobs run by the same user are likely to be of the same type, or even repeated jobs. Hard information is maintained by the sample repository described below.

软信息包括该任务处理的数据量的大小、输入数据大小、输出数据大小等。这类信息往往不是固定的,但是运行时间与这类信息之间存在着密切的关联。软信息需要做额外的统计,统计信息由下述的统计库维护。Soft information includes the size of the amount of data processed by the task, the size of input data, the size of output data, and the like. This kind of information is often not fixed, but there is a close correlation between running time and this kind of information. Soft information requires additional statistics, and the statistical information is maintained by the statistical library described below.

进而,可选的,资源管理器可以通过如下方式估计每个任务的运行时间,具体包括:Further, optionally, the resource manager may estimate the running time of each task in the following ways, specifically including:

针对每个任务,均按照下面针对第一任务的操作进行处理:For each task, proceed as follows for the first task:

将第一任务的硬信息与样本库中的历史任务的硬信息进行匹配。Match the hard information of the first task with the hard information of the historical tasks in the sample library.

若匹配成功,根据与该第一任务的硬信息匹配的历史任务的历史运行时间估计该第一任务的运行时间。If the matching is successful, the running time of the first task is estimated according to the historical running time of the historical task matching the hard information of the first task.

需要说明的是,当有新的任务到来时,若根据历史统计信息无法将该任务匹配到某一类,则可以通过给该任务赋予全局平局值的方式估计该任务的运行时间,该全局平局值可以为所有历史任务的运行时间的平均值,本发明实施例对该情况不作具体限定。It should be noted that when a new task arrives, if the task cannot be matched to a certain category according to the historical statistical information, the running time of the task can be estimated by assigning a global tie value to the task. The value may be the average running time of all historical tasks, which is not specifically limited in this embodiment of the present invention.

需要说明的是,本发明实施例仅是示例性的给出一种估计任务运行时间的具体实现,当然,还可以通过其它方式估计任务的运行时间,比如,通过预运行任务的方式。即,通过预先运行一小段作业实例获得准确的完整运行时间的估计。另外,同一作业的后续任务的运行时间参考已运行任务的运行时间也会获得更为准确的估计。本发明实施例对估计任务运行时间的具体实现方式不做限定。It should be noted that the embodiment of the present invention is only an example of a specific implementation of estimating the running time of a task. Of course, the running time of a task may also be estimated in other ways, for example, by pre-running the task. That is, by pre-running a small number of job instances to obtain an accurate estimate of the full runtime. Also, the run times of subsequent tasks of the same job will be more accurately estimated by reference to the run times of tasks that have already run. The embodiment of the present invention does not limit the specific implementation manner of estimating the running time of the task.

具体的,本发明实施例步骤S503中:Specifically, in step S503 of the embodiment of the present invention:

调度策略具体可以包括资源利用率优先策略和效率优先策略中的至少一种。其中,The scheduling strategy may specifically include at least one of a resource utilization priority strategy and an efficiency priority strategy. in,

若调度策略为资源利用率优先策略,则第一分配位形具体可以为使得多个任务的可运行计算节点中的每个计算节点的单节点资源利用率最大的分配位形。If the scheduling strategy is a resource utilization priority strategy, the first allocation configuration may specifically be an allocation configuration that maximizes the single-node resource utilization of each computing node among the executable computing nodes of multiple tasks.

或者,若调度策略为效率优先策略,则第一分配位形具体可以为使得作业的整体执行速度最快的分配位形。Alternatively, if the scheduling strategy is an efficiency-first strategy, the first allocation configuration may specifically be an allocation configuration that enables the overall execution speed of the job to be the fastest.

本发明实施例对该第一分配位形的具体形式不作具体限定。The embodiment of the present invention does not specifically limit the specific form of the first allocation configuration.

基于本发明实施例提供的资源分配方法,该资源分配方法中,在接收客户端设备提交的作业,并将该作业分解为多个有相应的资源需求量配置的任务之后,还估计每个任务的运行时间,并根据每个任务的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形,然后将该多个任务按照该第一分配位形分配到该多个任务的可运行计算节点上。其中,该第一分配位形用于指示该多个任务在该多个任务的可运行计算节点上的分布情况,调度策略包括资源利用率优先策略和效率优先策略中的至少一种。也就是说,该方案考虑了每个任务的运行时间的因素,并且将空间需求(即任务的资源需求量)和时间需求(即任务的时间)固定的Task调度到对应的节点上时,能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,使得最终采用资源利用率较高和/或效率较高的分配位形。一方面,由于可以采用资源利用率较高的分配位形,即可以通过调度策略使得那些节点资源利用率较高的Task组合被调度到节点上,因此该分配方案能有效减轻现有技术中资源碎片的问题,从而提升集群的资源利用率。另一方面,由于可以采用效率较高的分配位形,即可以通过调度策略使得那些作业执行时间最短的Task组合被调度到节点上,因此与现有技术相比,该分配方案能显著缩短作业执行时间,提升作业执行效率。综上,本发明实施例提供的资源分配方法能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,从而可以提高资源利用率,和/或,可以提升用户作业的执行效率。Based on the resource allocation method provided by the embodiment of the present invention, in the resource allocation method, after receiving the job submitted by the client device and decomposing the job into a plurality of tasks with corresponding resource requirement configurations, each task is further estimated The running time of each task, and according to the resource requirements and running time of each task, combined with the preset scheduling strategy, determine the first allocation configuration of the multiple tasks, and then allocate the multiple tasks according to the first allocation configuration to the runnable compute nodes for the multiple tasks. Wherein, the first allocation configuration is used to indicate the distribution of the multiple tasks on the operational computing nodes of the multiple tasks, and the scheduling strategy includes at least one of resource utilization priority strategy and efficiency priority strategy. That is to say, the scheme takes into account the running time of each task, and when scheduling tasks with fixed space requirements (that is, the resource requirements of the task) and time requirements (that is, the time of the task) to the corresponding nodes, it can According to the corresponding scheduling policy, the resource utilization priority strategy and the efficiency priority strategy are flexibly selected for resource allocation, so that an allocation configuration with higher resource utilization rate and/or higher efficiency is finally adopted. On the one hand, since the allocation configuration with high resource utilization can be adopted, that is, the task combination with high resource utilization of those nodes can be scheduled to the nodes through the scheduling strategy, so this allocation scheme can effectively reduce the resource utilization in the prior art. The problem of fragmentation, thereby improving the resource utilization of the cluster. On the other hand, since a more efficient allocation configuration can be adopted, that is, the task combination with the shortest execution time of the job can be scheduled to the node through the scheduling strategy, so compared with the existing technology, this allocation scheme can significantly shorten the job Execution time, improve job execution efficiency. To sum up, the resource allocation method provided by the embodiment of the present invention can flexibly select the resource utilization rate priority policy and the efficiency priority policy for resource allocation according to the corresponding scheduling policy, thereby improving the resource utilization rate, and/or, improving the efficiency of user jobs. effectiveness.

可选的,本发明实施例提供的资源分配方法中,还可以同时考虑对于异构集群和特殊资源需求作业的资源分配。Optionally, in the resource allocation method provided by the embodiment of the present invention, resource allocation for heterogeneous clusters and jobs with special resource requirements can also be considered at the same time.

即,如图6所示,在资源管理器根据每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形(步骤S503)之前,还可以包括步骤S505:That is, as shown in FIG. 6, before the resource manager determines the first allocation configuration of the plurality of tasks according to the resource demand and running time corresponding to each task in combination with the preset scheduling strategy (step S503), May include step S505:

S505、资源管理器将该多个任务按照资源的种类进行分类,获得至少一类任务。S505. The resource manager classifies the multiple tasks according to resource types to obtain at least one type of task.

其中,资源种类具体可以包括异构资源种类和非异构资源种类等,异构资源种类也可以根据是何种异构资源进行再次划分,本发明实施例对此不作具体限定。The resource types may specifically include heterogeneous resource types and non-heterogeneous resource types, and the heterogeneous resource types may also be subdivided according to the type of heterogeneous resources, which is not specifically limited in this embodiment of the present invention.

进而,资源管理器根据每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形(步骤S503),具体可以包括步骤S503a和S503b:Furthermore, the resource manager determines the first allocation configuration of the multiple tasks according to the resource demand and running time corresponding to each task, in combination with the preset scheduling strategy (step S503), which may specifically include steps S503a and S503b:

S503a、针对该至少一类任务中的每类任务,资源管理器均按照下面针对第一类任务的操作进行处理:S503a. For each type of task in the at least one type of task, the resource manager performs the following operations for the first type of task:

根据该第一类任务中的每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该第一类任务的子分配位形,该子分配位形用于指示该第一类任务在多个计算节点中的可运行计算节点上的分布情况。According to the resource demand and running time corresponding to each task in the first type of task, combined with the preset scheduling strategy, determine the sub-allocation configuration of the first type of task, and the sub-allocation configuration is used to indicate the first The distribution of class tasks on runnable compute nodes among multiple compute nodes.

S503b、资源管理器将该至少一类任务中的每类任务的子分配位形的组合确定为该多个任务的第一分配位形。S503b. The resource manager determines a combination of sub-allocation configurations of each type of task in the at least one type of task as a first allocation configuration of the plurality of tasks.

由于本发明实施例提供的资源分配方法首先将多个任务按照资源的种类进行分类,进而对于每一类任务分别进行资源分配,也就是说可以同时考虑对于异构集群和特殊资源需求作业的资源分配,因而具有更广泛的普适性和更好的综合表现。Since the resource allocation method provided by the embodiment of the present invention first classifies multiple tasks according to the types of resources, and then allocates resources for each type of task separately, that is to say, resources for heterogeneous clusters and special resource requirements can be considered at the same time. Therefore, it has wider applicability and better comprehensive performance.

可选的,考虑到运行时间的估计会与实际情况通常会产生一些偏差。如果对这些偏差不做控制,则随着时间的延长,作业资源的预分配结果与理想结果可能相差愈来愈大。因此,本发明实施例提供的资源分配方法中,还可以引入变异机制(即重新分配)。Optionally, take into account that the running time estimates will usually have some deviation from the actual situation. If these deviations are not controlled, as time goes on, the pre-allocation results of job resources may be more and more different from the ideal results. Therefore, in the resource allocation method provided by the embodiment of the present invention, a mutation mechanism (that is, reallocation) may also be introduced.

即,如图7所示,在资源管理器将该多个任务按照第一分配位形分配到该多个任务的可运行计算节点上(步骤S504)之后,还可以包括步骤S506-S509:That is, as shown in FIG. 7, after the resource manager allocates the multiple tasks to the runnable computing nodes of the multiple tasks according to the first allocation configuration (step S504), steps S506-S509 may also be included:

S506、资源管理器根据该第一分配位形,确定所有处于等待状态的任务运行在所分配的节点上时的第一整体分配目标函数值。S506. According to the first allocation configuration, the resource manager determines a first overall allocation objective function value when all tasks in the waiting state run on the allocated nodes.

S507、资源管理器根据该所有处于等待状态的任务对应的资源需求量和运行时间,结合预设的调度策略,确定所有处于等待状态的任务的第二分配位形,该第二分配位形用于指示该所有处于等待状态的任务在所有处于等待状态的任务的可运行计算节点上的分布情况。S507. The resource manager determines the second allocation configuration of all tasks in the waiting state according to the resource requirements and running time corresponding to the tasks in the waiting state, and in combination with the preset scheduling policy. The second allocation configuration uses Indicates the distribution of all tasks in the waiting state on all computing nodes that are able to run the tasks in the waiting state.

S508、资源管理器根据该第二分配位形,确定该所有处于等待状态的任务运行在所分配的节点上时的第二整体分配目标函数值。S508. According to the second allocation configuration, the resource manager determines a second overall allocation objective function value when all the tasks in the waiting state run on the allocated nodes.

S509、若第二整体分配目标函数值大于第一整体分配目标函数值,资源管理器将该所有处于等待状态的任务按照第二分配位形分配到该所有处于等待状态的任务的可运行计算节点上。S509. If the value of the second overall allocation objective function is greater than the value of the first overall allocation objective function, the resource manager allocates all the tasks in the waiting state to the runnable computing nodes of all the tasks in the waiting state according to the second allocation configuration superior.

可选的,本发明实施例中,整体分配目标函数值可通过如下公式(1)获得:Optionally, in the embodiment of the present invention, the overall allocation objective function value can be obtained by the following formula (1):

S=∑nSn 公式(1)S = ∑ n S n formula (1)

其中,Sn表示单节点n的分配目标函数值;S表示整体分配目标函数值。Among them, S n represents the distribution objective function value of single node n; S represents the overall distribution objective function value.

即,步骤S506中第一整体分配目标函数值等于第一分配位形时,所有处于等待状态的任务运行在所分配的节点上时各个节点的单节点分配目标函数值的和。That is, when the first overall allocation objective function value in step S506 is equal to the first allocation configuration, the sum of the single-node allocation objective function values of each node when all tasks in the waiting state run on the allocated nodes.

步骤S508中第二整体分配目标函数值等于第二分配位形时,所有处于等待状态的任务运行在所分配的节点上时各个节点的单节点分配目标函数值的和。In step S508, when the second overall allocation objective function value is equal to the second allocation configuration, the sum of the single-node allocation objective function values of each node when all tasks in the waiting state run on the allocated nodes.

可选的,本发明实施例中,单节点分配目标函数具体可以如公式(2)所示:Optionally, in this embodiment of the present invention, the single-node allocation objective function may be specifically shown in formula (2):

公式(2) Formula (2)

其中,Sn表示单节点n的分配目标函数值;p表示作业时间优先级因子,p>0;f表示作业公平性因子,f>0;p+f≤1;m表示作业的数量。Among them, S n represents the distribution objective function value of single node n; p represents the job time priority factor, p>0; f represents the job fairness factor, f>0;p+f≤1; m represents the number of jobs.

Se,n表示节点n上的资源利用率得分,rn表示节点n的资源,rt表示任务t的资源需求量。Se ,n represents the resource utilization score on node n, r n represents the resource of node n, r t represents the resource requirement of task t.

Sp,j表示作业j的执行进度,Tj表示作业j还需要多少时间执行完毕,该值可由历史统计数据得出,T0表示作业j的总体运行时间。可以看出,Sp,j的取值范围为[1/e,1],1/e表示作业刚开始运行,1表示作业已完成。S p, j represents the execution progress of job j, T j indicates how much time the job j needs to complete, and this value can be obtained from historical statistical data. T 0 indicates the overall running time of job j. It can be seen that the value range of S p, j is [1/e, 1], 1/e means that the job has just started running, and 1 means that the job has been completed.

Sf,j表示作业j的公平性得分,rj表示作业j的资源需求,rf表示作业j在完全公平情况下的应得资源。S f,j represents the fairness score of job j, r j represents the resource requirement of job j, and r f represents the due resource of job j under the condition of complete fairness.

需要说明的是,在多维资源的情况下,上述公式中的rn、rt、rf等参数均为矢量。It should be noted that in the case of multi-dimensional resources, parameters such as r n , r t , and r f in the above formula are all vectors.

在上述公式(2)中,考虑了节点的资源利用率、作业的公平性以及作业的执行进度。其中,当f=0,p=0时,完全考虑资源利用率,即资源管理器按照整体资源利用率最高的原则分配资源;当f=1,p=0时,完全考虑公平,即资源管理器会在不同的作业间公平地分配资源;当f=1,p=1时,则完全考虑时间优先级,即资源管理器优先分配资源给那些更快完成的作业。当然,f和p还可以为其它数值,用户可以根据作业的运行需求进行设置,使得分配在最优资源利用率和最优作业执行时间取得平衡,本发明实施例对此不作具体限定。In the above formula (2), resource utilization of nodes, job fairness and job execution progress are taken into consideration. Among them, when f=0, p=0, resource utilization is fully considered, that is, the resource manager allocates resources according to the principle of the highest overall resource utilization; when f=1, p=0, fairness is fully considered, that is, resource management The controller will allocate resources among different jobs fairly; when f=1, p=1, time priority is fully considered, that is, the resource manager assigns resources to jobs that finish faster. Of course, f and p can also be other values, and the user can set them according to the operation requirements of the job, so that the allocation can be balanced between the optimal resource utilization rate and the optimal job execution time, which is not specifically limited in the embodiment of the present invention.

本发明实施例对此不作具体限定。This embodiment of the present invention does not specifically limit it.

需要说明的是,公式(2)仅是示例性的给出一种单节点分配目标函数的具体实现,当然,根据不同的分配考虑因素,该单节点分配目标函数还可以为其它,本发明实施例对此不作具体限定。It should be noted that the formula (2) is only an example of a specific implementation of a single-node allocation objective function, of course, according to different allocation considerations, the single-node allocation objective function can also be other, the implementation of the present invention The example does not specifically limit this.

通过上述变异机制,可以使得作业资源的预分配结果向更好的方向进化。其中,图8示意性的给出了一种作业资源的分配结果的变异机制示意图,可以看出,随着时间的延长,作业资源的预分配结果与理想结果可能相差愈来愈大。经过上述变异机制,可以将节点3上的任务1调整到节点1上,将节点2上的任务2调整到节点3上,将节点1上的任务3调整到节点2上,从而使得作业资源的预分配结果向更好的方向进化。Through the above mutation mechanism, the pre-allocation results of job resources can be made to evolve in a better direction. Among them, Fig. 8 schematically shows a schematic diagram of a variation mechanism of job resource allocation results. It can be seen that as time goes on, the difference between the pre-allocation results of job resources and the ideal results may become larger and larger. Through the above mutation mechanism, task 1 on node 3 can be adjusted to node 1, task 2 on node 2 can be adjusted to node 3, and task 3 on node 1 can be adjusted to node 2, so that the job resource The pre-allocation results are evolving in a better direction.

需要说明的是,在图8中,相同的填充用于表征需要相同的资源类型,如CPU或者内存等,本发明实施例对图8中各个填充表征的资源类型不作具体限定。It should be noted that in FIG. 8 , the same padding is used to represent the same resource type, such as CPU or memory, and the embodiment of the present invention does not specifically limit the resource type represented by each padding in FIG. 8 .

下面将以结合一个具体示例对上述各实施例中的资源分配方法进行说明。The resource allocation methods in the foregoing embodiments will be described below with reference to a specific example.

示例性的,假设有四个节点node1,node2,node3和node4,其中,node1,node2和node3为同构节点,各有6个核,12G内存,2Gbps的网络带宽;node4是个异构的图形计算节点,有2个核,2G内存和一个128核的图形处理器(英文全称:graphics processing unit,英文缩写:GPU)显卡。For example, suppose there are four nodes node1, node2, node3 and node4, among which node1, node2 and node3 are homogeneous nodes, each with 6 cores, 12G memory, and 2Gbps network bandwidth; node4 is a heterogeneous graph computing The node has 2 cores, 2G memory and a 128-core graphics processor (English full name: graphics processing unit, English abbreviation: GPU) graphics card.

另外,假设有四个Job,每个Job提交的Task的资源需求量如下所示。其中,括号中的三维数字分别代表Task所需的核的数量,内存的大小和网络带宽的大小:In addition, assuming that there are four jobs, the resource requirements of the tasks submitted by each job are as follows. Among them, the three-dimensional numbers in parentheses represent the number of cores required by the Task, the size of the memory, and the size of the network bandwidth:

JobA:18个Map Task(1,2,0),3个Reduce Task(0,0,2);JobA: 18 Map Tasks (1, 2, 0), 3 Reduce Tasks (0, 0, 2);

JobB:6个Map Task(3,1,0),3个Reduce Task(0,0,2);JobB: 6 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobC:6个Map Task(3,1,0),3个Reduce Task(0,0,2);JobC: 6 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobD:2个Map Task(1,1,0),需要GPU;JobD: 2 Map Tasks (1, 1, 0), requiring GPU;

同时,假设所有JobA,JobB和JobC的Task的运行时间估计均为t,而JobD的Task的运行时间估计为4t。At the same time, it is assumed that the running time of all JobA, JobB and JobC's Tasks is estimated to be t, while the running time of JobD's Task is estimated to be 4t.

情况一:Case 1:

此时,若采用资源利用率优先的原则,比如整体资源利用率最高,也就是上述单节点分配目标函数(公式(2))中的f=0,p=0,则JobA,JobB,JobC和JobD的Task将按照如下流程调度:At this time, if the principle of resource utilization priority is adopted, for example, the overall resource utilization rate is the highest, that is, f=0 and p=0 in the above-mentioned single-node allocation objective function (formula (2)), then JobA, JobB, JobC and JobD's Task will be scheduled according to the following process:

步骤1、将所有Task按照是否具有异构资源需求进行分类。Step 1. Classify all tasks according to whether they have heterogeneous resource requirements.

此时,JobA,JobB,JobC和JobD的所有的Task可以分为两类:需要GPU的和不需要GPU的。At this point, all tasks of JobA, JobB, JobC, and JobD can be divided into two categories: those that require GPU and those that do not.

步骤2、对于需要GPU的2个Map Task,将其调度到节点node4。Step 2. For the two Map Tasks that require GPU, schedule them to node node4.

经过此轮分配,处于等待状态的Task的状况是:After this round of allocation, the status of the Task in the waiting state is:

JobA:18个Map Task(1,2,0),3个Reduce Task(0,0,2);JobA: 18 Map Tasks (1, 2, 0), 3 Reduce Tasks (0, 0, 2);

JobB:6个Map Task(3,1,0),3个Reduce Task(0,0,2);JobB: 6 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobC:6个Map Task(3,1,0),3个Reduce Task(0,0,2);JobC: 6 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobD:0个Map Task(1,1,0),需要GPU。JobD: 0 Map Task (1, 1, 0), requires GPU.

步骤3、对于剩余的Task,现在的集群还剩余node1,node2和node3三个节点,总共(18,36,6)资源。Step 3. For the remaining Tasks, the current cluster still has three nodes, node1, node2 and node3, with a total of (18, 36, 6) resources.

步骤4、对于node1节点,经计算,JobA中的6个Map Task形成的Tasks包(总共(6,12,0)资源)使得node1节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node1上。Step 4. For the node1 node, after calculation, the Tasks package (a total of (6, 12, 0) resources) formed by the 6 Map Tasks in JobA makes the single-node allocation objective function value S n of the node1 node the largest. Therefore, the The Tasks package is placed on node1.

对于node2和node3节点,经计算,JobA中的6个Map Task形成的Tasks包(总共(6,12,0)资源)使得node2节点的单节点分配目标函数值Sn最大,并且JobA中的6个Map Task形成的Tasks包(总共(6,12,0)资源)使得node3节点的单节点分配目标函数值Sn最大。因此,将JobA中的剩余12个Map Task平均分配到node2和node3节点上。For node2 and node3 nodes, after calculation, the Tasks package formed by the 6 Map Tasks in JobA (a total of (6, 12, 0) resources) makes the single-node allocation objective function value S n of the node2 node the largest, and the 6 Map Tasks in JobA The Tasks package (a total of (6, 12, 0) resources) formed by four Map Tasks makes the single-node allocation objective function value S n of node3 the largest. Therefore, the remaining 12 Map Tasks in JobA are evenly distributed to node2 and node3 nodes.

需要说明的是,由于该示例中假设所有JobA,JobB和JobC的Task的运行时间估计均为t,也就是运行时间均相等,并且f=0,p=0,因此单节点分配目标函数退化为进而遍历各种组合,最终可以确定出JobA中的6个Map Task形成的Tasks包(总共(6,12,0)资源)使得node1节点的单节点分配目标函数值Sn最大,此时, It should be noted that since this example assumes that the running time estimates of all JobA, JobB, and JobC tasks are t, that is, the running time is equal, and f=0, p=0, the single-node allocation objective function degenerate into Then traverse various combinations, and finally determine that the Tasks package formed by the 6 Map Tasks in JobA (a total of (6, 12, 0) resources) makes the single-node allocation objective function value S n of the node1 node the largest. At this time,

相比之下,若取JobB两个Map Task形成Tasks包,则单节点分配目标函数值Sn小于上述值,此处就不再一一验证。In contrast, if the two Map Tasks of JobB are taken to form a Tasks package, then the value of the single-node assignment objective function S n is less than the above value, so we will not verify them one by one here.

同理,根据上述退化公式,可以确定JobA中的6个Map Task形成的Tasks包(总共(6,12,0)资源)使得node2节点的单节点分配目标函数值Sn最大,JobA中的6个Map Task形成的Tasks包(总共(6,12,0)资源)使得node3节点的单节点分配目标函数值Sn最大,本发明实施例在此就不再一一验证。Similarly, according to the above degradation formula, it can be determined that the Tasks package formed by the 6 Map Tasks in JobA (a total of (6, 12, 0) resources) makes the single-node allocation objective function value S n of node2 the largest, and the 6 Map Tasks in JobA The Tasks package (a total of (6, 12, 0) resources) formed by four Map Tasks makes the single-node allocation objective function value S n of the node3 node the largest, and the embodiment of the present invention will not verify them one by one here.

经过此轮分配,处于等待状态的Task的状况是:After this round of allocation, the status of the Task in the waiting state is:

JobA:0个Map Task(1,2,0),3个Reduce Task(0,0,2);JobA: 0 Map Tasks (1, 2, 0), 3 Reduce Tasks (0, 0, 2);

JobB:6个Map Task(3,1,0),3个Reduce Task(0,0,2);JobB: 6 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobC:6个Map Task(3,1,0),3个Reduce Task(0,0,2);JobC: 6 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobD:0个Map Task(1,1,0),需要GPU。JobD: 0 Map Task (1, 1, 0), requires GPU.

步骤5、继续分配:Step 5. Continue to distribute:

经计算,JobA的1个Reduce Task和JobB的2个Map Task形成Tasks包(总共(6,2,2)资源)使得node1节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node1上。After calculation, 1 Reduce Task of JobA and 2 Map Tasks of JobB form a Tasks package (a total of (6, 2, 2) resources) so that the single-node allocation objective function value S n of the node1 node is the largest. Therefore, the Tasks package placed on node1.

经计算,JobA的1个Reduce Task和JobB的2个Map Task形成Tasks包(总共(6,2,2)资源)使得node2节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node2上。After calculation, 1 Reduce Task of JobA and 2 Map Tasks of JobB form a Tasks package (a total of (6, 2, 2) resources) so that the single-node allocation objective function value S n of the node2 node is the largest. Therefore, the Tasks package placed on node2.

经计算,JobA的1个Reduce Task和JobB的2个Map Task形成Tasks包(总共(6,2,2)资源)使得node3节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node3上。After calculation, 1 Reduce Task of JobA and 2 Map Tasks of JobB form a Tasks package (a total of (6, 2, 2) resources) so that the single-node allocation objective function value S n of the node3 node is the largest. Therefore, the Tasks package placed on node3.

这样经过此轮分配后,处于等待状态的Task的状况是:In this way, after this round of allocation, the status of the Task in the waiting state is:

JobA:0个Map Task(1,2,0),0个Reduce Task(0,0,2);JobA: 0 Map Task (1, 2, 0), 0 Reduce Task (0, 0, 2);

JobB:0个Map Task(3,1,0),3个Reduce Task(0,0,2);JobB: 0 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobC:6个Map Task(3,1,0),3个Reduce Task(0,0,2);JobC: 6 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobD:0个Map Task(1,1,0)需要GPU。JobD: 0 Map Task (1, 1, 0) requires GPU.

步骤6、继续分配:Step 6. Continue to distribute:

经计算,JobB的1个Reduce Task和JobC的2个Map Task形成Tasks包(总共(6,2,2)资源)使得node1节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node1上。After calculation, 1 Reduce Task of JobB and 2 Map Tasks of JobC form a Tasks package (a total of (6, 2, 2) resources) so that the single-node allocation objective function value S n of the node1 node is the largest. Therefore, the Tasks package placed on node1.

经计算,JobB的1个Reduce Task和JobC的2个Map Task形成Tasks包(总共(6,2,2)资源)使得node2节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node2上。After calculation, 1 Reduce Task of JobB and 2 Map Tasks of JobC form a Tasks package (a total of (6, 2, 2) resources) so that the single-node allocation objective function value S n of the node2 node is the largest. Therefore, the Tasks package placed on node2.

经计算,JobB的1个Reduce Task和JobC的2个Map Task形成Tasks包(总共(6,2,2)资源)使得node3节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node3上。After calculation, 1 Reduce Task of JobB and 2 Map Tasks of JobC form a Tasks package (a total of (6, 2, 2) resources) so that the single-node allocation objective function value S n of the node3 node is the largest. Therefore, the Tasks package placed on node3.

这样经过此轮分配后,处于等待状态的Task的状况是:In this way, after this round of allocation, the status of the Task in the waiting state is:

JobA:0个Map Task(1,2,0),0个Reduce Task(0,0,2);JobA: 0 Map Task (1, 2, 0), 0 Reduce Task (0, 0, 2);

JobB:0个Map Task(3,1,0),0个Reduce Task(0,0,2);JobB: 0 Map Task (3, 1, 0), 0 Reduce Task (0, 0, 2);

JobC:0个Map Task(3,1,0),3个Reduce Task(0,0,2);JobC: 0 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobD:0个Map Task(1,1,0),需要GPU。JobD: 0 Map Task (1, 1, 0), requires GPU.

步骤7、继续分配:Step 7. Continue to distribute:

将JobC剩余的3个Task分配在三个节点上,这样,所有分配进行完毕,最终的目标分配位形如图9所示。Assign the remaining 3 Tasks of JobC to the three nodes. In this way, all assignments are completed, and the final target assignment configuration is shown in Figure 9.

情况二:Case two:

此时,若采用公平优先的原则,比如上述单节点分配目标函数(公式(2))中的f=1,p=0,则JobA,JobB,JobC和JobD的Task将按照如下流程调度:At this time, if the principle of fairness and priority is adopted, for example, f=1 and p=0 in the above-mentioned single-node allocation objective function (formula (2)), the tasks of JobA, JobB, JobC and JobD will be scheduled according to the following process:

步骤1、将所有Task按照是否具有异构资源需求进行分类。Step 1. Classify all tasks according to whether they have heterogeneous resource requirements.

此时,JobA,JobB,JobC和JobD的所有的Task可以分为两类:需要GPU的和不需要GPU的。At this point, all tasks of JobA, JobB, JobC, and JobD can be divided into two categories: those that require GPU and those that do not.

步骤2、对于需要GPU的2个Map Task,将其调度到节点node4。Step 2. For the two Map Tasks that require GPU, schedule them to node node4.

经过此轮分配,处于等待状态的Task的状况是:After this round of allocation, the status of the Task in the waiting state is:

JobA:18个Map Task(1,2,0),3个Reduce Task(0,0,2);JobA: 18 Map Tasks (1, 2, 0), 3 Reduce Tasks (0, 0, 2);

JobB:6个Map Task(3,1,0),3个Reduce Task(0,0,2);JobB: 6 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobC:6个Map Task(3,1,0),3个Reduce Task(0,0,2);JobC: 6 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobD:0个Map Task(1,1,0),需要GPU。JobD: 0 Map Task (1, 1, 0), requires GPU.

步骤3、对于剩余的Task,现在的集群还剩余node1,node2和node3三个节点,总共(18,36,6)资源。Step 3. For the remaining Tasks, the current cluster still has three nodes, node1, node2 and node3, with a total of (18, 36, 6) resources.

步骤4、对于node1节点,经计算,JobA中的6个Map Task形成的Tasks包(总共(6,12,0)资源)使得node1节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node1上。Step 4. For the node1 node, after calculation, the Tasks package (a total of (6, 12, 0) resources) formed by the 6 Map Tasks in JobA makes the single-node allocation objective function value S n of the node1 node the largest. Therefore, the The Tasks package is placed on node1.

对于node2节点,经计算,JobB中的2个Map Task形成的Tasks包(总共(6,2,0)资源)使得node2节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node2上。For the node2 node, after calculation, the Tasks package (a total of (6, 2, 0) resources) formed by the two Map Tasks in JobB makes the single-node allocation objective function value S n of the node2 node the largest, so the Tasks package is placed in on node2.

对于node3节点,经计算,JobC中的2个Map Task形成的Tasks包(总共(6,2,0)资源)使得node3节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node3上。For the node3 node, after calculation, the Tasks package formed by the two Map Tasks in JobC (a total of (6, 2, 0) resources) makes the single-node allocation objective function value S n of the node3 node the largest, so the Tasks package is placed in on node3.

需要说明的是,由于该示例中假设所有JobA,JobB和JobC的Task的运行时间估计均为t,也就是运行时间均相等,并且f=1,p=0,因此单节点分配目标函数退化为进而遍历各种组合,最终可以确定出JobA中的6个Map Task形成的Tasks包(总共(6,12,0)资源)使得node1节点的单节点分配目标函数值Sn最大。It should be noted that since this example assumes that the running time of all JobA, JobB, and JobC tasks is estimated to be t, that is, the running time is equal, and f=1, p=0, the single-node allocation objective function degenerate into After traversing various combinations, it can finally be determined that the Tasks package (a total of (6, 12, 0) resources) formed by the 6 Map Tasks in JobA makes the single-node allocation objective function value S n of node1 the largest.

同理,根据上述退化公式,可以确定JobB中的2个Map Task形成的Tasks包(总共(6,2,0)资源)使得node2节点的单节点分配目标函数值Sn最大,JobC中的2个Map Task形成的Tasks包(总共(6,2,0)资源)使得node3节点的单节点分配目标函数值Sn最大,本发明实施例在此就不再一一验证。Similarly, according to the above degradation formula, it can be determined that the Tasks package formed by the two Map Tasks in JobB (a total of (6, 2, 0) resources) makes the single-node allocation objective function value S n of node2 the largest, and the 2 in JobC The Tasks package (a total of (6, 2, 0) resources) formed by four Map Tasks makes the single-node allocation objective function value S n of the node3 node the largest, and the embodiment of the present invention will not verify one by one here.

经过此轮分配,处于等待状态的Task的状况是:After this round of allocation, the status of the Task in the waiting state is:

JobA:12个Map Task(1,2,0),3个Reduce Task(0,0,2);JobA: 12 Map Tasks (1, 2, 0), 3 Reduce Tasks (0, 0, 2);

JobB:4个Map Task(3,1,0),3个Reduce Task(0,0,2);JobB: 4 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobC:4个Map Task(3,1,0),3个Reduce Task(0,0,2);JobC: 4 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobD:0个Map Task(1,1,0),需要GPU。JobD: 0 Map Task (1, 1, 0), requires GPU.

步骤5、继续分配:Step 5. Continue to distribute:

经计算,JobA中的6个Map Task形成的Tasks包(总共(6,12,0)资源)使得node1节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node1上。After calculation, the Tasks package formed by the 6 Map Tasks in JobA (a total of (6, 12, 0) resources) makes the single-node allocation objective function value S n of node1 the largest, so the Tasks package is placed on node1.

经计算,JobB中的2个Map Task形成的Tasks包(总共(6,2,0)资源)使得node2节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node2上。After calculation, the Tasks package (a total of (6, 2, 0) resources) formed by the two Map Tasks in JobB makes the single-node allocation objective function value S n of node2 the largest, so the Tasks package is placed on node2.

经计算,JobC中的2个Map Task形成的Tasks包(总共(6,2,0)资源)使得node3节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node3上。After calculation, the Tasks package formed by the two Map Tasks in JobC (a total of (6, 2, 0) resources) maximizes the single-node allocation objective function value S n of node3, so the Tasks package is placed on node3.

这样经过此轮分配后,处于等待状态的Task的状况是:In this way, after this round of allocation, the status of the Task in the waiting state is:

JobA:6个Map Task(1,2,0),3个Reduce Task(0,0,2);JobA: 6 Map Tasks (1, 2, 0), 3 Reduce Tasks (0, 0, 2);

JobB:2个Map Task(3,1,0),3个Reduce Task(0,0,2);JobB: 2 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobC:2个Map Task(3,1,0),3个Reduce Task(0,0,2);JobC: 2 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobD:0个Map Task(1,1,0),需要GPU。JobD: 0 Map Task (1, 1, 0), requires GPU.

步骤6、继续分配:Step 6. Continue to distribute:

经计算,JobA中的6个Map Task形成的Tasks包(总共(6,12,0)资源)使得node1节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node1上。After calculation, the Tasks package formed by the 6 Map Tasks in JobA (a total of (6, 12, 0) resources) makes the single-node allocation objective function value S n of node1 the largest, so the Tasks package is placed on node1.

经计算,JobB中的2个Map Task形成的Tasks包(总共(6,2,0)资源)使得node2节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node2上。After calculation, the Tasks package (a total of (6, 2, 0) resources) formed by the two Map Tasks in JobB makes the single-node allocation objective function value S n of node2 the largest, so the Tasks package is placed on node2.

经计算,JobC中的2个Map Task形成的Tasks包(总共(6,2,0)资源)使得node3节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node3上。After calculation, the Tasks package formed by the two Map Tasks in JobC (a total of (6, 2, 0) resources) maximizes the single-node allocation objective function value S n of node3, so the Tasks package is placed on node3.

这样经过此轮分配后,处于等待状态的Task的状况是:In this way, after this round of allocation, the status of the Task in the waiting state is:

JobA:0个Map Task(1,2,0),3个Reduce Task(0,0,2);JobA: 0 Map Tasks (1, 2, 0), 3 Reduce Tasks (0, 0, 2);

JobB:0个Map Task(3,1,0),3个Reduce Task(0,0,2);JobB: 0 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobC:0个Map Task(3,1,0),3个Reduce Task(0,0,2);JobC: 0 Map Tasks (3, 1, 0), 3 Reduce Tasks (0, 0, 2);

JobD:0个Map Task(1,1,0),需要GPU。JobD: 0 Map Task (1, 1, 0), requires GPU.

步骤7、继续分配:Step 7. Continue to distribute:

经计算,JobA中的1个Reduce Task形成的Tasks包(总共(0,0,2)资源)使得node1节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node1上。After calculation, the Tasks package formed by one Reduce Task in JobA (a total of (0, 0, 2) resources) makes the single-node allocation objective function value S n of node1 the largest, so the Tasks package is placed on node1.

经计算,JobB中的1个Reduce Task形成的Tasks包(总共(0,0,2)资源)使得node2节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node2上。After calculation, the Tasks package formed by one Reduce Task in JobB (a total of (0, 0, 2) resources) maximizes the single-node allocation objective function value S n of node2, so the Tasks package is placed on node2.

经计算,JobC中的1个Reduce Task形成的Tasks包(总共(0,0,2)资源)使得node3节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node3上。After calculation, the Tasks package formed by one Reduce Task in JobC (a total of (0, 0, 2) resources) makes the single-node allocation objective function value S n of node3 the largest, so the Tasks package is placed on node3.

这样经过此轮分配后,处于等待状态的Task的状况是:In this way, after this round of allocation, the status of the Task in the waiting state is:

JobA:0个Map Task(1,2,0),2个Reduce Task(0,0,2);JobA: 0 Map Tasks (1, 2, 0), 2 Reduce Tasks (0, 0, 2);

JobB:0个Map Task(3,1,0),2个Reduce Task(0,0,2);JobB: 0 Map Tasks (3, 1, 0), 2 Reduce Tasks (0, 0, 2);

JobC:0个Map Task(3,1,0),2个Reduce Task(0,0,2);JobC: 0 Map Tasks (3, 1, 0), 2 Reduce Tasks (0, 0, 2);

JobD:0个Map Task(1,1,0),需要GPU。JobD: 0 Map Task (1, 1, 0), requires GPU.

步骤8、继续分配:Step 8. Continue to assign:

经计算,JobA中的1个Reduce Task形成的Tasks包(总共(0,0,2)资源)使得node1节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node1上。After calculation, the Tasks package formed by one Reduce Task in JobA (a total of (0, 0, 2) resources) makes the single-node allocation objective function value S n of node1 the largest, so the Tasks package is placed on node1.

经计算,JobB中的1个Reduce Task形成的Tasks包(总共(0,0,2)资源)使得node2节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node2上。After calculation, the Tasks package formed by one Reduce Task in JobB (a total of (0, 0, 2) resources) maximizes the single-node allocation objective function value S n of node2, so the Tasks package is placed on node2.

经计算,JobC中的1个Reduce Task形成的Tasks包(总共(0,0,2)资源)使得node3节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node3上。After calculation, the Tasks package formed by one Reduce Task in JobC (a total of (0, 0, 2) resources) makes the single-node allocation objective function value S n of node3 the largest, so the Tasks package is placed on node3.

这样经过此轮分配后,处于等待状态的Task的状况是:In this way, after this round of allocation, the status of the Task in the waiting state is:

JobA:0个Map Task(1,2,0),1个Reduce Task(0,0,2);JobA: 0 Map Task (1, 2, 0), 1 Reduce Task (0, 0, 2);

JobB:0个Map Task(3,1,0),1个Reduce Task(0,0,2);JobB: 0 Map Task (3, 1, 0), 1 Reduce Task (0, 0, 2);

JobC:0个Map Task(3,1,0),1个Reduce Task(0,0,2);JobC: 0 Map Task (3, 1, 0), 1 Reduce Task (0, 0, 2);

JobD:0个Map Task(1,1,0),需要GPU;JobD: 0 Map Task (1, 1, 0), requires GPU;

步骤9、继续分配:Step 9. Continue to assign:

经计算,JobA中的1个Reduce Task形成的Tasks包(总共(0,0,2)资源)使得node1节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node1上。After calculation, the Tasks package formed by one Reduce Task in JobA (a total of (0, 0, 2) resources) makes the single-node allocation objective function value S n of node1 the largest, so the Tasks package is placed on node1.

经计算,JobB中的1个Reduce Task形成的Tasks包(总共(0,0,2)资源)使得node2节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node2上。After calculation, the Tasks package formed by one Reduce Task in JobB (a total of (0, 0, 2) resources) maximizes the single-node allocation objective function value S n of node2, so the Tasks package is placed on node2.

经计算,JobC中的1个Reduce Task形成的Tasks包(总共(0,0,2)资源)使得node3节点的单节点分配目标函数值Sn最大,因此,将该Tasks包放置在node3上。After calculation, the Tasks package formed by one Reduce Task in JobC (a total of (0, 0, 2) resources) makes the single-node allocation objective function value S n of node3 the largest, so the Tasks package is placed on node3.

这样经过此轮分配后,所有分配进行完毕,最终的目标分配位形如图10所示。In this way, after this round of allocation, all allocations are completed, and the final target allocation configuration is shown in FIG. 10 .

情况三:Case three:

此时,若采用时间优先的原则,比如整体执行效率最高,也就是上述单节点分配目标函数(公式(2))中的f=0,p=1,则JobA,JobB,JobC和JobD的Task在进行分配时将完全考虑时间优先级,即资源管理器优先分配资源给那些更快完成的作业。此时,单节点分配目标函数退化为进而根据每个节点的单节点目标函数的值最大的原则进行分配即可,本发明实施例对该情况就不再详细举例说明。At this time, if the principle of time priority is adopted, for example, the overall execution efficiency is the highest, that is, f=0 and p=1 in the above-mentioned single node allocation objective function (formula (2)), then the Tasks of JobA, JobB, JobC and JobD Time priority will be fully considered when making allocations, that is, the resource manager will give priority to assigning resources to jobs that complete faster. At this point, the single node assigns the objective function degenerate into Furthermore, it is enough to allocate according to the principle that the value of the single-node objective function of each node is the largest, and the embodiment of the present invention will not illustrate this situation in detail.

由图9和图10对应的示例可以看出,若忽略公平性,则资源利用率优先的分配方式可有效缩短作业整体执行时间到4t,而若考虑公平性,则公平优先的分配方式的整体作业执行时间为6t。From the corresponding examples in Figure 9 and Figure 10, it can be seen that if fairness is ignored, the resource utilization priority allocation method can effectively shorten the overall execution time of the job to 4t, and if fairness is considered, the overall allocation method with fairness priority The job execution time is 6t.

如图11所示,本发明实施例提供一种资源管理器110,用于执行以上图5-图7所示的资源分配方法。该资源管理器110可以包括相应步骤所对应的单元,示例的,可以包括:接收单元1101、分解单元1106、估计单元1102、确定单元1103和分配单元1104。As shown in FIG. 11 , an embodiment of the present invention provides a resource manager 110 for executing the resource allocation methods shown in FIGS. 5-7 above. The resource manager 110 may include units corresponding to corresponding steps, for example, may include: a receiving unit 1101 , a decomposing unit 1106 , an estimating unit 1102 , a determining unit 1103 and an allocating unit 1104 .

接收单元1101,用于接收客户端设备提交的作业。The receiving unit 1101 is configured to receive a job submitted by a client device.

分解单元1106,用于将该作业分解为多个任务,其中,该多个任务中的每个任务均配置有相对应的资源需求量。The decomposing unit 1106 is configured to decompose the job into a plurality of tasks, wherein each task in the plurality of tasks is configured with a corresponding resource requirement.

估计单元1102,用于估计每个任务的运行时间。An estimating unit 1102, configured to estimate the running time of each task.

确定单元1103,用于根据每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形,该第一分配位形用于指示该多个任务在该多个计算节点中的可运行计算节点上的分布情况,该调度策略包括资源利用率优先策略和效率优先策略中的至少一种。The determination unit 1103 is configured to determine a first allocation configuration of the multiple tasks according to the resource requirement and running time corresponding to each task, in combination with a preset scheduling policy, and the first allocation configuration is used to indicate the multiple tasks The distribution of tasks on the runnable computing nodes among the plurality of computing nodes, the scheduling strategy includes at least one of resource utilization priority strategy and efficiency priority strategy.

分配单元1104,用于将该多个任务按照第一分配位形分配到该多个任务的可运行计算节点上。An allocating unit 1104, configured to allocate the multiple tasks to computing nodes that can run the multiple tasks according to a first allocation configuration.

可选的,若调度策略为资源利用率优先策略,则第一分配位形具体可以为使得该多个任务的可运行计算节点中的每个计算节点的单节点资源利用率最大的分配位形。Optionally, if the scheduling strategy is a resource utilization priority strategy, the first allocation configuration may specifically be an allocation configuration that maximizes the single-node resource utilization of each computing node among the executable computing nodes of the multiple tasks .

可选的,若调度策略为效率优先策略,则第一分配位形可以为使得作业的整体执行速度最快的分配位形。Optionally, if the scheduling policy is an efficiency-first policy, the first allocation configuration may be the allocation configuration that makes the overall job execution speed the fastest.

本发明实施例对第一分配位形的具体形式不作具体限定。The embodiment of the present invention does not specifically limit the specific form of the first allocation configuration.

可选的,估计单元1102具体可以用于:Optionally, the estimating unit 1102 may specifically be used for:

针对每个任务,均按照下面针对第一任务的操作进行处理:For each task, proceed as follows for the first task:

将第一任务的硬信息与样本库中的历史任务的硬信息进行匹配;Matching the hard information of the first task with the hard information of the historical tasks in the sample library;

若匹配成功,根据与第一任务的硬信息匹配的历史任务的历史运行时间估计第一任务的运行时间。If the matching is successful, the running time of the first task is estimated according to the historical running time of the historical task matching the hard information of the first task.

具体的,本发明实施例中的硬信息包括作业类型、执行用户等信息。Specifically, the hard information in this embodiment of the present invention includes job type, execution user and other information.

需要说明的是,本发明实施例仅是示例性的给出估计单元1102估计任务运行时间的具体实现,当然,估计单元1102还可以通过其它方式估计任务的运行时间,比如,通过预运行任务的方式。即,通过预先运行一小段作业实例获得准确的完整运行时间的估计。另外,同一作业的后续任务的运行时间参考已运行任务的运行时间也会获得更为准确的估计。本发明实施例对估计单元1102估计任务运行时间的具体实现方式不做限定。It should be noted that this embodiment of the present invention is only an example of a specific implementation of estimating the running time of a task by the estimating unit 1102. Of course, the estimating unit 1102 can also estimate the running time of a task in other ways, for example, by pre-running the task Way. That is, by pre-running a small number of job instances to obtain an accurate estimate of the full runtime. Also, the run times of subsequent tasks of the same job will be more accurately estimated by reference to the run times of tasks that have already run. The embodiment of the present invention does not limit the specific implementation manner of estimating the running time of the task by the estimating unit 1102 .

基于本发明实施例提供的资源管理器110,该资源管理器110在接收客户端设备提交的作业,并将该作业分解为多个有相应的资源需求量配置的任务之后,还估计每个任务的运行时间,并根据每个任务的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形,然后将该多个任务按照该第一分配位形分配到该多个任务的可运行计算节点上。其中,该第一分配位形用于指示该多个任务在该多个任务的可运行计算节点上的分布情况,调度策略包括资源利用率优先策略和效率优先策略中的至少一种。也就是说,该资源管理器110在进行资源分配时考虑了每个任务的运行时间的因素,并且将空间需求(即任务的资源需求量)和时间需求(即任务的时间)固定的Task调度到对应的节点上时,能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,使得最终采用资源利用率较高和/或效率较高的分配位形。一方面,由于可以采用资源利用率较高的分配位形,即可以通过调度策略使得那些节点资源利用率较高的Task组合被调度到节点上,因此该资源管理器110能有效减轻现有技术中资源碎片的问题,从而提升集群的资源利用率。另一方面,由于可以采用效率较高的分配位形,即可以通过调度策略使得那些作业执行时间最短的Task组合被调度到节点上,因此与现有技术相比,该资源管理器110能显著缩短作业执行时间,提升作业执行效率。综上,本发明实施例提供的资源管理器110能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,从而可以提高资源利用率,和/或,可以提升用户作业的执行效率。Based on the resource manager 110 provided by the embodiment of the present invention, after the resource manager 110 receives the job submitted by the client device and decomposes the job into multiple tasks with corresponding resource requirement configurations, it also estimates the The running time of each task, and according to the resource requirements and running time of each task, combined with the preset scheduling strategy, determine the first allocation configuration of the multiple tasks, and then allocate the multiple tasks according to the first allocation configuration to the runnable compute nodes for the multiple tasks. Wherein, the first allocation configuration is used to indicate the distribution of the multiple tasks on the computing nodes that can run the multiple tasks, and the scheduling strategy includes at least one of resource utilization priority strategy and efficiency priority strategy. That is to say, the resource manager 110 considers the running time factor of each task when performing resource allocation, and schedules tasks with fixed space requirements (i.e. task resource requirements) and time requirements (i.e. task time) When it is on the corresponding node, it can flexibly select the resource utilization rate priority strategy and the efficiency priority strategy according to the corresponding scheduling strategy for resource allocation, so that the allocation configuration with higher resource utilization rate and/or higher efficiency is finally adopted. On the one hand, since the allocation configuration with high resource utilization can be adopted, that is, the task combination with high resource utilization of those nodes can be scheduled to the nodes through the scheduling policy, so the resource manager 110 can effectively alleviate the existing technology. The problem of resource fragmentation in the middle, thereby improving the resource utilization of the cluster. On the other hand, since a more efficient allocation configuration can be adopted, that is, the task combination with the shortest job execution time can be scheduled to the node through the scheduling policy, so compared with the prior art, the resource manager 110 can significantly Shorten job execution time and improve job execution efficiency. To sum up, the resource manager 110 provided by the embodiment of the present invention can flexibly select a resource utilization priority strategy and an efficiency priority strategy for resource allocation according to the corresponding scheduling strategy, thereby improving resource utilization and/or improving user job performance. execution efficiency.

可选的,本发明实施例提供的资源管理器110在进行资源分配时,还可以同时考虑对于异构集群和特殊资源需求作业的资源分配。Optionally, when the resource manager 110 provided in the embodiment of the present invention allocates resources, it may also consider resource allocation for heterogeneous clusters and jobs with special resource requirements.

具体的,如图12所示,资源管理器110还可以包括分类单元1105。Specifically, as shown in FIG. 12 , the resource manager 110 may further include a classification unit 1105 .

在确定单元1103根据每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形之前,分类单元1105,用于将该多个任务按照资源的种类进行分类,获得至少一类任务。Before the determination unit 1103 determines the first allocation configuration of the multiple tasks according to the resource demand and running time corresponding to each task, combined with the preset scheduling policy, the classification unit 1105 is used to classify the multiple tasks according to the resource Classify the types of tasks to obtain at least one type of task.

确定单元1103具体用于:The determining unit 1103 is specifically used for:

针对该至少一类任务中的每类任务,均按照下面针对第一类任务的操作进行处理:For each type of task in the at least one type of task, perform the following operations for the first type of task:

根据第一类任务中每个任务对应的资源需求量和运行时间,结合调度策略,确定第一类任务的子分配位形,该子分配位形用于指示该第一类任务在多个计算节点中的可运行计算节点上的分布情况。According to the resource demand and running time corresponding to each task in the first type of task, combined with the scheduling strategy, determine the sub-allocation configuration of the first type of task, and the sub-allocation configuration is used to indicate that the first type of task is in multiple computing The distribution of runnable compute nodes in a node.

将该至少一类任务中的每类任务的子分配位形的组合确定为该多个任务的第一分配位形。A combination of sub-allocation configurations of each type of task in the at least one type of task is determined as a first allocation configuration of the plurality of tasks.

由于本发明实施例提供的资源管理器110可以首先将多个任务按照资源的种类进行分类,进而对于每一类任务分别进行资源分配,也就是说可以同时考虑对于异构集群和特殊资源需求作业的资源分配,因而具有更广泛的普适性和更好的综合表现。Since the resource manager 110 provided by the embodiment of the present invention can first classify multiple tasks according to the types of resources, and then perform resource allocation for each type of task separately, that is to say, it can simultaneously consider the tasks for heterogeneous clusters and special resource requirements. Therefore, it has wider applicability and better comprehensive performance.

可选的,考虑到运行时间的估计会与实际情况通常会产生一些偏差。如果对这些偏差不做控制,则随着时间的延长,作业资源的预分配结果与理想结果可能相差愈来愈大。因此,本发明实施例提供的资源管理器110中,在分配单元1104将多个任务按照第一分配位形分配到多个任务的可运行计算节点上之后,确定单元1103,还用于:Optionally, take into account that the running time estimates will usually have some deviation from the actual situation. If these deviations are not controlled, as time goes on, the pre-allocation results of job resources may be more and more different from the ideal results. Therefore, in the resource manager 110 provided by the embodiment of the present invention, after the allocation unit 1104 allocates multiple tasks to the runnable computing nodes of multiple tasks according to the first allocation configuration, the determination unit 1103 is further configured to:

根据第一分配位形,确定所有处于等待状态的任务运行在所分配的节点上时的第一整体分配目标函数值。According to the first allocation configuration, a first overall allocation objective function value when all tasks in the waiting state run on the allocated nodes is determined.

根据所有处于等待状态的任务对应的资源需求量和运行时间,结合调度策略,确定所有处于等待状态的任务的第二分配位形,该第二分配位形用于指示所有处于等待状态的任务在所有处于等待状态的任务的可运行计算节点上的分布情况。According to the resource requirements and running time corresponding to all tasks in the waiting state, combined with the scheduling strategy, determine the second allocation configuration of all tasks in the waiting state, and the second allocation configuration is used to indicate that all tasks in the waiting state are in The distribution of all waiting tasks across the runnable compute nodes.

根据该第二分配位形,确定所有处于等待状态的任务运行在所分配的节点上时的第二整体分配目标函数值;According to the second allocation configuration, determine a second overall allocation objective function value when all tasks in the waiting state run on the allocated nodes;

分配单元1104,还用于若第二整体分配目标函数值大于第一整体分配目标函数值,将所有处于等待状态的任务按照第二分配位形分配到所有处于等待状态的任务的可运行计算节点上。The allocation unit 1104 is further configured to allocate all the tasks in the waiting state to the runnable computing nodes of all the tasks in the waiting state according to the second allocation configuration if the value of the second overall allocation objective function is greater than the value of the first overall allocation objective function superior.

通过上述变异机制,可以使得作业资源的预分配结果向更好的方向进化。其中,图8示意性的给出了一种作业资源的分配结果的变异机制示意图,可以看出,随着时间的延长,作业资源的预分配结果与理想结果可能相差愈来愈大。经过上述变异机制,可以将节点3上的任务1调整到节点1上,将节点2上的任务2调整到节点3上,将节点1上的任务3调整到节点2上,从而使得作业资源的预分配结果向更好的方向进化。Through the above mutation mechanism, the pre-allocation results of job resources can be made to evolve in a better direction. Among them, Fig. 8 schematically shows a schematic diagram of a variation mechanism of job resource allocation results. It can be seen that as time goes on, the difference between the pre-allocation results of job resources and the ideal results may become larger and larger. Through the above mutation mechanism, task 1 on node 3 can be adjusted to node 1, task 2 on node 2 can be adjusted to node 3, and task 3 on node 1 can be adjusted to node 2, so that the job resource The pre-allocation results are evolving in a better direction.

可选的,本发明实施例中,整体分配目标函数值可通过上述公式(1)获得,本发明实施例在此不再赘述。Optionally, in the embodiment of the present invention, the value of the overall allocation objective function can be obtained by the above formula (1), and the embodiment of the present invention will not repeat it here.

即,上述的第一整体分配目标函数值等于第一分配位形时,所有处于等待状态的任务运行在所分配的节点上时各个节点的单节点分配目标函数值的和。That is, when the above-mentioned first overall allocation objective function value is equal to the sum of the single-node allocation objective function values of each node when all tasks in the waiting state run on the allocated nodes in the first allocation configuration.

上述的第二整体分配目标函数值等于第二分配位形时,所有处于等待状态的任务运行在所分配的节点上时各个节点的单节点分配目标函数值的和。The value of the above-mentioned second overall allocation objective function is equal to the sum of the single-node allocation objective function values of each node when all tasks in the waiting state run on the allocated nodes in the second allocation configuration.

可选的,本发明实施例中,单节点分配目标函数具体可以包括:Optionally, in this embodiment of the present invention, the single-node allocation objective function may specifically include:

其中,Sn表示单节点n的分配目标函数值;p表示作业时间优先级因子,p>0;f表示作业公平性因子,f>0;p+f≤1;m表示作业的数量;Se,n表示节点n上的资源利用率得分,rn表示节点n的资源,rt表示任务t的资源需求量;Sp,j表示作业j的执行进度,Tj表示作业j还需要多少时间执行完毕,该值可由历史统计数据得出,T0表示作业j的总体运行时间;Sf,j表示作业j的公平性得分,rj表示作业i的资源需求,rf表示作业i在完全公平情况下的应得资源。Among them, S n represents the distribution objective function value of single node n; p represents the job time priority factor, p>0; f represents the job fairness factor, f>0;p+f≤1; m represents the number of jobs; S e, n represents the resource utilization score on node n, r n represents the resources of node n, r t represents the resource demand of task t; S p,j represents the execution progress of job j, T j indicates how much time is required for job j to be executed, and this value can be obtained from historical statistical data. T 0 indicates the overall running time of job j; S f, j indicates the fairness score of job j, r j represents the resource requirement of job i, and r f represents the due resource of job i in the case of complete fairness.

其中,在上述单节点分配目标函数中,考虑了节点的资源利用率、作业的公平性以及作业的执行进度。其中,当f=0,p=0时,完全考虑资源利用率,即资源管理器110按照整体资源利用率最高的原则分配资源;当f=1,p=0时,完全考虑公平,即资源管理器110会在不同的作业间公平地分配资源;当f=1,p=1时,则完全考虑时间优先级,即资源管理器110优先分配资源给那些更快完成的作业。当然,f和p还可以为其它数值,用户可以根据作业的运行需求进行设置,使得分配在最优资源利用率和最优作业执行时间取得平衡,本发明实施例对此不作具体限定。Wherein, in the above-mentioned single-node allocation objective function, the resource utilization rate of the node, the fairness of the job, and the execution progress of the job are considered. Wherein, when f=0, p=0, resource utilization is fully considered, that is, the resource manager 110 allocates resources according to the principle of the highest overall resource utilization; when f=1, p=0, fairness is fully considered, that is, resource The manager 110 will allocate resources among different jobs fairly; when f=1 and p=1, time priority is fully considered, that is, the resource manager 110 gives priority to assigning resources to jobs that finish faster. Of course, f and p can also be other values, and the user can set them according to the operation requirements of the job, so that the allocation can be balanced between the optimal resource utilization rate and the optimal job execution time, which is not specifically limited in the embodiment of the present invention.

需要说明的是,本发明实施例中的接收单元1101可以为资源管理器110上具备接收功能的接口电路,如接收机或接收器;也可以为资源管理器110上具备接收功能的网卡或输入/输出(英文全称:input/output,英文缩写:I/O)接口,本发明实施例对此不作具体限定。It should be noted that the receiving unit 1101 in the embodiment of the present invention can be an interface circuit with a receiving function on the resource manager 110, such as a receiver or a receiver; it can also be a network card or an input circuit with a receiving function on the resource manager 110. /output (English full name: input/output, English abbreviation: I/O) interface, which is not specifically limited in this embodiment of the present invention.

估计单元1102、确定单元1103、分配单元1104和分类单元1105可以为单独设立的处理器,也可以集成在资源管理器110的某一个处理器中实现,此外,也可以以程序代码的形式存储于资源管理器110存储器中,由资源管理器110的某一个处理器调用并执行以上估计单元1102、确定单元1103、分配单元1104和分类单元1105的功能。这里所述的处理器可以是一个中央处理器(英文全称:central processing unit,英文缩写:CPU),还可以为其他通用处理器、数字信号处理器(英文全称:digital signal processing,英文缩写:DSP)、专用集成电路(英文全称:application specific integrated circuit,英文缩写:ASIC)、现场可编程门阵列(英文全称:field-programmable gate array,英文缩写:FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。另外,该处理器还可以为专用处理器,该专用处理器可以包括基带处理芯片、射频处理芯片等中的至少一个。进一步地,该专用处理器还可以包括具有资源管理器110其他专用处理功能的芯片。Estimating unit 1102, determining unit 1103, allocating unit 1104 and classifying unit 1105 may be independently established processors, or may be integrated in a certain processor of resource manager 110, and may also be stored in the form of program codes in In the memory of the resource manager 110, a certain processor of the resource manager 110 invokes and executes the functions of the estimation unit 1102, the determination unit 1103, the allocation unit 1104 and the classification unit 1105 above. The processor described here can be a central processing unit (English full name: central processing unit, English abbreviation: CPU), and can also be other general-purpose processors, digital signal processors (English full name: digital signal processing, English abbreviation: DSP) ), application specific integrated circuit (English full name: application specific integrated circuit, English abbreviation: ASIC), field programmable gate array (English full name: field-programmable gate array, English abbreviation: FPGA) or other programmable logic devices, discrete gates or Transistor logic devices, discrete hardware components, and more. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. In addition, the processor may also be a dedicated processor, and the dedicated processor may include at least one of a baseband processing chip, a radio frequency processing chip, and the like. Further, the dedicated processor may also include a chip with other dedicated processing functions of the resource manager 110 .

可以理解,本发明实施例中的资源管理器110可对应于上述图5-图7所示的资源分配方法中的资源管理器,并且本发明实施例中的资源管理器110中的各个单元的划分和/或功能等均是为了实现上述图5-图7所示的资源分配方法流程,为了简洁,在此不再赘述。It can be understood that the resource manager 110 in the embodiment of the present invention may correspond to the resource manager in the resource allocation method shown in FIGS. 5-7 above, and each unit in the resource manager 110 in the embodiment of the present invention The division and/or functions are all for realizing the flow of the resource allocation method shown in FIGS. 5-7 above, and for the sake of brevity, details are not repeated here.

如图13所示,本发明实施例提供一种资源管理器130,包括:处理器1301、存储器1302、总线1303和通信接口1304。As shown in FIG. 13 , an embodiment of the present invention provides a resource manager 130 , including: a processor 1301 , a memory 1302 , a bus 1303 and a communication interface 1304 .

存储器1302用于存储计算机执行指令,处理器1301与存储器1302通过总线连接,当资源管理器130运行时,处理器1301执行存储器1303存储的计算机执行指令,以使资源管理器130执行如图5-图7所示的资源分配方法。具体的地址分配方法可参见上述如图5-图7所示的实施例中的相关描述,此处不再赘述。The memory 1302 is used to store computer-executable instructions, and the processor 1301 is connected to the memory 1302 through a bus. When the resource manager 130 is running, the processor 1301 executes the computer-executable instructions stored in the memory 1303, so that the resource manager 130 executes as shown in Figure 5- The resource allocation method shown in FIG. 7 . For a specific address allocation method, reference may be made to relevant descriptions in the above-mentioned embodiments shown in FIGS. 5-7 , and details are not repeated here.

其中,本发明实施例中的处理器1301可以是一个中央处理器(英文全称:centralprocessing unit,英文缩写:CPU),还可以为其他通用处理器、数字信号处理器(英文全称:digital signal processing,英文缩写:DSP)、专用集成电路(英文全称:applicationspecific integrated circuit,英文缩写:ASIC)、现场可编程门阵列(英文全称:field-programmable gate array,英文缩写:FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。Wherein, the processor 1301 in the embodiment of the present invention may be a central processing unit (English full name: central processing unit, English abbreviation: CPU), and may also be other general-purpose processors, digital signal processors (English full name: digital signal processing, English abbreviation: DSP), application specific integrated circuit (full English name: applicationspecific integrated circuit, English abbreviation: ASIC), field programmable gate array (English full name: field-programmable gate array, English abbreviation: FPGA) or other programmable logic devices, Discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.

另外,该处理器1301还可以为专用处理器,该专用处理器可以包括基带处理芯片、射频处理芯片等中的至少一个。进一步地,该专用处理器还可以包括具有资源管理器130其他专用处理功能的芯片。In addition, the processor 1301 may also be a dedicated processor, and the dedicated processor may include at least one of a baseband processing chip, a radio frequency processing chip, and the like. Further, the dedicated processor may also include a chip with other dedicated processing functions of the resource manager 130 .

存储器1302可以包括易失性存储器(英文:volatile memory),例如随机存取存储器(英文全称:random-access memory,英文缩写:RAM);存储器1302也可以包括非易失性存储器(英文:non-volatile memory),例如只读存储器(英文全称:read-only memory,英文缩写:ROM),快闪存储器(英文:flash memory),硬盘(英文全称:hard disk drive,英文缩写:HDD)或固态硬盘(英文全称:solid-state drive,英文缩写:SSD);另外,存储器1302还可以包括上述种类的存储器的组合。The memory 1302 may include a volatile memory (English: volatile memory), such as a random access memory (English full name: random-access memory, English abbreviation: RAM); the memory 1302 may also include a non-volatile memory (English: non- volatile memory), such as read-only memory (English full name: read-only memory, English abbreviation: ROM), flash memory (English: flash memory), hard disk (English full name: hard disk drive, English abbreviation: HDD) or solid state drive (English full name: solid-state drive, English abbreviation: SSD); in addition, the storage 1302 may also include a combination of the above-mentioned types of storage.

总线1303可以包括数据总线、电源总线、控制总线和信号状态总线等。本实施例中为了清楚说明,在图13中将各种总线都示意为总线1303。The bus 1303 may include a data bus, a power bus, a control bus, a signal status bus, and the like. In this embodiment, for clarity, various buses are shown as bus 1303 in FIG. 13 .

在具体实现过程中,上述如图5-图7所示的资源分配方法流程中的各步骤均可以通过硬件形式的处理器1301执行存储器1302中存储的软件形式的计算机执行指令实现。为避免重复,此处不再赘述。In a specific implementation process, each step in the flow of the above-mentioned resource allocation method as shown in FIGS. To avoid repetition, details are not repeated here.

基于本发明实施例提供的资源管理器,该资源管理器在接收客户端设备提交的作业,并将该作业分解为多个有相应的资源需求量配置的任务之后,还估计每个任务的运行时间,并根据每个任务的资源需求量和运行时间,结合预设的调度策略,确定该多个任务的第一分配位形,然后将该多个任务按照该第一分配位形分配到该多个任务的可运行计算节点上。其中,该第一分配位形用于指示该多个任务在该多个任务的可运行计算节点上的分布情况,调度策略包括资源利用率优先策略和效率优先策略中的至少一种。也就是说,该资源管理器在进行资源分配时考虑了每个任务的运行时间的因素,并且将空间需求(即任务的资源需求量)和时间需求(即任务的时间)固定的Task调度到对应的节点上时,能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,使得最终采用资源利用率较高和/或效率较高的分配位形。一方面,由于可以采用资源利用率较高的分配位形,即可以通过调度策略使得那些节点资源利用率较高的Task组合被调度到节点上,因此该资源管理器能有效减轻现有技术中资源碎片的问题,从而提升集群的资源利用率。另一方面,由于可以采用效率较高的分配位形,即可以通过调度策略使得那些作业执行时间最短的Task组合被调度到节点上,因此与现有技术相比,该资源管理器能显著缩短作业执行时间,提升作业执行效率。综上,本发明实施例提供的资源管理器能够根据相应的调度策略灵活选择资源利用率优先策略和效率优先策略来进行资源分配,从而可以提高资源利用率,和/或,可以提升用户作业的执行效率。Based on the resource manager provided by the embodiment of the present invention, after the resource manager receives the job submitted by the client device and decomposes the job into a plurality of tasks with corresponding resource requirement configurations, it also estimates the operation of each task. Time, and according to the resource demand and running time of each task, combined with the preset scheduling strategy, determine the first allocation configuration of the multiple tasks, and then allocate the multiple tasks to the Compute nodes that can run multiple tasks. Wherein, the first allocation configuration is used to indicate the distribution of the multiple tasks on the computing nodes that can run the multiple tasks, and the scheduling strategy includes at least one of resource utilization priority strategy and efficiency priority strategy. That is to say, the resource manager considers the running time factor of each task when performing resource allocation, and schedules Tasks with fixed space requirements (that is, the resource requirements of tasks) and time requirements (that is, the time of tasks) to When on the corresponding node, the resource utilization priority strategy and the efficiency priority strategy can be flexibly selected according to the corresponding scheduling strategy for resource allocation, so that the allocation configuration with higher resource utilization rate and/or higher efficiency is finally adopted. On the one hand, since the allocation configuration with high resource utilization can be adopted, that is, the task combination with high resource utilization of those nodes can be scheduled to the nodes through the scheduling strategy, so the resource manager can effectively alleviate the problems in the prior art. The problem of resource fragmentation, thereby improving the resource utilization of the cluster. On the other hand, since a more efficient allocation configuration can be adopted, that is, the task combination with the shortest execution time of the job can be scheduled to the node through the scheduling strategy, so compared with the existing technology, the resource manager can significantly shorten Job execution time, improving job execution efficiency. To sum up, the resource manager provided by the embodiment of the present invention can flexibly select the resource utilization rate priority policy and the efficiency priority policy to allocate resources according to the corresponding scheduling policy, so as to improve the resource utilization rate, and/or improve the efficiency of user jobs. effectiveness.

可选的,本发明实施例还提供一种可读介质,该可读介质用于存储计算机执行指令,当资源管理器的处理器执行该计算机执行指令时,该资源管理器执行如图5-图7所示的资源分配方法。具体的资源分配方法可参见上述如图5-图7所示的实施例中的相关描述,此处不再赘述。Optionally, an embodiment of the present invention also provides a readable medium, which is used to store computer-executable instructions. When the processor of the resource manager executes the computer-executable instructions, the resource manager executes the computer-executable instructions shown in Figure 5- The resource allocation method shown in FIG. 7 . For a specific resource allocation method, reference may be made to relevant descriptions in the embodiments shown in FIGS. 5-7 above, and details are not repeated here.

所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的装置,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that for the convenience and brevity of the description, the above-described device is only illustrated by dividing the above-mentioned functional modules. In practical applications, the above-mentioned functions can be allocated by different Completion of functional modules means that the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the above-described systems, devices, and units, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.

在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed system, device and method can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be Incorporation may either be integrated into another system, or some features may be omitted, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the integrated unit is realized in the form of a software function unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention or the part that contributes to the prior art or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium , including several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) execute all or part of the steps of the method described in each embodiment of the present invention. The aforementioned storage media include: U disk, mobile hard disk, ROM, RAM), magnetic disk or optical disk and other media that can store program codes.

以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Anyone skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present invention. Should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the protection scope of the claims.

Claims (14)

1.一种分布式计算系统中的资源分配方法,所述分布式计算系统包括多个计算节点,其特征在于,所述方法包括:1. A resource allocation method in a distributed computing system, said distributed computing system comprising a plurality of computing nodes, characterized in that said method comprises: 接收客户端设备提交的作业,并将所述作业分解为多个任务,其中,所述多个任务中的每个任务均配置有相对应的资源需求量;receiving a job submitted by a client device, and decomposing the job into a plurality of tasks, wherein each task in the plurality of tasks is configured with a corresponding resource requirement; 估计所述每个任务的运行时间;Estimate the runtime of each of the tasks; 根据所述每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定所述多个任务的第一分配位形,所述第一分配位形用于指示所述多个任务在所述多个计算节点中的可运行计算节点上的分布情况,所述调度策略包括资源利用率优先策略和效率优先策略中的至少一种;According to the resource demand and running time corresponding to each task, combined with a preset scheduling policy, determine a first allocation configuration of the multiple tasks, and the first allocation configuration is used to indicate the multiple tasks Distribution situation on the executable computing nodes among the plurality of computing nodes, the scheduling strategy includes at least one of resource utilization priority strategy and efficiency priority strategy; 将所述多个任务按照所述第一分配位形分配到所述多个任务的可运行计算节点上。Allocating the multiple tasks to computing nodes that can run the multiple tasks according to the first allocation configuration. 2.根据权利要求1所述的方法,其特征在于,若所述调度策略为资源利用率优先策略,则所述第一分配位形为使得所述多个任务的可运行计算节点中的每个计算节点的单节点资源利用率最大的分配位形。2. The method according to claim 1, wherein if the scheduling policy is a resource utilization priority policy, the first allocation configuration is such that each of the executable computing nodes of the multiple tasks The allocation configuration with the maximum single-node resource utilization of computing nodes. 3.根据权利要求1所述的方法,其特征在于,若所述调度策略为效率优先策略,则所述第一分配位形为使得所述作业的整体执行速度最快的分配位形。3 . The method according to claim 1 , wherein if the scheduling strategy is an efficiency-first strategy, the first allocation configuration is an allocation configuration that enables the overall execution speed of the job to be the fastest. 4 . 4.根据权利要求1-3任一项所述的方法,其特征在于,所述估计所述每个任务的运行时间,包括:4. The method according to any one of claims 1-3, wherein the estimating the running time of each task comprises: 针对所述每个任务,均按照下面针对第一任务的操作进行处理:For each task described above, it is processed according to the following operations for the first task: 将所述第一任务的硬信息与样本库中的历史任务的硬信息进行匹配;Matching the hard information of the first task with the hard information of the historical tasks in the sample library; 若匹配成功,根据与所述第一任务的硬信息匹配的历史任务的历史运行时间估计所述第一任务的运行时间。If the matching is successful, the running time of the first task is estimated according to the historical running time of the historical task matched with the hard information of the first task. 5.根据权利要求1-4任一项所述的方法,其特征在于,在所述将所述多个任务按照所述第一分配位形分配到所述多个任务的可运行计算节点上之后,还包括:5. The method according to any one of claims 1-4, wherein, when the multiple tasks are allocated to the runable computing nodes of the multiple tasks according to the first allocation configuration After that, also include: 根据所述第一分配位形,确定所有处于等待状态的任务运行在所分配的节点上时的第一整体分配目标函数值;According to the first allocation configuration, determine the first overall allocation objective function value when all tasks in the waiting state are running on the allocated nodes; 根据所述所有处于等待状态的任务对应的资源需求量和运行时间,结合所述调度策略,确定所述所有处于等待状态的任务的第二分配位形,所述第二分配位形用于指示所述所有处于等待状态的任务在所述所有处于等待状态的任务的可运行计算节点上的分布情况;According to the resource requirements and running time corresponding to all the tasks in the waiting state, combined with the scheduling policy, determine the second allocation configuration of all the tasks in the waiting state, and the second allocation configuration is used to indicate The distribution of all tasks in the waiting state on the runnable computing nodes of the tasks in the waiting state; 根据所述第二分配位形,确定所述所有处于等待状态的任务运行在所分配的节点上时的第二整体分配目标函数值;According to the second allocation configuration, determine a second overall allocation objective function value when all the tasks in the waiting state are running on the allocated nodes; 若所述第二整体分配目标函数值大于所述第一整体分配目标函数值,将所述所有处于等待状态的任务按照所述第二分配位形分配到所述所有处于等待状态的任务的可运行计算节点上。If the value of the second overall allocation objective function is greater than the value of the first overall allocation objective function, allocating the tasks in the waiting state to all available tasks in the waiting state according to the second allocation configuration run on a compute node. 6.根据权利要求5所述的方法,其特征在于,所述第一整体分配目标函数值等于所述第一分配位形时,所有处于等待状态的任务运行在所分配的节点上时各个节点的单节点分配目标函数值的和;6. The method according to claim 5, wherein when the first overall allocation objective function value is equal to the first allocation configuration, when all tasks in the waiting state run on the allocated nodes, each node The sum of the single-node assignment objective function values of ; 所述第二整体分配目标函数值等于所述第二分配位形时,所有处于等待状态的任务运行在所分配的节点上时各个节点的单节点分配目标函数值的和。The second overall allocation objective function value is equal to the sum of the single-node allocation objective function values of each node when all tasks in the waiting state run on the allocated nodes when the second allocation configuration is configured. 7.一种资源管理器,其特征在于,所述资源管理器包括:接收单元、分解单元、估计单元、确定单元和分配单元;7. A resource manager, characterized in that the resource manager comprises: a receiving unit, a decomposing unit, an estimating unit, a determining unit and an allocating unit; 所述接收单元,用于接收客户端设备提交的作业;The receiving unit is configured to receive a job submitted by a client device; 所述分解单元,用于将所述作业分解为多个任务,其中,所述多个任务中的每个任务均配置有相对应的资源需求量;The decomposing unit is configured to decompose the job into a plurality of tasks, wherein each task in the plurality of tasks is configured with a corresponding resource requirement; 所述估计单元,用于估计所述每个任务的运行时间;The estimating unit is configured to estimate the running time of each task; 所述确定单元,用于根据所述每个任务对应的资源需求量和运行时间,结合预设的调度策略,确定所述多个任务的第一分配位形,所述第一分配位形用于指示所述多个任务在所述多个计算节点中的可运行计算节点上的分布情况,所述调度策略包括资源利用率优先策略和效率优先策略中的至少一种;The determining unit is configured to determine a first allocation configuration of the plurality of tasks according to the resource demand and running time corresponding to each task, in combination with a preset scheduling strategy, and the first allocation configuration is used for In order to indicate the distribution of the multiple tasks on the runnable computing nodes among the multiple computing nodes, the scheduling policy includes at least one of a resource utilization priority policy and an efficiency priority policy; 所述分配单元,用于将所述多个任务按照所述第一分配位形分配到所述多个任务的可运行计算节点上。The allocating unit is configured to allocate the multiple tasks to computing nodes that can run the multiple tasks according to the first allocation configuration. 8.根据权利要求7所述的资源管理器,其特征在于,若所述调度策略为资源利用率优先策略,则所述第一分配位形为使得所述多个任务的可运行计算节点中的每个计算节点的单节点资源利用率最大的分配位形。8. The resource manager according to claim 7, wherein if the scheduling strategy is a resource utilization priority strategy, the first allocation configuration is such that among the executable computing nodes of the plurality of tasks The allocation configuration that maximizes the single-node resource utilization of each computing node. 9.根据权利要求8所述的资源管理器,其特征在于,若所述调度策略为效率优先策略,则所述第一分配位形为使得所述作业的整体执行速度最快的分配位形。9. The resource manager according to claim 8, wherein if the scheduling policy is an efficiency priority policy, the first allocation configuration is the allocation configuration that makes the overall execution speed of the job the fastest . 10.根据权利要求7-9任一项所述的资源管理器,其特征在于,所述估计单元具体用于:10. The resource manager according to any one of claims 7-9, wherein the estimation unit is specifically used for: 针对所述每个任务,均按照下面针对第一任务的操作进行处理:For each task described above, it is processed according to the following operations for the first task: 将所述第一任务的硬信息与样本库中的历史任务的硬信息进行匹配;Matching the hard information of the first task with the hard information of the historical tasks in the sample library; 若匹配成功,根据与所述第一任务的硬信息匹配的历史任务的历史运行时间估计所述第一任务的运行时间。If the matching is successful, the running time of the first task is estimated according to the historical running time of the historical task matched with the hard information of the first task. 11.根据权利要求7-10任一项所述的资源管理器,其特征在于,在所述分配单元将所述多个任务按照所述第一分配位形分配到所述多个任务的可运行计算节点上之后,所述确定单元,还用于:11. The resource manager according to any one of claims 7-10, wherein the allocation unit allocates the multiple tasks to the available tasks of the multiple tasks according to the first allocation configuration After running on the compute node, the determination unit is also used to: 根据所述第一分配位形,确定所有处于等待状态的任务运行在所分配的节点上时的第一整体分配目标函数值;According to the first allocation configuration, determine the first overall allocation objective function value when all tasks in the waiting state are running on the allocated nodes; 根据所述所有处于等待状态的任务对应的资源需求量和运行时间,结合所述调度策略,确定所述所有处于等待状态的任务的第二分配位形,所述第二分配位形用于指示所述所有处于等待状态的任务在所述所有处于等待状态的任务的可运行计算节点上的分布情况;According to the resource requirements and running time corresponding to all the tasks in the waiting state, combined with the scheduling policy, determine the second allocation configuration of all the tasks in the waiting state, and the second allocation configuration is used to indicate The distribution of all tasks in the waiting state on the runnable computing nodes of the tasks in the waiting state; 根据所述第二分配位形,确定所述所有处于等待状态的任务运行在所分配的节点上时的第二整体分配目标函数值;According to the second allocation configuration, determine a second overall allocation objective function value when all the tasks in the waiting state are running on the allocated nodes; 所述分配单元,还用于若所述第二整体分配目标函数值大于所述第一整体分配目标函数值,将所述所有处于等待状态的任务按照所述第二分配位形分配到所述所有处于等待状态的任务的可运行计算节点上。The allocating unit is further configured to allocate all the tasks in the waiting state to the All pending tasks on runnable compute nodes. 12.根据权利要求11所述的资源管理器,其特征在于,所述第一整体分配目标函数值等于所述第一分配位形时,所有处于等待状态的任务运行在所分配的节点上时各个节点的单节点分配目标函数值的和;12. The resource manager according to claim 11, wherein when the first overall allocation objective function value is equal to the first allocation configuration, all tasks in the waiting state are running on the allocated nodes The sum of the single node distribution objective function values of each node; 所述第二整体分配目标函数值等于所述第二分配位形时,所有处于等待状态的任务运行在所分配的节点上时各个节点的单节点分配目标函数值的和。The second overall allocation objective function value is equal to the sum of the single-node allocation objective function values of each node when all tasks in the waiting state run on the allocated nodes when the second allocation configuration is configured. 13.一种资源管理器,其特征在于,所述资源管理器包括:处理器、存储器、总线和通信接口;13. A resource manager, characterized in that the resource manager comprises: a processor, a memory, a bus, and a communication interface; 所述存储器用于存储计算机执行指令,所述处理器与所述存储器通过所述总线连接,当所述资源管理器运行时,所述处理器执行所述存储器存储的所述计算机执行指令,以使所述资源管理器执行如权利要求1-6任一项所述的分布式计算系统中的资源分配方法。The memory is used to store computer-executable instructions, the processor is connected to the memory through the bus, and when the resource manager is running, the processor executes the computer-executable instructions stored in the memory to The resource manager is made to execute the resource allocation method in the distributed computing system according to any one of claims 1-6. 14.一种分布式计算机系统,其特征在于,所述分布式计算机系统包括多个计算节点和权利要求7-12任一项所述的资源管理器;14. A distributed computer system, characterized in that the distributed computer system comprises a plurality of computing nodes and the resource manager according to any one of claims 7-12; 或者,所述分布式计算机系统包括多个计算节点和权利要求13所述的资源管理器。Alternatively, the distributed computer system includes multiple computing nodes and the resource manager according to claim 13 .
CN201610080980.XA 2016-02-05 2016-02-05 Resource allocation method and resource manager Active CN107045456B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610080980.XA CN107045456B (en) 2016-02-05 2016-02-05 Resource allocation method and resource manager
PCT/CN2016/112186 WO2017133351A1 (en) 2016-02-05 2016-12-26 Resource allocation method and resource manager

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610080980.XA CN107045456B (en) 2016-02-05 2016-02-05 Resource allocation method and resource manager

Publications (2)

Publication Number Publication Date
CN107045456A true CN107045456A (en) 2017-08-15
CN107045456B CN107045456B (en) 2020-03-10

Family

ID=59500533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610080980.XA Active CN107045456B (en) 2016-02-05 2016-02-05 Resource allocation method and resource manager

Country Status (2)

Country Link
CN (1) CN107045456B (en)
WO (1) WO2017133351A1 (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992359A (en) * 2017-11-27 2018-05-04 江苏海平面数据科技有限公司 The task scheduling algorithm that cost perceives under a kind of cloud environment
CN108021453A (en) * 2017-12-22 2018-05-11 联想(北京)有限公司 A kind of computing resource optimization method, device and server cluster
CN108196959A (en) * 2018-02-07 2018-06-22 聚好看科技股份有限公司 The method for managing resource and device of ETL system
CN108279980A (en) * 2018-01-22 2018-07-13 上海联影医疗科技有限公司 Resource allocation methods and system and resource allocation terminal
CN108536530A (en) * 2018-04-02 2018-09-14 北京中电普华信息技术有限公司 A kind of multithreading method for scheduling task and device
CN108897619A (en) * 2018-06-27 2018-11-27 国家超级计算天津中心 A kind of multi-layer resource flexibility configuration method for supercomputer
CN109947532A (en) * 2019-03-01 2019-06-28 中山大学 A big data task scheduling method in education cloud platform
CN110046034A (en) * 2018-01-15 2019-07-23 北京国双科技有限公司 Task acquisition methods and device
CN110399222A (en) * 2019-07-25 2019-11-01 北京邮电大学 GPU cluster deep learning task parallelization method, device and electronic equipment
CN110620818A (en) * 2019-09-18 2019-12-27 东软集团股份有限公司 Method, device and related equipment for realizing node distribution
CN110633946A (en) * 2018-06-22 2019-12-31 西门子股份公司 task assignment system
CN111045795A (en) * 2018-10-11 2020-04-21 浙江宇视科技有限公司 Resource scheduling method and device
CN111108480A (en) * 2017-09-19 2020-05-05 华为技术有限公司 A system and method for distributed resource requirements and allocation
CN111143057A (en) * 2019-12-13 2020-05-12 中国科学院深圳先进技术研究院 Heterogeneous cluster data processing method and system based on multiple data centers and electronic equipment
CN111258745A (en) * 2018-11-30 2020-06-09 华为终端有限公司 Task processing method and device
CN111353696A (en) * 2020-02-26 2020-06-30 中国工商银行股份有限公司 Resource pool scheduling method and device
CN111459641A (en) * 2020-04-08 2020-07-28 广州欢聊网络科技有限公司 Cross-machine-room task scheduling and task processing method and device
CN111539613A (en) * 2020-04-20 2020-08-14 浙江网商银行股份有限公司 Case distribution method and device
CN112099952A (en) * 2020-09-16 2020-12-18 亚信科技(中国)有限公司 Resource scheduling method, device, electronic device and storage medium
CN112272203A (en) * 2020-09-18 2021-01-26 苏州浪潮智能科技有限公司 Cluster service node selection method, system, terminal and storage medium
CN112286676A (en) * 2019-07-25 2021-01-29 罗伯特·博世有限公司 Apparatus and computer-implemented method for planning resources
CN112395077A (en) * 2019-08-16 2021-02-23 阿里巴巴集团控股有限公司 Resource control method, device and system
CN113032113A (en) * 2019-12-25 2021-06-25 中科寒武纪科技股份有限公司 Task scheduling method and related product
CN113448728A (en) * 2021-06-22 2021-09-28 腾讯科技(深圳)有限公司 Cloud resource scheduling method, device, equipment and storage medium
CN113986509A (en) * 2021-11-02 2022-01-28 中国建设银行股份有限公司 Resource object management method, device, electronic device and computer storage medium
WO2022068643A1 (en) * 2020-09-29 2022-04-07 华为技术有限公司 Multi-task deployment method and apparatus
CN114968570A (en) * 2022-05-20 2022-08-30 广东电网有限责任公司 Real-time computing system applied to digital power grid and working method thereof
WO2024125251A1 (en) * 2022-12-16 2024-06-20 华为技术有限公司 Resource allocation method and apparatus
CN118939427A (en) * 2024-08-06 2024-11-12 广州宏同信息技术有限公司 A resource allocation management method and computer equipment for cloud servers

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108845874B (en) * 2018-06-25 2023-03-21 腾讯科技(深圳)有限公司 Dynamic resource allocation method and server
CN111831424B (en) * 2019-04-17 2023-09-05 杭州海康威视数字技术股份有限公司 Task processing method, system and device
CN112148471B (en) * 2019-06-29 2023-07-07 华为技术服务有限公司 Method and device for scheduling resources in distributed computing system
CN112882824A (en) * 2019-11-29 2021-06-01 北京国双科技有限公司 Memory resource allocation method, device and equipment
CN111061565B (en) * 2019-12-12 2023-08-25 湖南大学 Two-section pipeline task scheduling method and system in Spark environment
CN111796940B (en) * 2020-07-06 2024-01-26 中国铁塔股份有限公司 Resource allocation method and device and electronic equipment
CN112000485B (en) * 2020-09-01 2024-01-12 北京元心科技有限公司 Task allocation method, device, electronic equipment and computer readable storage medium
CN112348369B (en) * 2020-11-11 2024-03-22 博康智能信息技术有限公司 Multi-objective multi-resource dynamic scheduling method for security of major activities
CN112612616B (en) * 2020-12-28 2024-02-23 中国农业银行股份有限公司 Task processing method and device
CN113127203B (en) * 2021-04-25 2022-06-14 华南理工大学 Deep learning distributed compiler for cloud edge computing and construction method
CN113238848B (en) * 2021-05-27 2024-12-10 上海商汤科技开发有限公司 A task scheduling method, device, computer equipment and storage medium
CN113835876B (en) * 2021-08-19 2024-09-17 浪潮软件集团有限公司 Artificial intelligent acceleration card scheduling method and device based on domestic CPU and OS
CN114237869B (en) * 2021-11-17 2022-09-16 中国人民解放军军事科学院国防科技创新研究院 Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment
CN115033358B (en) * 2022-05-19 2024-10-15 苏州浪潮智能科技有限公司 A method, device, computer equipment and storage medium for job scheduling
CN115495251B (en) * 2022-11-17 2023-02-07 北京滴普科技有限公司 A method and system for intelligently controlling computing resources in data integration operations
CN118484276A (en) * 2023-02-10 2024-08-13 华为云计算技术有限公司 Job scheduling method and device
CN118377810B (en) * 2024-06-26 2024-10-15 济南浪潮数据技术有限公司 Data set merging method, device, medium, program product and retrieval system
CN118916183B (en) * 2024-10-11 2025-03-11 天云智慧科技(山东)有限公司 Task decomposition and distribution method and system for industrial Internet

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541628A (en) * 2010-12-17 2012-07-04 三星电子株式会社 Compiling apparatus and method for a multicore device
CN102929718A (en) * 2012-09-17 2013-02-13 江苏九章计算机科技有限公司 Distributed GPU (graphics processing unit) computer system based on task scheduling
US8943353B2 (en) * 2013-01-31 2015-01-27 Hewlett-Packard Development Company, L.P. Assigning nodes to jobs based on reliability factors

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541628A (en) * 2010-12-17 2012-07-04 三星电子株式会社 Compiling apparatus and method for a multicore device
CN102929718A (en) * 2012-09-17 2013-02-13 江苏九章计算机科技有限公司 Distributed GPU (graphics processing unit) computer system based on task scheduling
US8943353B2 (en) * 2013-01-31 2015-01-27 Hewlett-Packard Development Company, L.P. Assigning nodes to jobs based on reliability factors

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111108480A (en) * 2017-09-19 2020-05-05 华为技术有限公司 A system and method for distributed resource requirements and allocation
CN111108480B (en) * 2017-09-19 2023-11-10 华为云计算技术有限公司 System and method for distributed resource demand and distribution
CN107992359A (en) * 2017-11-27 2018-05-04 江苏海平面数据科技有限公司 The task scheduling algorithm that cost perceives under a kind of cloud environment
CN107992359B (en) * 2017-11-27 2021-05-18 江苏海平面数据科技有限公司 Task scheduling method for cost perception in cloud environment
CN108021453A (en) * 2017-12-22 2018-05-11 联想(北京)有限公司 A kind of computing resource optimization method, device and server cluster
CN110046034A (en) * 2018-01-15 2019-07-23 北京国双科技有限公司 Task acquisition methods and device
CN108279980A (en) * 2018-01-22 2018-07-13 上海联影医疗科技有限公司 Resource allocation methods and system and resource allocation terminal
CN108196959A (en) * 2018-02-07 2018-06-22 聚好看科技股份有限公司 The method for managing resource and device of ETL system
CN108196959B (en) * 2018-02-07 2021-06-01 聚好看科技股份有限公司 Resource management method and device of ETL system
CN108536530A (en) * 2018-04-02 2018-09-14 北京中电普华信息技术有限公司 A kind of multithreading method for scheduling task and device
CN108536530B (en) * 2018-04-02 2021-10-22 北京中电普华信息技术有限公司 A kind of multi-thread task scheduling method and device
CN110633946A (en) * 2018-06-22 2019-12-31 西门子股份公司 task assignment system
CN108897619A (en) * 2018-06-27 2018-11-27 国家超级计算天津中心 A kind of multi-layer resource flexibility configuration method for supercomputer
CN108897619B (en) * 2018-06-27 2020-05-05 国家超级计算天津中心 Multi-level resource flexible configuration method for super computer
CN111475297B (en) * 2018-06-27 2023-04-07 国家超级计算天津中心 Flexible operation configuration method
CN111475297A (en) * 2018-06-27 2020-07-31 国家超级计算天津中心 Flexible operation configuration method
CN111045795A (en) * 2018-10-11 2020-04-21 浙江宇视科技有限公司 Resource scheduling method and device
CN111258745A (en) * 2018-11-30 2020-06-09 华为终端有限公司 Task processing method and device
CN111258745B (en) * 2018-11-30 2023-11-17 花瓣云科技有限公司 Task processing method and device
CN109947532A (en) * 2019-03-01 2019-06-28 中山大学 A big data task scheduling method in education cloud platform
CN112286676A (en) * 2019-07-25 2021-01-29 罗伯特·博世有限公司 Apparatus and computer-implemented method for planning resources
CN110399222A (en) * 2019-07-25 2019-11-01 北京邮电大学 GPU cluster deep learning task parallelization method, device and electronic equipment
CN110399222B (en) * 2019-07-25 2022-01-21 北京邮电大学 GPU cluster deep learning task parallelization method and device and electronic equipment
CN112395077A (en) * 2019-08-16 2021-02-23 阿里巴巴集团控股有限公司 Resource control method, device and system
CN110620818A (en) * 2019-09-18 2019-12-27 东软集团股份有限公司 Method, device and related equipment for realizing node distribution
CN111143057A (en) * 2019-12-13 2020-05-12 中国科学院深圳先进技术研究院 Heterogeneous cluster data processing method and system based on multiple data centers and electronic equipment
CN111143057B (en) * 2019-12-13 2024-04-19 中国科学院深圳先进技术研究院 Heterogeneous cluster data processing method and system based on multiple data centers and electronic equipment
CN113032113A (en) * 2019-12-25 2021-06-25 中科寒武纪科技股份有限公司 Task scheduling method and related product
CN111353696A (en) * 2020-02-26 2020-06-30 中国工商银行股份有限公司 Resource pool scheduling method and device
CN111459641A (en) * 2020-04-08 2020-07-28 广州欢聊网络科技有限公司 Cross-machine-room task scheduling and task processing method and device
CN111459641B (en) * 2020-04-08 2023-04-28 广州欢聊网络科技有限公司 Method and device for task scheduling and task processing across machine room
CN111539613B (en) * 2020-04-20 2023-09-15 浙江网商银行股份有限公司 Case distribution method and device
CN111539613A (en) * 2020-04-20 2020-08-14 浙江网商银行股份有限公司 Case distribution method and device
CN112099952A (en) * 2020-09-16 2020-12-18 亚信科技(中国)有限公司 Resource scheduling method, device, electronic device and storage medium
CN112099952B (en) * 2020-09-16 2024-12-31 亚信科技(中国)有限公司 Resource scheduling method, device, electronic device and storage medium
CN112272203B (en) * 2020-09-18 2022-06-14 苏州浪潮智能科技有限公司 A cluster service node selection method, system, terminal and storage medium
CN112272203A (en) * 2020-09-18 2021-01-26 苏州浪潮智能科技有限公司 Cluster service node selection method, system, terminal and storage medium
WO2022068643A1 (en) * 2020-09-29 2022-04-07 华为技术有限公司 Multi-task deployment method and apparatus
CN113448728A (en) * 2021-06-22 2021-09-28 腾讯科技(深圳)有限公司 Cloud resource scheduling method, device, equipment and storage medium
CN113986509A (en) * 2021-11-02 2022-01-28 中国建设银行股份有限公司 Resource object management method, device, electronic device and computer storage medium
CN114968570A (en) * 2022-05-20 2022-08-30 广东电网有限责任公司 Real-time computing system applied to digital power grid and working method thereof
CN114968570B (en) * 2022-05-20 2024-03-26 广东电网有限责任公司 Real-time computing system applied to digital power grid and working method thereof
WO2024125251A1 (en) * 2022-12-16 2024-06-20 华为技术有限公司 Resource allocation method and apparatus
CN118939427A (en) * 2024-08-06 2024-11-12 广州宏同信息技术有限公司 A resource allocation management method and computer equipment for cloud servers

Also Published As

Publication number Publication date
CN107045456B (en) 2020-03-10
WO2017133351A1 (en) 2017-08-10

Similar Documents

Publication Publication Date Title
CN107045456B (en) Resource allocation method and resource manager
US20190324819A1 (en) Distributed-system task assignment method and apparatus
Weng et al. Beware of fragmentation: Scheduling {GPU-Sharing} workloads with fragmentation gradient descent
CN109564528B (en) System and method for computing resource allocation in distributed computing
US10530846B2 (en) Scheduling packets to destination virtual machines based on identified deep flow
CN105718479B (en) Execution strategy generation method and device under cross-IDC big data processing architecture
CN108337109B (en) Resource allocation method and device and resource allocation system
US8869159B2 (en) Scheduling MapReduce jobs in the presence of priority classes
CN104780146B (en) Method for managing resource and device
US9141436B2 (en) Apparatus and method for partition scheduling for a processor with cores
US20130343399A1 (en) Offloading virtual machine flows to physical queues
CN104268018B (en) Job scheduling method and job scheduler in a kind of Hadoop clusters
CN110187960A (en) A distributed resource scheduling method and device
WO2015101091A1 (en) Distributed resource scheduling method and device
WO2016041446A1 (en) Resource allocation method, apparatus and device
JP5616523B2 (en) Information processing system
CN105677467A (en) Yarn resource scheduler based on quantified labels
CN114489978A (en) Resource scheduling method, device, equipment and storage medium
Fotohi et al. A cluster based job scheduling algorithm for grid computing
Tahir et al. UDRF: Multi-resource fairness for complex jobs with placement constraints
WO2019000435A1 (en) Task processing method and device, medium, and device thereof
Loganathan et al. Job scheduling with efficient resource monitoring in cloud datacenter
Sultana et al. Priority based scheduling algorithm using divisible load theory in cloud
CN105389212A (en) Job assigning method and apparatus
CN112527482A (en) Task management method and system based on mobile edge cloud platform

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20210428

Address after: Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040

Patentee after: Honor Device Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040

Patentee after: Honor Terminal Co.,Ltd.

Country or region after: China

Address before: 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong

Patentee before: Honor Device Co.,Ltd.

Country or region before: China