CN108320059B

CN108320059B - Workflow scheduling evolution optimization method and terminal equipment

Info

Publication number: CN108320059B
Application number: CN201810154127.7A
Authority: CN
Inventors: 封筠; 党云龙; 綦朝晖; 殷梦莹
Original assignee: Shijiazhuang Tiedao University
Current assignee: Shijiazhuang Tiedao University
Priority date: 2018-02-22
Filing date: 2018-02-22
Publication date: 2020-06-09
Anticipated expiration: 2038-02-22
Also published as: CN108320059A

Abstract

The invention is applicable to the technical fields of cloud computing and artificial intelligence, and provides a workflow scheduling evolutionary optimization method and terminal equipment. The workflow scheduling evolutionary optimization method includes: designing a multi-chromosome pair coding structure; constructing a balanced mixed initialization population; adaptively calculating relevant variables; performing a re-evolution operation; and optimizing a scheduling strategy. The above-mentioned workflow scheduling evolutionary optimization method can provide a scheduling strategy that meets user needs with a high deadline constraint satisfaction rate and low execution cost, and provides a reference implementation for building a workflow scheduling evolutionary optimization system under deadline constraints. .

Description

A workflow scheduling evolutionary optimization method and terminal device

技术领域technical field

本发明属于云计算和人工智能技术领域，尤其涉及一种工作流调度进化寻优方法及终端设备。The invention belongs to the technical field of cloud computing and artificial intelligence, and in particular relates to a workflow scheduling evolutionary optimization method and terminal equipment.

背景技术Background technique

云计算中的任务越来越多地以工作流形式存在。这种工作流通常是数据密集型和计算密集型应用。由于工作流具有大量的数据和计算需求，所以需要云计算提供高性能计算资源。云计算将计算资源作为服务，根据用户的需求以虚拟机的形式动态地提供给用户。云计算提供资源服务的模式主要包含：基础设施即服务(IaaS)、平台即服务(PaaS)和软件即服务(SaaS)。在基础设施云以虚拟资源池的方式为用户提供服务，通过调度技术为工作流分配计算资源成为云计算、人工智能领域的研究重点。Tasks in cloud computing increasingly exist in the form of workflows. Such workflows are typically data-intensive and computationally-intensive applications. Since the workflow has a large amount of data and computing requirements, cloud computing is required to provide high-performance computing resources. Cloud computing takes computing resources as a service and dynamically provides them to users in the form of virtual machines according to their needs. The modes of cloud computing to provide resource services mainly include: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). The infrastructure cloud provides services to users in the form of virtual resource pools, and allocating computing resources to workflows through scheduling technology has become a research focus in the field of cloud computing and artificial intelligence.

云环境下工作流调度是指在特定云环境下，按照用户的计算需求，为工作流中不同的计算任务匹配不同的计算资源，并寻找到最优匹配方案的过程。工作流调度时对任务进行资源分配和寻优，通过提取工作流任务特征及相互数据依赖关系，得到工作流问题的公式化描述，再对其进行建模，为其分配计算资源，确定分配策略满足用户需求，并寻优。工作流调度试图建立计算任务、计算资源，以及用户需求之间的对应关系，度量其适应程度。调度用于计算任务与计算资源之间的映射，例如由不同计算规模的计算任务与不同计算性能的计算资源之间执行时间和执行花费的变化，找出目标优化项变化的轨迹。在云计算工作流调度系统中，工作流内容通常用节点特征进行描述。事实上，基于云计算的工作流可以分为三个步骤：工作流建模、任务资源映射以及策略寻优。Workflow scheduling in the cloud environment refers to the process of matching different computing resources for different computing tasks in the workflow and finding the optimal matching solution according to the computing needs of users in a specific cloud environment. During workflow scheduling, tasks are allocated and optimized. By extracting the characteristics of workflow tasks and their mutual data dependencies, a formulaic description of the workflow problem is obtained, and then it is modeled, and computing resources are allocated to it, and the allocation strategy is determined to satisfy User needs and optimization. Workflow scheduling attempts to establish the corresponding relationship between computing tasks, computing resources, and user requirements, and measure its adaptability. Scheduling is used to map between computing tasks and computing resources. For example, from the changes in execution time and execution cost between computing tasks with different computing scales and computing resources with different computing performance, find out the trajectory of the change of the target optimization item. In the cloud computing workflow scheduling system, the workflow content is usually described by node characteristics. In fact, the workflow based on cloud computing can be divided into three steps: workflow modeling, task resource mapping and policy optimization.

工作流调度是人工智能的一个重要应用，主要目的是为工作流中计算任务分配所需计算资源并加以调整，也可称为约束条件下资源调度的优化问题。资源调度主要是对计算资源池中计算资源进行分配。通常，约束条件下资源调度优化问题处理是使用惩罚函数惩罚不合适的调度策略，将约束问题转换为无约束问题。除了传统调度算法的不断优化和改进之外，一些进化算法也被采用来生成近似最优的调度策略。Rodriguez和Buyya针对云环境下科学工作流调度提出了一种基于元启发优化技术的粒子群优化算法，该调度算法在满足截止时间约束条件下使整体执行花费最小，充分考虑了云计算中IaaS的一些基本特性，如计算资源的弹性化与异构化。但算法中使用不满足约束条件的粒子不如合法粒子的惩罚函数，可能会导致算法过早收敛而陷入局部最优。Li Liu等人提出了一种基于截止时间约束的协同遗传算法CGA²，保证了在满足截止时间约束条件下，使总体执行花费最小，能够加速收敛并能避免早熟。但该算法中引入协同演化操作会增大计算量，同时由于其本质是一种贪心策略会在某些情况下无法按约束需求得到最优值。Workflow scheduling is an important application of artificial intelligence. The main purpose is to allocate and adjust the required computing resources for computing tasks in the workflow. It can also be called the optimization problem of resource scheduling under constraints. Resource scheduling is mainly to allocate computing resources in the computing resource pool. Usually, the resource scheduling optimization problem under constrained conditions is to use a penalty function to punish inappropriate scheduling strategies and convert the constrained problem into an unconstrained problem. In addition to the continuous optimization and improvement of traditional scheduling algorithms, some evolutionary algorithms have also been adopted to generate near-optimal scheduling policies. Rodriguez and Buyya proposed a particle swarm optimization algorithm based on meta-heuristic optimization technology for scientific workflow scheduling in cloud environment. Some basic features, such as the elasticity and heterogeneity of computing resources. However, the use of a penalty function in the algorithm that the particles that do not meet the constraints are not as good as the legitimate particles may cause the algorithm to converge prematurely and fall into a local optimum. Li Liu et al. proposed a cooperative genetic algorithm CGA ² based on deadline constraints, which ensures that the overall execution cost is minimized under the conditions of meeting deadline constraints, which can accelerate convergence and avoid premature maturity. However, the introduction of co-evolutionary operations in the algorithm will increase the amount of computation, and because it is a greedy strategy in nature, the optimal value cannot be obtained according to the constraint requirements in some cases.

发明内容SUMMARY OF THE INVENTION

有鉴于此，本发明实施例提供了一种工作流调度进化寻优方法及终端设备，以解决现有技术中截止时间约束满足率较低和执行花费较高的问题。In view of this, embodiments of the present invention provide an evolutionary optimization method and terminal device for workflow scheduling, so as to solve the problems of low deadline constraint satisfaction rate and high execution cost in the prior art.

本发明实施例的第一方面，提供了一种工作流调度进化寻优方法，包括：A first aspect of the embodiments of the present invention provides an evolutionary optimization method for workflow scheduling, including:

步骤S1，根据用户输入参数和工作流数据，选取遗传算法种群中染色体作为调度策略信息载体，结合种群规模M、计算资源规模N以及工作流数据设计多重染色体对W的信息编码结构；所述用户输入参数包括截止时间约束参数；Step S1, according to the user input parameters and workflow data, select chromosomes in the genetic algorithm population as the scheduling strategy information carrier, and design the information encoding structure of the multiple chromosome pair W in combination with the population size M, the computing resource size N and the workflow data; the user Input parameters include deadline constraint parameters;

步骤S2，根据所述信息编码结构构造的任一多重染色体对，按照构造方法循环单个多种染色体对构造过程，形成规模为M的均衡混合初始化种群；Step S2, according to any multiple chromosome pair constructed by the information coding structure, cycle the construction process of a single multiple chromosome pair according to the construction method to form a balanced mixed initialization population with a scale of M;

步骤S3，根据所述均衡混合初始化种群，构建进化过程，并依据所述用户输入参数调整进化方向，按照所述用户输入参数以及原始适应度修正均值构造改进适应度，并根据所述适应度计算进化过程中种群每条染色体的自适应交叉概率Pc和自适应变异概率Pm，将每条染色体进行基因操作并将所得调度策略存入调度寻优策略池A；Step S3, initialize the population according to the balanced mixture, construct an evolution process, adjust the evolution direction according to the user input parameters, construct an improved fitness according to the user input parameters and the original fitness correction mean, and calculate according to the fitness During the evolution process, the adaptive crossover probability Pc and the adaptive mutation probability Pm of each chromosome of the population are genetically manipulated for each chromosome and the resulting scheduling strategy is stored in the scheduling optimization strategy pool A;

步骤S4，将优选策略池A中的ω条染色体作为截止时间约束下再进化操作的输入，并加入M-ω条随机生成的染色体，获得再进化种群作为步骤S3的输入并按照既定规则迭代执行再进化，所得调度策略均存入调度寻优策略池B；Step S4, the ω chromosomes in the optimal strategy pool A are used as the input of the re-evolution operation under the deadline constraint, and M-ω randomly generated chromosomes are added to obtain the re-evolved population as the input of step S3 and iteratively execute according to the established rules. After further evolution, the obtained scheduling strategies are stored in the scheduling optimization strategy pool B;

步骤S5，综合评价调度寻优策略池A和调度寻优策略池B的每条调度策略，匹配截止时间约束条件，并对比截止时间约束满足率和执行花费，评价算法执行效率，最终判定优选调度策略。Step S5, comprehensively evaluate each scheduling strategy of the scheduling optimization strategy pool A and the scheduling optimization strategy pool B, match the deadline constraints, and compare the deadline constraint satisfaction rate and execution cost, evaluate the algorithm execution efficiency, and finally determine the optimal scheduling. Strategy.

可选的，步骤S1中的多重染色体对W的信息编码结构共包含2条染色体，即Optionally, the information coding structure of the multiple chromosome pair W in step S1 contains a total of 2 chromosomes, that is,

多重染色体对W＝{chromesome1,chromesome2}Multiple chromosome pair W={chromesome1,chromesome2}

chromesome1＝sequence of{Pc,Pm,gen₀,gen₁,…,gen_k,…,gen_L-1}chromesome1=sequence of{Pc,Pm,gen ₀ ,gen ₁ ,…,gen _k ,…,gen _L-1 }

chromesome2＝{tof₀,tof₁,…,tof_i,…,tof_N-1}chromesome2={tof ₀ ,tof ₁ ,…,tof _i ,…,tof _N-1 }

假设第i台虚拟机上，执行序列的规模为g，则tof_i可表示为：Assuming that on the ith virtual machine, the size of the execution sequence is g, then tof _i can be expressed as:

其中，Pc为染色体的自适应交叉概率；Pm为染色体的自适应变异概率；gen_k为染色体的k位置上的基因，0≤k<L；vm_i为计算资源中第i号虚拟机，0≤i<N；tof_i为分配到计算资源vm_i的计算任务执行序列；chromesome1染色体中基因gen的编码为t(vm)，表示计算任务t分配计算资源vm；chromesome2染色体中基因vm表示计算资源，后续基因集合{t}表示分配在该计算资源vm的计算任务。Among them, Pc is the adaptive crossover probability of the chromosome; Pm is the adaptive mutation probability of the chromosome; gen _k is the gene at the k position of the chromosome, 0≤k<L; vm _i is the ith virtual machine in the computing resources, 0 ≤i<N; tof _i is the execution sequence of the computing task assigned to the computing resource vm _i ; the gene gen in the chromesome1 chromosome is coded as t(vm), indicating that the computing task t allocates the computing resource vm; the gene vm in the chromesome2 chromosome represents the computing resource , and the subsequent gene set {t} represents the computing task allocated to the computing resource vm.

可选的，根据所述信息编码结构构造的任一多重染色体对，按照构造方法循环单个多种染色体对构造过程，形成规模为M的均衡混合初始化种群，具体过程如下：Optionally, according to any multiple chromosome pair constructed by the information coding structure, cycle the construction process of a single multiple chromosome pair according to the construction method to form a balanced mixed initialization population with a scale of M. The specific process is as follows:

按照步骤S1得到的染色体对编码结构，假设使用k种方法构造初始化种群，则按照方法1构造x₁条染色体对，按照方法i构造x_i条染色体对，按照方法k构造x_k条染色体对，得到规模为M的均衡混合初始种群；其中，k≥2，

According to the chromosome pair coding structure obtained in step S1, assuming that k methods are used to construct the initialization population, then construct x ₁ chromosome pairs according to method 1, construct x _i chromosome pairs according to method i, and construct x _k chromosome pairs according to method k, An equilibrium mixed initial population of size M is obtained; where k ≥ 2,

可选的，按照所述用户输入参数以及原始适应度修正均值构造改进适应度，并根据所述适应度计算进化过程中种群每条染色体的自适应交叉概率Pc和自适应变异概率Pm，具体过程如下：Optionally, the improved fitness is constructed according to the user input parameters and the original fitness correction mean, and the adaptive crossover probability Pc and the adaptive mutation probability Pm of each chromosome of the population in the evolution process are calculated according to the fitness. The specific process as follows:

对于所述均衡混合初始化种群，按照截止时间约束计算染色体的原始适应度向量F_ini，按原始适应度修正均值，构造改进适应度向量F_imp；For the balanced mixed initialization population, the original fitness vector F _ini of the chromosome is calculated according to the deadline constraint, the mean value is corrected according to the original fitness, and the improved fitness vector F _imp is constructed;

利用最大改进适应度值减去当前改进适应度值，并除以最大改进适应度值与改进适应度修正均值的差，得到概率中间变量V_mi；Use the maximum improved fitness value to subtract the current improved fitness value, and divide by the difference between the maximum improved fitness value and the modified average value of the improved fitness to obtain the probability intermediate variable V _mi ;

对改进适应度值小于其修正均值时，概率修正变量取值为1；When the improved fitness value is less than its revised mean, the probability correction variable takes the value of 1;

对改进适应度值与其修正均值的关系，分别与不同参数相乘，标记为Pc；Multiply the relationship between the improved fitness value and its revised mean value with different parameters and mark it as Pc;

取M个目标染色体样本，重复以上过程得到M个Pc值；Take M target chromosome samples, and repeat the above process to obtain M Pc values;

基于同样的过程，可得Pm。Based on the same process, Pm can be obtained.

可选的，步骤S5中判定优选调度策略的过程具体如下：Optionally, the process of determining the preferred scheduling policy in step S5 is as follows:

对种群中所有染色体执行步骤S2、步骤S3以及步骤S4获得调度寻优策略池A和调度寻优策略池B，以截止时间约束满足率为评价标准，对比步骤S4和步骤S5所获得调度寻优策略池A和调度寻优策略池B；Perform steps S2, S3 and S4 on all chromosomes in the population to obtain the scheduling optimization strategy pool A and the scheduling optimization strategy pool B, and use the deadline constraint satisfaction rate as the evaluation criterion to compare the scheduling optimization obtained in steps S4 and S5. Strategy pool A and scheduling optimization strategy pool B;

根据所述用户输入信息，确定次要约束；所述用户输入信息还包括次要约束信息；determining secondary constraints according to the user input information; the user input information further includes secondary constraint information;

若某一调度寻优策略适应度最优且不违反次要截止时间约束，或违反截止时间约束程度相对其他调度寻优策略最小，则判定该调度寻优策略作为目标调度寻优策略，并给出调度结果。If the fitness of a scheduling optimization strategy is optimal and does not violate the secondary deadline constraint, or the degree of violation of the deadline constraint is the smallest relative to other scheduling optimization strategies, the scheduling optimization strategy is determined as the target scheduling optimization strategy, and given output the scheduling result.

本发明实施例的第二方面，提供了一种终端设备，包括存储器、处理器，所述存储器中存储有可在所述处理器上运行的计算机程序，所述处理器执行所述计算机程序时实现如下步骤：In a second aspect of the embodiments of the present invention, a terminal device is provided, including a memory and a processor, where the memory stores a computer program that can run on the processor, and when the processor executes the computer program Implement the following steps:

本发明实施例的第三方面，提供了一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序被处理器执行时实现如本发明实施例第一方面所述的任一方法的步骤。In a third aspect of the embodiments of the present invention, a computer-readable storage medium is provided, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the implementation of the first aspect of the embodiments of the present invention is implemented. steps of any of the methods described.

本发明实施例，针对工作流任务集合的特点，基于遗传算法提出一种再进化(Re-Evolution)策略来解决局部最优和收敛速度慢的问题，并提出一种多种染色体对编码结构，可根据工作流任务节点信息充分描述工作流调度策略，进而通过自适应计算截止时间约束下修正均值的相关变量来保证有效基因的存活率，从两个调度寻优资源池中按照既定规则优选最终调度策略。与现有基于进化算法的调度方法对比，该方法减少了计算量，能保证特定应用场景约束需要的满足率，并保证算法可控的执行效率，可避免局部最优和收敛速度慢等问题。In the embodiment of the present invention, in view of the characteristics of the workflow task set, a Re-Evolution strategy is proposed based on the genetic algorithm to solve the problems of local optimization and slow convergence, and a multi-chromosome pair coding structure is proposed, The workflow scheduling strategy can be fully described according to the workflow task node information, and then the survival rate of effective genes can be guaranteed by adaptively calculating the relevant variables of the mean value under the constraint of the deadline, and the final optimization can be selected from the two scheduling optimization resource pools according to the established rules. scheduling strategy. Compared with the existing scheduling methods based on evolutionary algorithms, this method reduces the amount of computation, can ensure the satisfaction rate required by the constraints of specific application scenarios, and ensure the controllable execution efficiency of the algorithm, and can avoid problems such as local optimization and slow convergence.

附图说明Description of drawings

为了更清楚地说明本发明实施例中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only for the present invention. In some embodiments, for those of ordinary skill in the art, other drawings can also be obtained according to these drawings without any creative effort.

图1是本发明实施例提供的工作流调度进化寻优方法框图；1 is a block diagram of a workflow scheduling evolutionary optimization method provided by an embodiment of the present invention;

图2是本发明实施例提供的工作流调度进化寻优方法流程图；2 is a flowchart of a method for evolutionary optimization of workflow scheduling provided by an embodiment of the present invention;

图3是本发明实施例提供的科学工作流结构示意图；3 is a schematic diagram of a scientific workflow structure provided by an embodiment of the present invention;

图4是本发明实施例提供的工作流调度进化寻优程序的运行环境示意图；4 is a schematic diagram of a running environment of a workflow scheduling evolutionary optimization program provided by an embodiment of the present invention;

图5是本发明实施例提供的工作流调度进化寻优程序的程序模块图。FIG. 5 is a program module diagram of a workflow scheduling evolution optimization program provided by an embodiment of the present invention.

具体实施方式Detailed ways

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

本发明是针对工作流任务集合，构建一种基于遗传算法(Genetic Algorithm，GA)的工作流调度进化寻优方法，设计描述多重染色体对编码结构，建立高效的染色体对，构造均衡混合初始化种群，自适应计算基于当代种群适应度的相关变量，设计再进化操作，为工作流调度策略寻优以及收敛加速提供参考实现。The invention constructs a workflow scheduling evolutionary optimization method based on a genetic algorithm (Genetic Algorithm, GA) for the workflow task set, designs and describes the coding structure of multiple chromosome pairs, establishes efficient chromosome pairs, and constructs a balanced mixed initialization population. Based on the relevant variables of the fitness of the contemporary population, the adaptive calculation designs the re-evolution operation, which provides a reference implementation for the optimization of the workflow scheduling strategy and the acceleration of the convergence.

本发明是根据用户提供的截止时间约束，自适应计算相关变量，构建进化过程与再进化过程，综合评定调度策略优劣，并寻优最终结果的计算方法。针对工作流任务集合中任务节点的信息特征，提出了多重染色体对编码结构、均衡混合初始化种群、计算截止时间约束下自适应修正均值的相关变量、再进化策略、调度寻优策略资源池及寻优调度策略等概念，设计了一种新的截止时间约束下的工作流调度进化寻优方法。The present invention is a calculation method for self-adaptively calculating relevant variables, constructing an evolutionary process and a re-evolutionary process, comprehensively evaluating the merits and demerits of scheduling strategies, and optimizing the final result according to the deadline constraint provided by the user. According to the information characteristics of the task nodes in the workflow task set, the coding structure of multiple chromosome pairs, balanced mixed initialization population, relevant variables for calculating the adaptive mean value under the constraint of deadline, re-evolution strategy, scheduling optimization strategy resource pool and search method are proposed. Based on concepts such as optimal scheduling strategy, a new evolutionary optimization method for workflow scheduling under the constraint of deadline is designed.

为了说明本发明所述的技术方案，下面通过具体实施例来进行说明。In order to illustrate the technical solutions of the present invention, the following specific embodiments are used for description.

为了准确说明本发明实施例，首先解释如下术语及含义。In order to accurately describe the embodiments of the present invention, the following terms and meanings are first explained.

任务：指调度进化寻优方法所调度的基本单元。在本发明中使用染色体中基因表示任务。任务具有任务编号、规模、接收数据规模、发送数据规模，以及依赖关系等信息。Task: refers to the basic unit scheduled by the scheduling evolutionary optimization method. The task of gene representation in chromosomes is used in the present invention. The task has information such as task number, scale, received data scale, sent data scale, and dependencies.

资源：资源用于计算任务，在本发明中特指虚拟机。一般地，虚拟机具有虚拟机编号、性能、状态，以及价格等信息。Resource: A resource is used for computing tasks, and in the present invention refers specifically to a virtual machine. Generally, a virtual machine has information such as virtual machine number, performance, status, and price.

工作流：任务集合的一种形式。一般地，工作流为包含多个任务及其依赖关系的集合，其中科学工作流是工作流中较为重要复杂的一种。Workflow: A form of collection of tasks. Generally, a workflow is a collection including multiple tasks and their dependencies, and a scientific workflow is a more important and complex one in the workflow.

染色体对：染色体对在本发明中特指所设计的多重染色体对，用来描述工作的主要特征，染色体对中的不同染色体在不同基因操作阶段起到不同作用，亦称为个体。Chromosome pair: In the present invention, chromosome pair specifically refers to the designed multiple chromosome pair, which is used to describe the main features of the work. Different chromosomes in the chromosome pair play different roles in different stages of gene manipulation, and are also called individuals.

自适应计算相关变量：调度进化寻优方法中自适应变量，如改良适应度、交叉概率和变异概率、选择概率以及惩罚函数等。通过自适应计算相关变量可避免计算资源异构性和动态性等特性造成的早熟和局部收敛的不足。Adaptive calculation of related variables: adaptive variables in scheduling evolutionary optimization methods, such as improved fitness, crossover and mutation probability, selection probability, and penalty function. The deficiency of premature and local convergence caused by the heterogeneity and dynamics of computing resources can be avoided by adaptively calculating relevant variables.

基因操作：基因操作主要指本发明进化过程中的交叉、变异及选择等操作。Gene manipulation: Gene manipulation mainly refers to operations such as crossover, mutation and selection in the evolutionary process of the present invention.

调度寻优策略资源池：存储调度策略，为再进化和优选调度策略提供支持。Scheduling optimization strategy resource pool: stores scheduling strategies to provide support for re-evolution and optimal scheduling strategies.

约束：指用户提出的需求，将其作为主要约束条件。本发明以截止时间为主要约束条件，以执行花费为次要约束条件。Constraints: Refers to the requirements put forward by users as the main constraints. The present invention takes the deadline as the main constraint, and takes the execution cost as the secondary constraint.

约束满足率：对比约束项与约束条件，反复执行若干次，统计满足约束的比率，该比率称为约束满足率，约束满足率越高，相应调度策略越优秀。Constraint Satisfaction Rate: Compare the constraint item with the constraint condition, execute it several times, and count the ratio of satisfying the constraint. This ratio is called the constraint satisfaction rate. The higher the constraint satisfaction rate, the better the corresponding scheduling strategy.

适应度：判定调度策略的优劣，是决定进化方向的关键。Fitness: Determining the pros and cons of scheduling strategies is the key to determining the direction of evolution.

再进化：在当前进化所得结果基础上，按照既定规则重新构建再进化种群，再次进化，将再进化结果存入相应调度寻优策略资源池，可避免普通进化结果中陷入局部最优以及早熟的问题。Re-evolution: On the basis of the current evolution results, rebuild the re-evolutionary population according to the established rules, evolve again, and store the re-evolution results in the corresponding scheduling optimization strategy resource pool, which can avoid falling into local optimum and premature evolution in ordinary evolution results. question.

实施例一Example 1

图1与图2分别示出了本发明实施例一提供的工作流调度进化寻优方法的基本构成与实现流程，详述如下：FIG. 1 and FIG. 2 respectively show the basic structure and implementation process of the workflow scheduling evolutionary optimization method provided in Embodiment 1 of the present invention, and the details are as follows:

步骤S1，根据用户输入参数和工作流数据，选取遗传算法种群中染色体作为调度策略信息载体，结合种群规模M、计算资源规模N以及工作流数据设计多重染色体对W的信息编码结构；所述用户输入参数包括截止时间约束参数。Step S1, according to the user input parameters and workflow data, select chromosomes in the genetic algorithm population as the scheduling strategy information carrier, and design the information encoding structure of the multiple chromosome pair W in combination with the population size M, the computing resource size N and the workflow data; the user Input parameters include deadline constraint parameters.

本步骤中，多重染色体对编码结构中包含了多个染色体对，不同染色体中包含着不同信息。在本发明公布的调度进化寻优方法过程中不再依赖工作流数据，而使用多重染色体对描述表示工作流中任务及相关信息。本发明公布的方法选取从工作流提取的任务编号(0-(L-1))、任务数量(L)；资源池中虚拟机的编号(0-(N-1))、数量(N)；交叉概率和变异概率等信息，设计多重染色体对W编码结构。In this step, the multiple chromosome pair coding structure includes multiple chromosome pairs, and different chromosomes contain different information. In the process of the scheduling evolution optimization method disclosed in the present invention, the workflow data is no longer relied on, and multiple chromosome pairs are used to describe the tasks and related information in the workflow. The method disclosed in the present invention selects the task number (0-(L-1)) and the number of tasks (L) extracted from the workflow; the number (0-(N-1)) and the number (N) of the virtual machines in the resource pool ; Crossover probability and mutation probability and other information, design multiple chromosome pairs W coding structure.

其中，本步骤中的多重染色体对W的信息编码结构共包含2条染色体，即Wherein, the information coding structure of the multiple chromosome pair W in this step includes 2 chromosomes in total, namely

步骤S2，根据所述信息编码结构构造的任一多重染色体对，按照构造方法循环单个多种染色体对构造过程，形成规模为M的均衡混合初始化种群。Step S2, according to any multiple chromosome pair constructed by the information coding structure, cycle the construction process of a single multiple chromosome pair according to the construction method to form a balanced mixed initialization population with a scale of M.

作为一种可实施方式，本步骤的具体过程如下：As a kind of embodiment, the concrete process of this step is as follows:

设按照步骤(1)所提出编码结构构造任一多重染色体对a；按照多种构造方法循环染色体对构造过程，均衡混合初始化种群为B。如果种群B有M个染色体对，设共有k种方法；方法1构造的染色体对集合为B1；方法2构造的染色体对集合为B2；…；方法k构造的染色体对集合为Bk；则初始化种群B由k个子种群组成。按照方法1构造x₁条染色体对，方法2构造x₂条染色体对，…，方法i构造x_i条染色体对，...，方法k构造x_k条染色体对，得到规模为M的均衡混合初始种群。Suppose that any multiple chromosome pair a is constructed according to the coding structure proposed in step (1); If there are M chromosome pairs in population B, there are k methods in total; the set of chromosome pairs constructed by method 1 is B1; the set of chromosome pairs constructed by method 2 is B2; ...; the set of chromosome pairs constructed by method k is Bk; then initialize the population B consists of k subpopulations. Follow method 1 to construct x ₁ chromosome pairs, method 2 to construct x ₂ chromosome pairs, ..., method i to construct x _i chromosome pairs, ..., method k to construct x _k chromosome pairs, resulting in a balanced mixture of size M initial population.

设方法1所构成的染色体对集合

Let the set of chromosome pairs constituted by method 1

方法2所构成的染色体对集合

The set of chromosome pairs formed by method 2

依次类推，方法k所构成的染色体对集合为

By analogy, the set of chromosome pairs formed by method k is

因此，均衡混合初始化种群B的构成为：B＝{B1,B2,...,Bk}。Therefore, the composition of the balanced mixed initialization population B is: B={B1,B2,...,Bk}.

步骤S3，根据所述均衡混合初始化种群，构建进化过程，并依据所述用户输入参数调整进化方向，按照所述用户输入参数以及原始适应度修正均值构造改进适应度，并根据所述适应度计算进化过程中种群每条染色体的自适应交叉概率Pc和自适应变异概率Pm，将每条染色体进行基因操作并将所得调度策略存入调度寻优策略池A。Step S3, initialize the population according to the balanced mixture, construct an evolution process, adjust the evolution direction according to the user input parameters, construct an improved fitness according to the user input parameters and the original fitness correction mean, and calculate according to the fitness In the evolution process, the adaptive crossover probability Pc and adaptive mutation probability Pm of each chromosome of the population are genetically manipulated for each chromosome and the resulting scheduling strategy is stored in the scheduling optimization strategy pool A.

可选的，所述按照所述用户输入参数以及原始适应度修正均值构造改进适应度，并根据所述适应度计算进化过程中种群每条染色体的自适应交叉概率Pc和自适应变异概率Pm，具体过程如下：Optionally, the improved fitness is constructed according to the user input parameter and the original fitness correction mean, and the adaptive crossover probability Pc and the adaptive mutation probability Pm of each chromosome of the population in the evolution process are calculated according to the fitness, The specific process is as follows:

具体的，specific,

其中，P_c(i)表示当代种群中第i条染色体对的交叉概率，P_m(i)表示当代种群中第i条染色体对的变异概率。在式(1)和(2)，当第i条染色体对适应度f(i)小于等于当前种群适应度的修正均值f_cavg时，认为该染色体对处于进化过程中的Maturing和Matured阶段，使用交叉参数k₃和变异参数k₄；当第i条染色体对适应度f(i)大于等于当前种群适应度的修正均值f_cavg时，认为该染色体对处于进化过程中的Initial和Undermatured阶段，使用交叉参数k₁和变异参数k₂。为了避免突变产生的极差染色体对对种群整体适应度均值的影响，本实施例采用λ修正均值f_cavg，其计算如式(3)所示，其中λ表示所需要修正所占种群的百分比，仅用于修正极差染色体对。Among them, P _c (i) represents the crossover probability of the ith chromosome pair in the contemporary population, and P _m (i) represents the mutation probability of the ith chromosome pair in the contemporary population. In equations (1) and (2), when the fitness f(i) of the i-th chromosome pair is less than or equal to the revised mean value f _cavg of the fitness of the current population, it is considered that the chromosome pair is in the Matured and Matured stages in the evolutionary process, using Crossover parameter k ₃ and mutation parameter k ₄ ; when the fitness f(i) of the i-th chromosome pair is greater than or equal to the revised mean value f _cavg of the fitness of the current population, it is considered that the chromosome pair is in the Initial and Undermatured stages in the evolution process, using Crossover parameter k ₁ and mutation parameter k ₂ . In order to avoid the influence of extremely poor chromosomes caused by mutation on the overall fitness average of the population, this embodiment adopts λ to correct the mean value f _cavg . Only used to correct for very poor chromosome pairs.

式(3)中，S_p表示当前种群的大小，f_c(i)表示修正后的种群染色体对适应度。In formula (3), S _p represents the size of the current population, and f _c (i) represents the fitness of the corrected population chromosome pair.

步骤S4，将优选策略池A中的ω条染色体作为截止时间约束下再进化操作的输入，并加入M-ω条随机生成的染色体，获得再进化种群作为步骤S3的输入并按照既定规则迭代执行再进化，所得调度策略均存入调度寻优策略池B。Step S4, the ω chromosomes in the optimal strategy pool A are used as the input of the re-evolution operation under the deadline constraint, and M-ω randomly generated chromosomes are added to obtain the re-evolved population as the input of step S3 and iteratively execute according to the established rules. After further evolution, the obtained scheduling strategies are stored in the scheduling optimization strategy pool B.

本步骤中，按照既定规则将调度寻优策略资源池A中的ω条染色体作为再进化操作的输入，并加入(M-ω)条随机生成的染色体，获得再进化种群作为步骤S3的输入，按照既定规则迭代再进化代数，所得染色体均存入调度寻优策略资源池B，为步骤S5提供调度策略备用资源。In this step, according to the established rules, the ω chromosomes in the resource pool A of the scheduling optimization strategy are used as the input of the re-evolution operation, and (M-ω) randomly generated chromosomes are added to obtain the re-evolved population as the input of step S3, Iteratively re-evolves the algebra according to the established rules, and the obtained chromosomes are stored in the scheduling optimization strategy resource pool B to provide the scheduling strategy spare resources for step S5.

设调度寻优策略资源池A中的存储种群为S_A：S_A＝{a₀,a₁,...,a_M-1}；Let the storage population in the scheduling optimization strategy resource pool A be S _A : S _A ={a ₀ ,a ₁ ,...,a _M-1 };

按照既定规则选取的部分种群为S₁：S₁＝{a_i,a_i+1,...a_i+ω}；The partial population selected according to the established rules is S ₁ : S ₁ ={a _i ,a _i+1 ,...a _i+ω };

新生成的部分种群为S₂：S₂＝{a₀,a₁,...,a_M-1-ω}；The newly generated partial population is S ₂ : S ₂ ={a ₀ ,a ₁ ,...,a _M-1-ω };

因此，所得进化种群为S_new：Therefore, the resulting evolutionary population is S _new :

S_new＝{S₁,S₂}S _new ={S ₁ ,S ₂ }

可选的，本步骤中判定优选调度策略的过程具体如下：Optionally, the process of determining the preferred scheduling policy in this step is as follows:

具体的，设再进化次数为z；再进化所成的种群为Sr；调度寻优策略池B中的存储种群集合为S_B：S_B＝{S_r1,S_r2,...,S_rz}；设约束满足率最高的染色体个数为h个，则集合表示为M_B：M_B＝{m₀,m₁,...,m_h-1}；设执行花费最低的染色体个数为e个，则集合表示为C_B：C_B＝{c₀,c₁,...,c_e-1}。Specifically, let the number of re-evolution be z; the population formed by re-evolution is Sr; the storage population set in the scheduling optimization strategy pool B is S _B : S _B ={S _r1 ,S _r2 ,...,S _rz }; Let the number of chromosomes with the highest constraint satisfaction rate be h, then the set is represented as M _B : M _B ={m ₀ ,m ₁ ,...,m _h-1 }; Let the number of chromosomes with the lowest execution cost is e, then the set is represented as C _B : C _B ={c ₀ ,c ₁ ,...,c _e-1 }.

对比种群集合S_B和种群S_A中的每一条染色体，首先对比约束满足率，找到其中约束满足率最高的所有染色体，将这些染色体存入约束满足率染色体集合M_B；如果M_B的规模大于1，则遍历集合M_B中所有染色体，然后对比执行花费，找到执行花费最低的所有染色体，存入执行花费染色体集合C_B；如果C_B的规模大于1，则遍历集合C_B中所有染色体，再对比方法效率，找到效率最高的所有染色体，如果染色体条数大于1，则随机选取一条染色体作为结果输出。Compare each chromosome in the population set _S _B and the population SA, first compare the constraint satisfaction rate, find all the chromosomes with the highest constraint satisfaction rate, and store these chromosomes in the constraint satisfaction rate chromosome set _MB ; if the scale of _MB is greater than ₁ , then traverse all chromosomes in the set MB, then compare the execution costs, find all the chromosomes with the lowest execution cost, and store them in the execution cost chromosome set _CB ; if the scale of _CB is greater than 1, traverse all the chromosomes in the set _CB , Then compare the efficiency of the method and find all the chromosomes with the highest efficiency. If the number of chromosomes is greater than 1, a chromosome is randomly selected as the result output.

下面以具体算例说明本实施例的过程。The process of this embodiment is described below with a specific calculation example.

实验采用科学工作流montage-25，其中25个任务节点，其结构示意图如图3所示；并使用5台不同类型的虚拟机进行模拟，虚拟机参数设置参照Amazon EC2配置。The experiment adopts the scientific workflow montage-25, in which there are 25 task nodes, the schematic diagram of which is shown in Figure 3; and 5 different types of virtual machines are used for simulation, and the virtual machine parameter settings refer to the Amazon EC2 configuration.

(1)设计多重染色体对编码结构(1) Design multiple chromosome pair coding structures

设多重染色体对描述工作流任务为集合W。限于篇幅，仅列举其中第0个染色体对的数据，Let the multiple chromosome pair description workflow task be set W. Due to space limitations, only the data of the 0th chromosome pair is listed.

W₀＝{chromesome1,chromesome2}W ₀ = {chromesome1,chromesome2}

chromesome1＝sequence of{0.45,0.011,T₄(0),...T₂₄(4)}chromesome1=sequence of{ _0.45,0.011 ,T4(0),...T24( ₄ )}

chromesome2＝sequence of{vm₀,T₄,T₁,T₁₂,T₁₆,T₂₀,T₂₁,T₂₃}chromesome2=sequence of{vm ₀ ,T ₄ ,T ₁ ,T ₁₂ ,T ₁₆ ,T ₂₀ ,T ₂₁ ,T ₂₃ }

... ...

sequence of{vm₄,T₇,T₁₀,T₁₁,T₁₇,T₂₄}sequence of{vm ₄ ,T ₇ ,T ₁₀ ,T ₁₁ ,T ₁₇ ,T ₂₄ }

其中，chromesome1前2个基因0.45,0.011分别为交叉概率和变异概率；后25个基因为计算任务和计算资源的映射对，共27个基因。Among them, the first two genes of chromesome1 are 0.45 and 0.011, respectively, the crossover probability and mutation probability; the last 25 genes are the mapping pairs of computing tasks and computing resources, with a total of 27 genes.

同样，chromesome2共有5个子染色体组成，其中子染色体中第1个基因表示所分配的计算资源，后续基因表示1号基因上执行的计算任务。Similarly, chromesome2 consists of five sub-chromosomes, in which the first gene in the sub-chromosome represents the allocated computing resources, and the subsequent genes represent the computing tasks performed on the No. 1 gene.

(2)构造均衡混合初始化种群(2) Constructing a balanced mixed initialization population

根据步骤(1)得到多重染色体对的工作流任务特征集合W共有50个染色体对。例如，选用2种调度算法和随机算法，每种调度各生成15个染色体对，剩余20个染色体对由随机算法生成。According to step (1), the workflow task feature set W for obtaining multiple chromosome pairs has a total of 50 chromosome pairs. For example, two scheduling algorithms and random algorithms are selected, each scheduling generates 15 chromosome pairs, and the remaining 20 chromosome pairs are generated by the random algorithm.

(3)自适应计算相关变量(3) Adaptive calculation of relevant variables

按公式(1)和(2)自适应计算W中所有染色体对的交叉概率和变异概率，得到概率矩阵P，According to formulas (1) and (2), the crossover probability and mutation probability of all chromosome pairs in W are adaptively calculated, and the probability matrix P is obtained,

并将对应的交叉概率和变异概率写入到对应染色体对的相应基因位。And write the corresponding crossover probability and mutation probability to the corresponding locus of the corresponding chromosome pair.

(4)执行再进化(4) Perform re-evolution

按照既定规则将调度寻优策略资源池A中的25条染色体对作为再进化操作的输入，并加入25条随机生成的染色体对，获得再进化种群作为步骤(3)的输入，按照既定规则迭代再进化代数，所得染色体对均存入调度寻优策略资源池B，为步骤(5)提供调度策略备用资源。According to the established rules, the 25 chromosome pairs in the resource pool A of the scheduling optimization strategy are used as the input of the re-evolution operation, and 25 randomly generated chromosome pairs are added to obtain the re-evolved population as the input of step (3), and iterate according to the established rules. The algebra is re-evolved, and the obtained chromosome pairs are all stored in the resource pool B of the scheduling optimization strategy to provide the scheduling strategy spare resources for step (5).

设调度寻优策略资源池A中的存储种群为S_A：Let the storage population in the scheduling optimization strategy resource pool A be S _A :

S_A＝{a₀,a₁,...,a₄₉}；S _A ={a ₀ ,a ₁ ,...,a ₄₉ };

按照既定规则选取的部分种群为S₁：Part of the population selected according to the established rules is S ₁ :

S₁＝{a_i,a_i+1,...,a_i+24}；S ₁ ={a _i ,a _i+1 ,...,a _i+24 };

新生成的部分种群为S₂：S₂＝{a₀,a₁,...,a₂₄}；The newly generated partial population is S ₂ : S ₂ ={a ₀ ,a ₁ ,...,a ₂₄ };

S_new＝{S₁,S₂}S _new ={S ₁ ,S ₂ }

限于篇幅，仅列举其中第0个染色体对的数据，Due to space limitations, only the data of the 0th chromosome pair is listed.

W₀＝{chromesome1,chromesome2}W ₀ = {chromesome1,chromesome2}

chromesome1＝sequence of{0.25,0.002,T₀(2),...,T₂₄(3)}chromesome1=sequence of{0.25,0.002,T ₀ (2),...,T ₂₄ (3)}

chromesome2＝sequence of{vm₀,T₄,T₁,T₃,...,T₁₄,T₁₆,T₂₂}chromesome2=sequence of{vm ₀ ,T ₄ ,T ₁ ,T ₃ ,...,T ₁₄ ,T ₁₆ ,T ₂₂ }

... ...

sequence of{vm₄,T₁₀,T₂₀,T₁₇,T₁₉,T₂₁,T₂₃}sequence of{vm ₄ ,T ₁₀ ,T ₂₀ ,T ₁₇ ,T ₁₉ ,T ₂₁ ,T ₂₃ }

(5)优选调度策略(5) Optimal scheduling strategy

综合评价调度寻优策略池A和调度寻优策略池B中的每条染色体，对比适应度值和执行花费，匹配约束条件，最终判定优选调度策略Final。Comprehensively evaluate each chromosome in the scheduling optimization strategy pool A and the scheduling optimization strategy pool B, compare the fitness value and the execution cost, match the constraints, and finally determine the optimal scheduling strategy Final.

Final＝{chromesome1,chromesome2}Final={chromesome1,chromesome2}

chromesome1＝sequence of{0.07,0.001,T₀(2),...,T₂₄(2)}chromesome1=sequence of{0.07,0.001,T ₀ (2),...,T ₂₄ (2)}

chromesome2＝sequence of{vm₀,T₄,T₁,T₃,...,T₁₄,T₁₇,T₂₁}chromesome2=sequence of{vm ₀ ,T ₄ ,T ₁ ,T ₃ ,...,T ₁₄ ,T ₁₇ ,T ₂₁ }

... ...

sequence of{vm₄,T₁₀,T₂₀,T₁₆,T₁₉,T₂₂}sequence of{vm ₄ ,T ₁₀ ,T ₂₀ ,T ₁₆ ,T ₁₉ ,T ₂₂ }

本实施例与其它两种方法的结果对比如表1所示，如在应用要求截止时间约束为328.7单位时间时本发明提出的约束满足率为100％，执行花费为13.2单位花费；PSO方法的约束满足率为10％，执行花费为41.6单位花费；CGA²方法的约束满足率为60％，执行花费为18.9单位花费。由表可知，在给定截止时间约束下，本发明的调度进化寻优方法比PSO与CGA²调度方法约束满足率更高，执行花费更低。The comparison between the results of this embodiment and the other two methods is shown in Table 1. For example, when the application requirement deadline is constrained to be 328.7 units of time, the constraint satisfaction rate proposed by the present invention is 100%, and the execution cost is 13.2 units of cost; The constraint satisfaction rate is 10%, and the execution cost is 41.6 units; the CGA ² method has a constraint satisfaction rate of 60%, and the execution cost is 18.9 units. It can be seen from the table that under a given deadline constraint, the scheduling evolutionary optimization method of the present invention has a higher constraint satisfaction rate and lower execution cost than the PSO and CGA ² scheduling methods.

表1本实施例方法与其它方法的结果对比Table 1 Results comparison between the method of this embodiment and other methods

上述工作流调度进化寻优方法，针对工作流任务集合的特点，基于遗传算法提出一种再进化(Re-Evolution)策略来解决局部最优和收敛速度慢的问题，并提出一种多种染色体对编码结构，可根据工作流任务节点信息充分描述工作流调度策略，进而通过自适应计算截止时间约束下修正均值的相关变量来保证有效基因的存活率，从两个调度寻优资源池中按照既定规则优选最终调度策略。与现有基于进化算法的调度方法对比，该方法减少了计算量，能保证特定应用场景约束需要的满足率，并保证算法可控的执行效率，可避免局部最优和收敛速度慢等问题。经验证，本发明公布的截止时间约束下的工作流调度进化寻优方法能满足指定应用的实际需求，在基于截止时间约束的工作流调度寻优领域有广泛的参考价值。The above-mentioned workflow scheduling evolutionary optimization method, according to the characteristics of workflow task set, proposes a Re-Evolution strategy based on genetic algorithm to solve the problems of local optimization and slow convergence speed, and proposes a variety of chromosomes For the coding structure, the workflow scheduling strategy can be fully described according to the workflow task node information, and then the survival rate of effective genes can be guaranteed by adaptively calculating the relevant variables of the corrected mean value under the constraint of the deadline. The established rules prefer the final scheduling policy. Compared with the existing scheduling methods based on evolutionary algorithms, this method reduces the amount of computation, can ensure the satisfaction rate required by the constraints of specific application scenarios, and ensure the controllable execution efficiency of the algorithm, and can avoid problems such as local optimization and slow convergence. It has been verified that the workflow scheduling evolutionary optimization method under the deadline constraint disclosed in the present invention can meet the actual needs of the specified application, and has extensive reference value in the field of workflow scheduling optimization based on deadline constraint.

对应于上文实施例所述的工作流调度进化寻优方法，图4示出了本发明实施例提供的工作流调度进化寻优程序的运行环境示意图。为了便于说明，仅示出了与本实施例相关的部分。Corresponding to the workflow scheduling evolutionary optimization method described in the above embodiment, FIG. 4 shows a schematic diagram of the running environment of the workflow scheduling evolutionary optimization program provided by the embodiment of the present invention. For convenience of explanation, only the parts related to this embodiment are shown.

在本实施例中，所述的工作流调度进化寻优程序400安装并运行于终端设备40中。该终端设备40可包括，但不仅限于，存储器401和处理器402。图4仅示出了具有组件401-402的终端设备40，但是应理解的是，并不要求实施所有示出的组件，可以替代的实施更多或者更少的组件。In this embodiment, the workflow scheduling evolution optimization program 400 is installed and run in the terminal device 40 . The terminal device 40 may include, but is not limited to, a memory 401 and a processor 402 . Figure 4 shows only the terminal device 40 having components 401-402, but it should be understood that implementation of all of the illustrated components is not required, and more or fewer components may be implemented instead.

所述存储器401在一些实施例中可以是所述终端设备40的内部存储单元，例如该终端设备40的硬盘或内存。所述存储器401在另一些实施例中也可以是所述终端设备40的外部存储设备，例如所述终端设备40上配备的插接式硬盘，智能存储卡(Smart MediaCard，SMC)，安全数字(Secure Digital，SD)卡，闪存卡(Flash Card)等。进一步地，所述存储器401还可以既包括所述终端设备40的内部存储单元也包括外部存储设备。所述存储器401用于存储安装于所述终端设备40的应用软件及各类数据，例如所述工作流调度进化寻优程序400的程序代码等。所述存储器401还可以用于暂时地存储已经输出或者将要输出的数据。In some embodiments, the memory 401 may be an internal storage unit of the terminal device 40 , such as a hard disk or a memory of the terminal device 40 . In other embodiments, the memory 401 may also be an external storage device of the terminal device 40, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital Secure Digital, SD) card, flash memory card (Flash Card), etc. Further, the memory 401 may also include both an internal storage unit of the terminal device 40 and an external storage device. The memory 401 is used to store application software and various types of data installed in the terminal device 40 , such as program codes of the workflow scheduling evolution optimization program 400 , and the like. The memory 401 can also be used to temporarily store data that has been output or will be output.

所述处理器402在一些实施例中可以是一中央处理器(Central ProcessingUnit，CPU)，微处理器或其他数据处理芯片，用于运行所述存储器401中存储的程序代码或处理数据，例如执行所述工作流调度进化寻优程序400等。In some embodiments, the processor 402 may be a central processing unit (Central Processing Unit, CPU), a microprocessor or other data processing chips, and is used to execute the program code stored in the memory 401 or process data, such as executing The workflow scheduling evolution optimization program 400 and the like.

该终端设备40还可包括显示器，所述显示器在一些实施例中可以是LED显示器、液晶显示器、触控式液晶显示器以及有机发光二极管(Organic Light-Emitting Diode，OLED)触摸器等。The terminal device 40 may further include a display, which in some embodiments may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (Organic Light-Emitting Diode, OLED) touch panel, and the like.

请参阅图5，是本发明实施例提供的工作流调度进化寻优程序400的程序模块图。在本实施例中，所述的工作流调度进化寻优程序400可以被分割成一个或多个模块，所述一个或者多个模块被存储于所述存储器401中，并由一个或多个处理器(本实施例为所述处理器402)所执行，以完成本发明。例如，在图5中，所述的工作流调度进化寻优程序400可以被分割成信息编码结构设计模块501、均衡混合初始化种群形成模块502、调度寻优策略模块503和判定模块504。本发明所称的模块是指能够完成特定功能的一系列计算机程序指令段，比程序更适合于描述所述工作流调度进化寻优程序400在所述服务器40中的执行过程。以下描述将具体介绍所述模块501-504的功能。Please refer to FIG. 5 , which is a program module diagram of a workflow scheduling evolutionary optimization program 400 provided by an embodiment of the present invention. In this embodiment, the workflow scheduling evolution optimization program 400 may be divided into one or more modules, and the one or more modules are stored in the memory 401 and processed by one or more modules The processor (in this embodiment, the processor 402) is executed to complete the present invention. For example, in FIG. 5 , the workflow scheduling evolution optimization program 400 can be divided into an information coding structure design module 501 , a balanced mixed initialization population formation module 502 , a scheduling optimization strategy module 503 and a decision module 504 . The module referred to in the present invention refers to a series of computer program instruction segments capable of accomplishing specific functions, and is more suitable for describing the execution process of the workflow scheduling evolutionary optimization program 400 in the server 40 than a program. The following description will specifically introduce the functions of the modules 501-504.

其中，信息编码结构设计模块501，用于根据用户输入参数和工作流数据，选取遗传算法种群中染色体作为调度策略信息载体，结合种群规模M、计算资源规模N以及工作流数据设计多重染色体对W的信息编码结构；所述用户输入参数包括截止时间约束参数。Among them, the information coding structure design module 501 is used to select the chromosomes in the genetic algorithm population as the scheduling strategy information carrier according to the user input parameters and workflow data, and combine the population size M, computing resource size N and workflow data to design multiple chromosome pairs W information coding structure; the user input parameters include deadline constraint parameters.

均衡混合初始化种群形成模块502，用于根据所述信息编码结构构造的任一多重染色体对，按照构造方法循环单个多种染色体对构造过程，形成规模为M的均衡混合初始化种群。The balanced mixed initialization population forming module 502 is used for any multiple chromosome pair constructed according to the information coding structure to cycle through the construction process of a single multiple chromosome pair according to the construction method to form a balanced mixed initialization population with a scale of M.

调度寻优策略模块503，用于根据所述均衡混合初始化种群，构建进化过程，并依据所述用户输入参数调整进化方向，按照所述用户输入参数以及原始适应度修正均值构造改进适应度，并根据所述适应度计算进化过程中种群每条染色体的自适应交叉概率Pc和自适应变异概率Pm，将每条染色体进行基因操作并将所得调度策略存入调度寻优策略池A；以及The scheduling optimization strategy module 503 is used for initializing the population according to the balanced mixture, constructing the evolution process, adjusting the evolution direction according to the user input parameters, constructing the improved fitness according to the user input parameters and the original fitness correction mean, and Calculate the adaptive crossover probability Pc and the adaptive mutation probability Pm of each chromosome of the population in the evolution process according to the fitness, perform genetic manipulation on each chromosome and store the obtained scheduling strategy in the scheduling optimization strategy pool A; and

将优选策略池A中的ω条染色体作为截止时间约束下再进化操作的输入，并加入M-ω条随机生成的染色体，获得再进化种群作为步骤S3的输入并按照既定规则迭代执行再进化，所得调度策略均存入调度寻优策略池B；The ω chromosomes in the optimal strategy pool A are used as the input of the re-evolution operation under the deadline constraint, and M-ω randomly generated chromosomes are added to obtain the re-evolved population as the input of step S3, and the re-evolution is performed iteratively according to the established rules, The obtained scheduling strategies are stored in the scheduling optimization strategy pool B;

判定模块504，用于综合评价调度寻优策略池A和调度寻优策略池B的每条调度策略，匹配截止时间约束条件，并对比截止时间约束满足率和执行花费，评价算法执行效率，最终判定优选调度策略。The determination module 504 is used to comprehensively evaluate each scheduling strategy of the scheduling optimization strategy pool A and the scheduling optimization strategy pool B, match the deadline constraints, and compare the deadline constraint satisfaction rate and execution cost, evaluate the algorithm execution efficiency, and finally. Determine the preferred scheduling strategy.

作为一种可实施方式，多重染色体对W的信息编码结构共包含2条染色体，即As an embodiment, the information coding structure of the multiple chromosome pair W contains 2 chromosomes in total, namely

作为另一种可实施方式，均衡混合初始化种群形成模块502具体用于：As another possible implementation manner, the balanced mixed initialization population forming module 502 is specifically configured to:

可选的，调度寻优策略模块503按照所述用户输入参数以及原始适应度修正均值构造改进适应度，并根据所述适应度计算进化过程中种群每条染色体的自适应交叉概率Pc和自适应变异概率Pm，具体过程如下：Optionally, the scheduling optimization strategy module 503 constructs the improved fitness according to the user input parameters and the original fitness correction mean, and calculates the adaptive crossover probability Pc and adaptive crossover probability of each chromosome of the population in the evolution process according to the fitness. Mutation probability Pm, the specific process is as follows:

进一步的，判定模块504具体用于：Further, the determination module 504 is specifically used for:

所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，仅以上述各功能单元、模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能单元、模块完成，即将所述装置的内部结构划分成不同的功能单元或模块，以完成以上描述的全部或者部分功能。实施例中的各功能单元、模块可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中，上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。另外，各功能单元、模块的具体名称也只是为了便于相互区分，并不用于限制本申请的保护范围。上述系统中单元、模块的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and simplicity of description, only the division of the above-mentioned functional units and modules is used as an example for illustration. In practical applications, the above-mentioned functions can be allocated to different functional units, Module completion, that is, dividing the internal structure of the device into different functional units or modules to complete all or part of the functions described above. Each functional unit and module in the embodiment may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit, and the above-mentioned integrated units may adopt hardware. It can also be realized in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing from each other, and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the above-mentioned system, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

本领域普通技术人员可以意识到，结合本文中所公开的实施例描述的各示例的单元及算法步骤，能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本发明的范围。Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of the present invention.

在本发明所提供的实施例中，应该理解到，所揭露的装置和方法，可以通过其它的方式实现。例如，以上所描述的系统实施例仅仅是示意性的，例如，所述模块或单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通讯连接可以是通过一些接口，装置或单元的间接耦合或通讯连接，可以是电性，机械或其它的形式。In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the system embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be Incorporation may either be integrated into another system, or some features may be omitted, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

另外，在本发明各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.

所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本发明实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)或处理器(processor)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(Read-Only Memory，ROM)、随机存取存储器(Random Access Memory，RAM)、磁碟或者光盘等各种可以存储程序代码的介质。The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention are essentially or contribute to the prior art, or all or part of the technical solutions may be embodied in the form of software products, and the computer software products are stored in a storage The medium includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes .

以上所述实施例仅用以说明本发明的技术方案，而非对其限制；尽管参照前述实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围，均应包含在本发明的保护范围之内。The above-mentioned embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that: it is still possible to implement the foregoing implementations. The technical solutions described in the examples are modified, or some technical features thereof are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should be included in the within the protection scope of the present invention.

Claims

1. a workflow scheduling evolutionary optimization method, is characterized in that, comprises:

Step S1, according to the user input parameters and workflow data, select chromosomes in the genetic algorithm population as the scheduling strategy information carrier, and design the information encoding structure of the multiple chromosome pair W in combination with the population size M, the computing resource size N and the workflow data; the user Input parameters include deadline constraint parameters;

Step S2, according to any multiple chromosome pair constructed by the information coding structure, cycle the construction process of a single multiple chromosome pair according to the construction method to form a balanced mixed initialization population with a scale of M;

Step S3, initialize the population according to the balanced mixture, construct an evolution process, adjust the evolution direction according to the user input parameters, construct an improved fitness according to the user input parameters and the original fitness correction mean, and calculate according to the fitness During the evolution process, the adaptive crossover probability Pc and the adaptive mutation probability Pm of each chromosome of the population are genetically manipulated for each chromosome and the resulting scheduling strategy is stored in the scheduling optimization strategy pool A;

Wherein, the genetic manipulation refers to the operations of crossover, mutation and selection in the evolutionary process;

Step S4, take the ω chromosomes in the optimal strategy pool A as the input of the re-evolution operation under the deadline constraint, and add M-ω randomly generated chromosomes to obtain the re-evolved population as the input population of step S3 and iterate according to step S3. Perform re-evolution, and the obtained scheduling strategies are stored in the scheduling optimization strategy pool B;

The conditions for stopping the iteration are: the number of iterations reaches a preset number of times;

Step S5, match the deadline constraints, and compare the deadline constraint satisfaction rate and execution cost, evaluate the algorithm execution efficiency, and determine each scheduling strategy in the scheduling optimization strategy pool A and the scheduling optimization strategy pool B to determine the preferred scheduling strategy. ;

Wherein, the information coding structure of the multiple chromosome pair W in step S1 includes a total of 2 chromosomes, namely

Multiple chromosome pair W={chromesome1,chromesome2}

chromesome1=sequence of{Pc,Pm,gen ₀ ,gen ₁ ,…,gen _k ,…,gen _L-1 }

chromesome2={tof ₀ ,tof ₁ ,…,tof _i ,…,tof _N-1 }

Assuming that on the ith virtual machine, the size of the execution sequence is g, then tof _i can be expressed as:

Among them, Pc is the adaptive crossover probability of the chromosome; Pm is the adaptive mutation probability of the chromosome; gen _k is the gene at the k position of the chromosome, 0≤k<L; vm _i is the ith virtual machine in the computing resources, 0 ≤i<N; tof _i is the execution sequence of computing tasks allocated to computing resource vm _i ;

Among them, L is an integer greater than 0, N is the number of virtual machines,

is the g-1th computing task of computing resource vm _i ;

Wherein, according to any multiple chromosome pair constructed by the information coding structure, the construction process of a single multiple chromosome pair is cycled according to the construction method to form a balanced mixed initialization population with a scale of M. The specific process is as follows:

The improved fitness is constructed according to the user input parameters and the original fitness correction mean, and the adaptive crossover probability Pc and the adaptive mutation probability Pm of each chromosome of the population in the evolution process are calculated according to the fitness. The specific process is as follows:

For the balanced mixed initialization population, the original fitness vector F _ini of the chromosome is calculated according to the deadline constraint, the mean value is corrected according to the original fitness, and the improved fitness vector F _imp is constructed;

Use the maximum improved fitness value to subtract the current improved fitness value, and divide by the difference between the maximum improved fitness value and the modified average value of the improved fitness to obtain the probability intermediate variable V _mi ;

When the improved fitness value is less than its revised mean, the probability correction variable takes the value of 1;

Multiply the relationship between the improved fitness value and its revised mean value with different parameters and mark it as Pc;

Take M target chromosome samples, and repeat the above process to obtain M Pc values;

Based on the same process, Pm can be obtained;

Wherein, the calculation formula for calculating the adaptive crossover probability Pc and the adaptive mutation probability Pm of each chromosome of the population in the evolution process according to the fitness includes:

Among them, P _c (i) represents the crossover probability of the ith chromosome pair in the contemporary population, P _m (i) represents the mutation probability of the ith chromosome pair in the contemporary population, f _max represents the maximum improved fitness value, f(i ) represents the current improved fitness of the i-th chromosome, f _cavg represents the revised mean of the fitness of the current population, k ₁ and k ₃ represent crossover parameters, and k ₂ and k ₄ represent mutation parameters;

Wherein, in step S5, it is determined that each scheduling strategy of the scheduling optimization strategy pool A and the scheduling optimization strategy pool B is determined, and the process of determining the preferred scheduling strategy is as follows:

Perform steps S2, S3 and S4 on all chromosomes in the population to obtain the scheduling optimization strategy pool A and the scheduling optimization strategy pool B, and use the deadline constraint satisfaction rate as the evaluation criterion to compare the scheduling optimization obtained in steps S4 and S5. Strategy pool A and scheduling optimization strategy pool B;

determining secondary constraints according to the user input information; the user input information further includes secondary constraint information;

If the fitness of a scheduling optimization strategy is optimal and does not violate the secondary deadline constraint, or the degree of violation of the deadline constraint is the smallest relative to other scheduling optimization strategies, the scheduling optimization strategy is determined as the target scheduling optimization strategy, and given output the scheduling result.

2. A terminal device, comprising a memory and a processor, wherein a computer program that can be run on the processor is stored in the memory, and the processor implements the following steps when executing the computer program:

Multiple chromosome pair W={chromesome1,chromesome2}

chromesome1=sequence of{Pc,Pm,gen ₀ ,gen ₁ ,…,gen _k ,…,gen _L-1 }

chromesome2={tof ₀ ,tof ₁ ,…,tof _i ,…,tof _N-1 }

Among them, Pc is the adaptive crossover probability of the chromosome; Pm is the adaptive mutation probability of the chromosome; gen _k is the gene at the k position of the chromosome, 0≤k<L; vm _i is the ith virtual machine in the computing resources, 0 ≤i<N; tofi is the execution sequence of computing tasks allocated to the computing resource vmi;

is the g-1th computing task of computing resource vm _i ;

Based on the same process, Pm can be obtained;

3. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to claim 1 are implemented.