CN106201701A

CN106201701A - A workflow scheduling algorithm with task duplication

Info

Publication number: CN106201701A
Application number: CN201610569560.8A
Authority: CN
Inventors: 李云; 阮敏; 袁运浩
Original assignee: Yangzhou University
Current assignee: Yangzhou University
Priority date: 2016-07-14
Filing date: 2016-07-14
Publication date: 2016-12-07

Abstract

The present invention relates to a workflow scheduling algorithm with task duplication. A DAG is initialized from the tasks submitted by the user, and a task queue V = {v₁, v₂, …, v_n} is built according to the priorities in the DAG. For the first task v_i of the task queue V, the completion time on each processor in the virtual list is calculated, and the minimum completion time T_ft(v_i, P_k) of task v_i is found by comparison. A task queue A is built to store the tasks that need to be duplicated, and a task queue B stores the tasks that have been duplicated, with tasks stored in non-increasing order of earliest start time. A genetic algorithm is iterated to obtain the optimal solution, and the mapped tasks are assigned to the corresponding resources to obtain the optimal scheduling scheme. The invention overcomes the defect that the DAG has neither distinct layers nor clear precedence constraints. Because D-IAHA and IAHA take the error probability of a task into account when deciding, in the crossover and mutation stages, whether to select the mutated resource, the number of errors that occur when tasks execute on the processors is greatly reduced.

Description

A workflow scheduling algorithm with task duplication

Technical field

The invention belongs to the field of task scheduling in cloud computing, and specifically relates to a workflow scheduling algorithm with task duplication.

Background art

A heterogeneous distributed computing system is made up of as many as millions of heterogeneous computing nodes. These nodes are interconnected by an arbitrary network architecture and meet the demands of a wide range of application fields for high-throughput computing resources and the establishment of virtual organizations.

Heterogeneous distributed systems take various forms, and cloud computing is one example. With the development and progress of science and technology, the concept of cloud computing was proposed on the basis of earlier computing paradigms such as parallel computing, distributed computing, and grid computing. Both academia and industry are actively exploring cloud computing, and the world is entering the cloud computing era. Cloud computing is presented in the form of a business model: through a cloud platform it shows users the promising prospects of utility computing together with a number of attractive characteristics, such as offering shared resources as on-demand services over the Internet and charging according to the number of uses and the usage demand, thereby providing users with instant, flexible, and scalable services.

Cloud computing consists of many heterogeneous computing nodes, virtualized computers, and dynamically configured software services. These software services compete for client applications on the basis of their availability, performance, functionality, and quality of service (QoS). The services available in cloud computing are generally divided into three classes: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These services offer different capabilities to meet the demands of different consumer groups. Although they share some similar functions (such as computing, storage, and networking), they differ from one another in their non-functional characteristics. These non-functional characteristics are called QoS parameters, for example: service time, service cost, service effectiveness, service energy consumption, and service availability.

Given the characteristics of cloud computing, mapping task requests to resources in an efficient way is the main process of the IaaS layer. In this layer, virtual machines (VMs) serve as the scheduling units through which physical heterogeneous resources are allocated to tasks. Each virtual machine is an abstract unit with computing and storage capacity in the cloud environment. Because the resources are heterogeneous and dynamic, and the tasks differ in characteristics and number, this task scheduling problem is considered NP-complete.

A good scheduling method can greatly improve the performance of a cloud system, but no exact method is known that finds the optimal solution in polynomial time, so scheduling strategies must rely on heuristic algorithms to find solutions that are as good as possible. To execute tasks effectively on the processors of a cloud computing environment, a workflow management system requires a detailed scheduling strategy that satisfies both the task requirements and the precedence constraints between tasks. In summary, workflow scheduling is an important challenge in cloud computing.

Before the present invention was made, workflow scheduling in cloud computing was a difficult problem. The difficulty lies in the many factors that must be considered when scheduling tasks: 1) the various QoS criteria; 2) elastic cloud services with heterogeneous and dynamic characteristics; 3) the different ways of combining services to meet task execution; 4) the transfer of large volumes of data, and so on.

Software running in a cloud environment is called an application. Each application is a workflow made up of interrelated tasks, and these tasks can be scheduled to run on different processors in the cloud environment. Many scientific applications now exist, for example in bioinformatics, chemistry, and astronomy, that contain a large number of tasks with complex precedence constraints among them. Such applications can be converted into a DAG (directed acyclic graph). The resulting DAG is complex and has neither distinct layers nor clear precedence constraints; we call this kind of DAG a complex DAG.

Summary of the invention

The purpose of the present invention is to overcome the above drawbacks by developing a workflow scheduling algorithm with task duplication.

The technical scheme of the present invention is as follows:

A workflow scheduling algorithm with task duplication, characterized mainly in that it comprises the following steps:

(1) Initialize a DAG from the tasks submitted by the user, represented as G = (V, E, [W], C), where v_i ∈ V denotes the i-th node in the DAG; E is the set of edges between nodes in the DAG, and e_ij ∈ E denotes the edge from task v_i to task v_j, which also means that task v_i must be executed before task v_j; w_ik ∈ [W], where [W] is the n × m matrix of task execution times on the different processors, and w_ik denotes the execution time of task v_i on processor P_k; c_ij ∈ C denotes the communication overhead from v_i to v_j, represented by the weight of the edge;

(2) Build the task queue V = {v₁, v₂, …, v_n} according to the priorities in the DAG;

(3) Calculate the completion time of the first task v_i of the task queue V on each processor in the virtual list:

(3a) Calculate the initial completion time of task v_i on processor P_k:

T_ft(v_i, P_k) = T_st(v_i, P_k) + w_ik

where T_st(v_i, P_k) is the start time of task v_i on processor P_k, and T_ft(v_i, P_k) is the completion time;

(3b) Let M_j be the parent task node of task v_i with the largest out-degree among all tasks (if several have the same out-degree, compare their completion times, and M_j is the task with the largest completion time). Let M_i be the most important direct parent task of task v_i. Taking task duplication into account, calculate the final completion time of task v_i on processor P_k:

(i) If neither M_i nor M_j exists, or both have already been scheduled on the current processor P_k, return the value of T_ft(v_i, P_k);

(ii) If processor P_k has an idle time slot suitable for executing M_i or M_j, calculate the completion time T_ft(v_j, P_k) of task M_j on processor P_k and the completion time T_ft(v_t, P_k) of task M_i on processor P_k:

1. If T_ft(v_j, P_k) < T_st(v_i, P_k), duplicate task M_j on processor P_k;

2. If T_ft(v_t, P_k) < T_st(v_i, P_k), duplicate task M_i on processor P_k;

3. Otherwise, return T_ft(v_i, P_k);

(iii) Otherwise, return T_ft(v_i, P_k);

(4) Find the minimum completion time T_ft(v_i, P_k) of task v_i by comparison; if identical values exist, schedule the task on the processor that holds relatively fewer tasks;

(5) When the task queue is empty, build task queue A to store the tasks that need to be duplicated and task queue B to store the tasks that have been duplicated; tasks are stored in non-increasing order of earliest start time;

(6) For every task in task queue B, if removing the task has no effect on the total completion time, remove the task and update queue B;

(7) Iterate a genetic algorithm to obtain the optimal solution;

(8) Assign the mapped tasks to the corresponding resources to obtain the optimal scheduling scheme.
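Steps (3a) and (4) above can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the patented implementation: the function names `initial_finish_time` and `pick_processor`, the dictionary layout of `W`, and all sample numbers are hypothetical, and the start time T_st is assumed to be supplied by the caller.

```python
from typing import Dict, List

# Hypothetical execution-time matrix [W]: W[task][k] is w_ik, the execution
# time of task v_i on processor P_k (values invented for illustration).
W: Dict[str, List[float]] = {"v1": [4.0, 6.0, 5.0]}


def initial_finish_time(t_st: float, w_ik: float) -> float:
    """Step (3a): T_ft(v_i, P_k) = T_st(v_i, P_k) + w_ik."""
    return t_st + w_ik


def pick_processor(finish_times: List[float], task_counts: List[int]) -> int:
    """Step (4): index of the processor with the minimum completion time.

    Ties are broken in favour of the processor that currently holds fewer tasks.
    """
    return min(range(len(finish_times)),
               key=lambda k: (finish_times[k], task_counts[k]))


# Assumed start times of v1 on P_0..P_2 and the number of tasks already
# placed on each processor.
start_times = [2.0, 0.0, 1.0]
task_counts = [3, 1, 2]
finish = [initial_finish_time(s, w) for s, w in zip(start_times, W["v1"])]
best = pick_processor(finish, task_counts)
```

In this example all three completion times are equal, so the tie-break of step (4) decides and the processor with the fewest tasks is chosen.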

The advantage and effect of the present invention are that, in D-IAHA and IAHA, the error probability of a task is taken into account when deciding, in the crossover and mutation stages, whether to select the mutated resource, so the number of errors that occur when tasks execute on the processors is greatly reduced.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the implementation of the present invention.

Fig. 2 is a schematic diagram of the NSL performance analysis of tasks in the present invention.

Fig. 3 is a schematic diagram of the Efficiency performance analysis of tasks in the present invention.

Fig. 4 is a schematic diagram of the load factor analysis of the present invention.

Fig. 5 is a schematic diagram of the task failure count analysis of the present invention.

Detailed description of the invention

I. Description of the steps

The present invention is a workflow task scheduling algorithm with task duplication, based on cloud computing. With reference to Fig. 1, the schematic flowchart of the implementation of the present invention, the specific implementation process comprises the following steps:

Step 1. Initialize a DAG from the tasks submitted by the user, represented as G = (V, E, [W], C), where v_i ∈ V denotes the i-th node in the DAG; E is the set of edges between nodes in the DAG, and e_ij ∈ E denotes the edge from task v_i to task v_j, which also means that task v_i must be executed before task v_j; w_ik ∈ [W], where [W] is the n × m matrix of task execution times on the different processors, and w_ik denotes the execution time of task v_i on processor P_k; c_ij ∈ C denotes the communication overhead from v_i to v_j, represented by the weight of the edge.

Step 2. Build the task queue V = {v₁, v₂, …, v_n} according to the priorities in the DAG.

Step 3. Calculate the completion time of the first task v_i of the task queue V on each processor in the virtual list:

(3.1) Calculate the initial completion time of task v_i on processor P_k:

T_ft(v_i, P_k) = T_st(v_i, P_k) + w_ik

where T_st(v_i, P_k) is the start time of task v_i on each processor P_k, and T_ft(v_i, P_k) is the completion time;

(3.2) Let M_j be the parent task node of task v_i with the largest out-degree among all tasks (if several have the same out-degree, compare their completion times, and M_j is the task with the largest completion time). Let M_i be the most important direct parent task of task v_i. Taking task duplication into account, calculate the final completion time of task v_i on processor P_k:

(3.21) If neither M_i nor M_j exists, or both have already been scheduled on the current processor P_k, return the value of T_ft(v_i, P_k);

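(3.22) If processor P_k has an idle time slot suitable for executing M_i or M_j, calculate the completion time T_ft(v_j, P_k) of task M_j on processor P_k and the completion time T_ft(v_t, P_k) of task M_i on processor P_k:

1. If T_ft(v_j, P_k) < T_st(v_i, P_k), duplicate task M_j on processor P_k;

2. If T_ft(v_t, P_k) < T_st(v_i, P_k), duplicate task M_i on processor P_k;

3. Otherwise, return T_ft(v_i, P_k);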

(3.23) Otherwise, return T_ft(v_i, P_k).

Step 4. Find the minimum completion time T_ft(v_i, P_k) of task v_i by comparison; if identical values exist, schedule the task on the processor that holds relatively fewer tasks.

Step 5. If no identical value exists and the task queue is empty, build task queue A to store the tasks that need to be duplicated and task queue B to store the tasks that have been duplicated; tasks are stored in non-increasing order of earliest start time.

Step 6. For every task in task queue B, if removing the task has no effect on the total completion time, delete the redundant task and update the task queue.

Step 7. Iterate a genetic algorithm to obtain the optimal solution.

Step 8. Assign the mapped tasks to the corresponding resources to obtain the optimal scheduling scheme.
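The duplication decision of step (3.2) can be sketched as follows. This is a hedged illustration, not the patented implementation: the `Parent` class, the representation of idle slots as `(start, end)` pairs, and the function names are all assumptions made for this example. It also assumes that a parent already placed on P_k is never duplicated, and that each idle slot begins after the inputs of the duplicated parent are available.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Parent:
    name: str
    exec_time: float     # execution time of this parent on processor P_k
    on_processor: bool   # True if the parent is already scheduled on P_k


def gap_finish_time(gaps: List[Tuple[float, float]],
                    exec_time: float) -> Optional[float]:
    """Finish time of a copy placed in the first idle gap that fits, else None."""
    for start, end in gaps:
        if end - start >= exec_time:
            return start + exec_time
    return None


def choose_duplicate(m_j: Optional[Parent], m_i: Optional[Parent],
                     gaps: List[Tuple[float, float]],
                     t_st_child: float) -> Optional[str]:
    """Name of the parent to duplicate on P_k, or None to keep T_ft unchanged.

    M_j is tested before M_i, following the order of the two sub-cases.
    """
    for parent in (m_j, m_i):
        # Case (3.21): a missing parent or one already on P_k is skipped.
        if parent is None or parent.on_processor:
            continue
        finish = gap_finish_time(gaps, parent.exec_time)
        # Duplicate only if the copy finishes before the child would start.
        if finish is not None and finish < t_st_child:
            return parent.name
    # Cases "otherwise" and (3.23): no suitable gap, nothing is duplicated.
    return None


gaps = [(0.0, 3.0), (5.0, 6.0)]      # assumed idle slots on P_k
m_j = Parent("M_j", 4.0, False)      # too long for either slot
m_i = Parent("M_i", 2.0, False)      # fits the first slot, finishes at 2.0
choice = choose_duplicate(m_j, m_i, gaps, t_st_child=7.0)
```

Here M_j does not fit into any idle slot, so the sketch falls through to M_i, whose copy would finish before the child task starts.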

II. Simulation experiment

1. Experimental parameter settings

To test the effectiveness of the D-IAHA algorithm, it is compared with the HEFT algorithm and the IAHA algorithm on the CloudSim platform. HEFT is chosen for comparison because it also uses the strategy of inserting tasks into idle time slots of a processor; IAHA is chosen because D-IAHA is an improvement built on IAHA that additionally takes the communication overhead into account. The CCR parameter used by the algorithm denotes the communication-to-computation ratio. The CCR value is varied in the experiments and is calculated as follows:

CCR = \frac{Cm}{Cp}

where Cm is the average communication overhead and Cp is the average computation overhead.

The experimental parameters are shown in Table 1:

Table 1 Simulation parameters
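As a worked illustration of the CCR formula, the sketch below computes the ratio for a toy workflow. The function name `ccr`, the data layout, and the numbers are hypothetical; the sketch also assumes that Cp is averaged over all tasks and all processors, which the text does not specify.

```python
from statistics import mean
from typing import Dict, List, Tuple


def ccr(comm_costs: Dict[Tuple[str, str], float],
        exec_times: Dict[str, List[float]]) -> float:
    """CCR = Cm / Cp, average communication cost over average computation cost."""
    cm = mean(list(comm_costs.values()))
    # Assumption: average each task over the processors, then over the tasks.
    cp = mean([mean(row) for row in exec_times.values()])
    return cm / cp


# Invented edge weights c_ij and execution times w_ik on two processors.
comm = {("v1", "v2"): 6.0, ("v1", "v3"): 10.0}
execs = {"v1": [4.0, 6.0], "v2": [3.0, 5.0], "v3": [2.0, 4.0]}
ratio = ccr(comm, execs)  # Cm = 8.0, Cp = 4.0
```

A ratio above 1, as in this example, corresponds to a communication-intensive workflow.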

The performance metrics used in this experiment are NSL and Efficiency, calculated as follows:

NSL = \frac{makespan}{MSC}

where makespan is the overall completion time of the algorithm, and MSC is the sum of the maximum computation costs on the critical path.

Efficiency = \frac{SLU}{SLM \times Num} \times 100

where SLU is the scheduling length in a single-processor system, SLM is the scheduling length in a multi-processor system, and Num is the number of processors.
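The two metrics translate directly into code. The sketch below is a plain transcription of the formulas above; the function names and the sample values are hypothetical and are not taken from the experiments.

```python
def nsl(makespan: float, msc: float) -> float:
    """NSL = makespan / MSC."""
    return makespan / msc


def efficiency(slu: float, slm: float, num: int) -> float:
    """Efficiency = SLU / (SLM * Num) * 100."""
    return slu / (slm * num) * 100


# Invented values: schedule length 30 on 4 processors, 60 on one processor,
# and a critical-path computation cost of 20.
example_nsl = nsl(makespan=30.0, msc=20.0)
example_eff = efficiency(slu=60.0, slm=30.0, num=4)
```

With these invented inputs the schedule is 1.5 times the critical-path cost, and the four processors are used at 50 percent efficiency.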

2. Simulation results

Experiment 1: NSL performance analysis of tasks

Fig. 2 shows how the NSL value of the HEFT, IAHA, and D-IAHA algorithms changes as the CCR value increases. As the CCR value grows, the tasks gradually become communication-intensive and the communication time increases, so the total completion time of HEFT and IAHA is increasingly affected and their NSL values increase accordingly. D-IAHA, by contrast, duplicates tasks to trade computation cost for communication cost, which brings the task start times forward and reduces the communication time, so the total time does not increase markedly and its NSL value is lower than that of the other two algorithms. However, if the CCR becomes very large, the tasks communicate frequently and the relationships between tasks become complex, so choosing which tasks to duplicate becomes difficult and the total completion time rebounds.

Experiment 2: Efficiency performance analysis of tasks

Fig. 3 shows how the Efficiency value of the HEFT, IAHA, and D-IAHA algorithms changes as the CCR value increases. The Efficiency values of D-IAHA are all lower than those of the other two algorithms. Since the number of processors set in this algorithm is fixed, this indicates that its scheduling length is smaller than that of the other two algorithms, so its performance improves more.

Experiment 3: Load factor analysis

Fig. 4 shows how the load factor of the HEFT, IAHA, and D-IAHA algorithms changes as the number of tasks increases. Because IAHA and D-IAHA assign tasks to processors with better computing speed and performance, and take resource load into account during crossover and mutation, the load factor of both algorithms is lower than that of HEFT. Because D-IAHA reduces the total completion time by trading computation cost for communication cost, the load factor on each node is somewhat higher than with IAHA, but it still reduces the resource load factor compared with other algorithms.

Experiment 4: Task failure count analysis

Fig. 5 shows how the number of failures changes when tasks execute on the processors under the HEFT, IAHA, and D-IAHA algorithms. Because tasks executing on a processor have a certain error rate, cloud computing systems include some fault-tolerance mechanism. However, reducing the number of task errors at the source can greatly improve system performance. D-IAHA and IAHA produce fewer errors than HEFT, mainly because HEFT does not take the error probability of tasks into account, whereas D-IAHA and IAHA consider the error probability of a task when deciding, in the crossover and mutation stages, whether to select the mutated resource, which greatly reduces the number of errors that occur when tasks execute on the processors.

Claims

1. A workflow scheduling algorithm with task duplication, characterized in that it comprises the following steps:

(1) initializing a DAG from the tasks submitted by the user, represented as G = (V, E, [W], C), where v_i ∈ V denotes the i-th node in the DAG; E is the set of edges between nodes in the DAG, and e_ij ∈ E denotes the edge from task v_i to task v_j, which also means that task v_i must be executed before task v_j; w_ik ∈ [W], where [W] is the n × m matrix of task execution times on the different processors, and w_ik denotes the execution time of task v_i on processor P_k; c_ij ∈ C denotes the communication overhead from v_i to v_j, represented by the weight of the edge;

(3a) calculating the initial completion time of task v_i on processor P_k:

T_ft(v_i, P_k) = T_st(v_i, P_k) + w_ik

(3b) letting M_j be the parent task node of task v_i with the largest out-degree among all tasks (if several have the same out-degree, their completion times are compared, and M_j is the task with the largest completion time); letting M_i be the most important direct parent task of task v_i; and, taking task duplication into account, calculating the final completion time of task v_i on processor P_k:

(ii) if processor P_k has an idle time slot suitable for executing M_i or M_j, calculating the completion time T_ft(v_j, P_k) of task M_j on processor P_k and the completion time T_ft(v_t, P_k) of task M_i on processor P_k:

3. otherwise, returning T_ft(v_i, P_k);

(iii) otherwise, returning T_ft(v_i, P_k);

(4) finding the minimum completion time T_ft(v_i, P_k) of task v_i by comparison and, if identical values exist, scheduling the task on the processor that holds relatively fewer tasks;

(5) if no identical value exists and the task queue is empty, building task queue A to store the tasks that need to be duplicated and task queue B to store the tasks that have been duplicated, with tasks stored in non-increasing order of earliest start time;

(6) for every task in task queue B, if removing the task has no effect on the total completion time, deleting the redundant task and updating the task queue;

(7) iterating a genetic algorithm to obtain the optimal solution;