CN111861412A - Completion time optimization-oriented scientific workflow scheduling method and system - Google Patents

Completion time optimization-oriented scientific workflow scheduling method and system Download PDF

Info

Publication number
CN111861412A
CN111861412A
Authority
CN
China
Prior art keywords
task
workflow
function
tasks
cpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010732161.5A
Other languages
Chinese (zh)
Other versions
CN111861412B (en)
Inventor
钱诗友
周杰
薛广涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010732161.5A priority Critical patent/CN111861412B/en
Publication of CN111861412A publication Critical patent/CN111861412A/en
Application granted granted Critical
Publication of CN111861412B publication Critical patent/CN111861412B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a completion time optimization-oriented scientific workflow scheduling method and system, comprising the following steps: converting the scientific workflow tasks into serverless functions and deploying them into the corresponding cluster; converting a given scientific workflow into a corresponding directed acyclic graph; for each layer of tasks in the directed acyclic graph, allocating resources to the tasks and running them according to the parameter configuration; and, while the tasks are running, keeping the cluster monitored and dynamically adjusting the resource allocation of each task. Compared with the prior art, the invention achieves shorter overall completion time and better cluster performance by fully exploiting the elastic scaling capability provided by the serverless architecture.

Description

Completion time optimization-oriented scientific workflow scheduling method and system
Technical Field
The invention relates to the technical field of computers, and in particular to a completion time optimization-oriented scientific workflow scheduling method and system based on a serverless architecture.
Background
In scientific computing, scientific workflows are a widely used abstraction.
A workflow is typically represented by a directed acyclic graph (DAG) comprising a series of tasks and the data flowing into, out of, and between those tasks. Structuring a scientific application as a workflow provides a convenient abstraction for representing scientific problems, delegating the complexity of parallel scheduling to the workflow scheduling system.
For scientific workflows, reproducibility is crucial. A scientific workflow task may have multiple dependencies or multiple configurations, and because of these complex dependencies and configurations, dependency conflicts may arise between different workflow tasks. Handling these dependency problems well is one of the hallmarks of a mature scientific workflow system.
To this end, some existing scientific workflow systems package tasks into containers and run them in container-based clusters. This solves the workflow task dependency problem, but efficiently scheduling the containers that execute workflow tasks within a cluster remains a challenging problem.
In addition, workflow tasks come in different types and are generally categorized by duration into two classes: long tasks and short tasks. Most known workflow systems place workflow tasks into preset virtual machines, containers, or customized workflow runtimes for execution, but cannot adjust their resources while the tasks are running. As a result, when several workflow tasks run simultaneously, the resources allocated to them may become unbalanced, lengthening the schedule of one or more workflows. For scientific workflows with heterogeneous tasks, existing systems lack an efficient way to dynamically optimize the resource allocation of long and short tasks.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a scientific workflow scheduling method and system for completion time optimization.
The scientific workflow scheduling method for completion time optimization provided by the invention comprises the following steps:
a DAG conversion step: converting a given scientific workflow configuration into a corresponding directed acyclic graph;
function conversion step: converting each layer of tasks in the scientific workflow into serverless functions and deploying them into the corresponding cluster;
resource allocation step: for each layer of tasks in the directed acyclic graph, allocating resources to the tasks and running the tasks according to parameter configuration;
and a dynamic adjustment step: and in the process of task operation, the cluster is kept monitored, and the resource allocation of each task is dynamically adjusted according to the residual operation time of the task and the available resources of the system.
Preferably, the DAG converting step comprises:
S101, extracting each task in the workflow configuration, creating a graph node in the corresponding directed acyclic graph, and specifying the maximum number of running instances for each task;
S102, adding corresponding edges in the created directed acyclic graph according to the dependency relationships among the workflow tasks;
S103, generating an ID corresponding to the directed acyclic graph based on the name of the workflow;
S104, marking a root node in the created directed acyclic graph;
S105, packaging the elements including the ID, the deadline and the directed acyclic graph into a workflow object.
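By way of illustration only, the following Python sketch shows one possible shape of the DAG conversion step S101 to S105; the class and field names (TaskSpec, Workflow, max_instances and so on) are assumptions of this sketch and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class TaskSpec:                     # one task from the workflow configuration
    name: str
    image: str
    depends_on: List[str]
    max_instances: int              # S101: maximum number of running instances

@dataclass
class Workflow:                     # S105: the packaged workflow object
    id: str
    deadline: float
    nodes: Set[str]
    edges: Dict[str, List[str]]     # parent -> children
    roots: List[str]

def convert_to_dag(name: str, deadline: float, tasks: List[TaskSpec]) -> Workflow:
    nodes = {t.name for t in tasks}                        # S101: one graph node per task
    edges: Dict[str, List[str]] = {n: [] for n in nodes}
    for t in tasks:                                        # S102: add dependency edges
        for parent in t.depends_on:
            edges[parent].append(t.name)
    wf_id = f"wf-{name}"                                   # S103: ID derived from the workflow name
    roots = [t.name for t in tasks if not t.depends_on]    # S104: mark root nodes
    return Workflow(id=wf_id, deadline=deadline,
                    nodes=nodes, edges=edges, roots=roots) # S105: package into a workflow object
```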
Preferably, in the function conversion step, tasks with the same function but different inputs in a scientific workflow are aggregated, exposed through a RESTful API, and packaged into the same serverless function using the same task image.
Preferably, the function conversion step includes:
S201, extracting the task that currently needs to be converted according to the index;
S202, extracting the name and the image address of the task that currently needs to be converted;
S203, packing information including the task name, image address, namespace and autoscaling rules to generate the corresponding serverless function;
S204, the workflow system allocates a pair of input and output keys to each serverless function instance according to the input and output specified by the task;
S205, the workflow system writes the sets K_input and K_output of input files and output files into the input key and output key corresponding to each serverless function instance;
S206, when a serverless function is to be executed, the workflow system downloads the function image, deploys the corresponding serverless function in the cluster from that image, creates a function running instance, and retains the RESTful API interface exposed by the instance.
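Purely as an illustrative sketch, the fragment below mirrors steps S201 to S205: it packs the task name, image address, namespace and autoscaling rule into a function specification and assigns each function instance a pair of input and output keys. The actual deployment (S206) would go through the cluster's serverless platform, for example Knative, whose API is not reproduced here; all identifiers in the sketch (FunctionSpec, task_to_function, assign_io_keys, KEY_STORE) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple
import uuid

@dataclass
class FunctionSpec:                 # S203: information packed into a serverless function
    name: str
    image: str
    namespace: str
    autoscale_rule: Dict[str, int]  # e.g. {"min_instances": 0, "max_instances": 20}

def task_to_function(task_name: str, image: str, namespace: str,
                     max_instances: int) -> FunctionSpec:
    # S201/S202: the task to convert and its name/image address are assumed
    # to have been extracted from the workflow configuration by the caller.
    return FunctionSpec(name=task_name, image=image, namespace=namespace,
                        autoscale_rule={"min_instances": 0,
                                        "max_instances": max_instances})

KEY_STORE: Dict[str, List[str]] = {}   # stand-in for the real key/value store (e.g. Redis)

def assign_io_keys(spec: FunctionSpec,
                   k_input: List[str], k_output: List[str]) -> Tuple[str, str]:
    # S204: one input key and one output key per function instance
    instance_id = uuid.uuid4().hex[:8]
    input_key = f"{spec.name}:{instance_id}:in"    # illustrative key layout
    output_key = f"{spec.name}:{instance_id}:out"
    # S205: record the input/output file sets K_input and K_output under the keys
    KEY_STORE[input_key] = k_input
    KEY_STORE[output_key] = k_output
    return input_key, output_key
```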
Preferably, the resource allocation step includes determining an amount of resources required for a unit task:
the workflow system distributes the resources of the whole cluster to all running tasks in proportion according to the total function instance number of all the tasks running in the workflow pool, the residual running time of the workflow and the available resources of the current cluster;
the following method is adopted to determine the amount of CPU resources and memory resources required by a single instance of each task:
for each task to be run, 1 instance of the task is started in a container, the CPU and memory resources consumed by the container are sampled at fixed time intervals, and the samples form two time series, denoted d_cpu = {d_cpu,1, ..., d_cpu,n} and d_mem = {d_mem,1, ..., d_mem,n};
the sequence number of the current time interval is denoted i, the collected CPU data is denoted d_cpu,i, and the memory data is denoted d_mem,i; if, in some time interval i, two stability conditions on d_cpu,i and d_mem,i (given as formula images in the original publication) are satisfied simultaneously, the resource consumption of the current task is considered stable, and d_cpu,i and d_mem,i are used as the task resource consumption in the inter-task resource allocation algorithm;
if at a later time interval j (j > i) the collected CPU data d_cpu,j and memory data d_mem,j also satisfy the above two conditions, the maximum of d_cpu,i and d_cpu,j is used as the amount of CPU resources for scheduling; the memory resource data is processed in the same way;
if no time interval i satisfies the two conditions, d_cpu and d_mem are taken as the amount of resources required for a single instance of the task.
Preferably, the total resource allocation method for a single task comprises the following steps:
determining, from the resources required by a single instance of the task, the amount of CPU resources d_cpu,i,k and the amount of memory resources d_mem,i,k of a single container of the kth task τ_i,k currently running in the workflow;
counting the total available CPU and memory resources, cpu_Tot and mem_Tot, over all nodes of the whole cluster;
for each scientific workflow with running tasks, the first task starts executing at time t_i, the current time is t_cur, and the current workflow has deadline D_i; the time factor Δt_i of the workflow is given by a formula (rendered as an image in the original publication) in terms of t_i, t_cur and D_i;
the new resource amount of the kth running task T_i,k of the ith workflow is calculated according to two formulas (rendered as images in the original publication), where n denotes the number of scientific workflows that have running tasks and the remaining symbols (also image-rendered in the original) denote the number of instances of task k at its current level being run by the ith and by the jth scientific workflow, respectively.
Preferably, the method for determining the number of function instances of a single task comprises:
according to the data of the incoming requests, requests are put into the request cache pool according to the instance number F_i set for the task;
according to the amount of resources required by a single task instance and the amount of resources held by the current task, the monitoring thread calculates the new number of function instances (the formula is rendered as an image in the original publication);
the number a_i of currently running function instances is compared with the new number a'_i: if a_i < a'_i, a'_i - a_i additional function instances are started; if a_i > a'_i, a_i - a'_i running functions are terminated after their execution time ends, and the terminated functions are reconstructed and put back into the request pool;
after all functions corresponding to requests in the request pool have completed successfully, the replies received by all requests are recorded and a message indicating completion of the current task is returned to the workflow control thread.
The invention provides a completion time optimization-oriented scientific workflow scheduling system, which comprises:
a DAG conversion module: converting a given scientific workflow configuration into a corresponding directed acyclic graph;
the function conversion module: converting each layer of tasks in the scientific workflow into serverless functions and deploying them into the corresponding cluster;
a resource allocation module: for each layer of tasks in the directed acyclic graph, allocating resources to the tasks and running the tasks according to parameter configuration;
a dynamic adjustment module: and in the process of task operation, the cluster is kept monitored, and the resource allocation of each task is dynamically adjusted according to the residual operation time of the task and the available resources of the system.
Preferably, the DAG conversion module comprises:
extracting each task in the workflow configuration, creating a graph node in a corresponding directed acyclic graph, and specifying the maximum running instance number of each task;
adding corresponding edges in the created directed acyclic graph according to the dependency relationship among the workflow tasks;
generating an ID corresponding to the directed acyclic graph based on the name of the workflow;
marking a root node in the created directed acyclic graph;
and packaging the elements including the ID, the deadline and the directed acyclic graph into a workflow object.
Preferably, the function conversion module includes:
extracting the task that currently needs to be converted according to the index;
extracting the name and the image address of the task that currently needs to be converted;
packing information including the task name, image address, namespace and autoscaling rules to generate the corresponding serverless function;
the workflow system allocates a pair of input and output keys to each serverless function instance according to the input and output specified by the task;
the workflow system writes the sets K_input and K_output of input files and output files into the input key and output key corresponding to each serverless function instance;
when a serverless function is to be executed, the workflow system downloads the function image, deploys the corresponding serverless function in the cluster from that image, creates a function running instance, and retains the RESTful API interface exposed by the instance.
Compared with the prior art, the invention has the following beneficial effects:
1) with the DAG conversion module, when submitting a workflow the user only needs to specify its inputs, outputs and images to run it, without specifying the amount of resources the workflow requires, which reduces the user's burden and improves the user experience;
2) the function conversion module automates the conversion from workflow tasks to serverless functions, improving the usability of the serverless architecture and the degree of automation of workflow deployment;
3) by adjusting the number of function instances, the system automatically adapts the resources and instance counts occupied by the functions to the remaining time of the scientific workflow tasks and the resource availability of the cluster, thereby optimizing workflow completion time while improving the cluster's resource utilization, achieving dual-objective optimization;
4) the system adapts to a variety of cloud environments; because its underlying structure is self-contained and it does not depend on any specific services, it can be conveniently deployed on various clusters and therefore has good generality.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is an architecture diagram of a workflow system;
FIG. 2 is a schematic diagram of the Shrimp workflow;
FIG. 3 is a diagram of the operation process of the workflow system.
Detailed Description
The present invention will be described in detail with reference to specific embodiments. The following examples will assist those skilled in the art in further understanding the invention, but do not limit it in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention; all of these fall within the scope of the present invention.
The present application provides a technical solution for running scientific workflows on a serverless architecture, in which workflows are scheduled under the Kubernetes and Knative frameworks using container technology, achieving shorter overall completion time and better cluster performance. The solution can therefore serve as an execution platform for scientific workflows and execute a variety of scientific workflows containing heterogeneous tasks.
In this solution, to solve the dependency problem of workflow tasks and to integrate them with the container orchestration system Kubernetes (k8s), the tasks of all workflows are packaged into Docker images, and communication and data transfer are performed through RESTful APIs. One difficulty of this solution is how to balance the resources used by different workflow tasks. The present application dynamically adjusts the number of function instances of the different tasks running on the platform by monitoring the CPU and memory utilization of all currently running tasks and the deadline requirements of the workflows to which they belong, so as to optimize workflow completion time and cluster utilization at the same time.
A scientific workflow can be defined as a directed acyclic graph (DAG) G = {T, E, Data, D, F}, where T = {τ_1, τ_2, ..., τ_n} is the set of tasks and E is the set of edges representing data dependencies between tasks. A dependency ensures that a subtask will not run until all of its parent tasks have finished executing and transmitted their output data. Data represents one or more files required to perform a given task. All tasks in a DAG can be grouped into levels according to the path length from the starting point to the current node. D is the deadline of the current workflow. For each task τ_i in a DAG, the number of occurrences of that task at the same level is denoted F_i.
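As a concrete reading of this definition, the sketch below encodes G = {T, E, Data, D, F} as a small Python structure and derives each task's level as the longest path length from a root, which is how tasks are grouped into layers above; the field names and the level computation are illustrative and not taken verbatim from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DAG:
    tasks: List[str]                      # T = {tau_1, ..., tau_n}
    edges: Dict[str, List[str]]           # E: parent -> children (data dependencies)
    data: Dict[str, List[str]]            # Data: files required by each task
    deadline: float                       # D: deadline of the workflow
    instances: Dict[str, int]             # F: occurrences of each task at its level

def levels(dag: DAG) -> Dict[str, int]:
    """Level of a task = longest path length from any root to that task."""
    parents: Dict[str, List[str]] = {t: [] for t in dag.tasks}
    for p, children in dag.edges.items():
        for c in children:
            parents[c].append(p)
    level: Dict[str, int] = {}
    def lvl(t: str) -> int:               # acyclicity guarantees this recursion terminates
        if t not in level:
            level[t] = 0 if not parents[t] else 1 + max(lvl(p) for p in parents[t])
        return level[t]
    for t in dag.tasks:
        lvl(t)
    return level
```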
The architecture of the workflow system of the present invention is shown in fig. 1.
In the invention, to run a workflow the user submits to the workflow system a YAML file defining the workflow tasks, dependencies and data, together with the initial input files; after receiving these files, the workflow system completes the following processing steps:
1. converting the scientific workflow tasks into serverless functions and deploying them into the corresponding cluster;
2. converting the workflow given by the user into a corresponding DAG according to the workflow;
3. for each layer of task in the DAG, the workflow allocates a proper amount of resources to the task and runs the task according to a plurality of parameters;
4. in the process of task operation, the workflow system can keep monitoring the cluster resources and dynamically adjust the resource allocation of each task.
The conversion of scientific workflow tasks to serverless functions comprises the following steps:
A scientific workflow may contain many tasks with the same function but different inputs; for example, in the Shrimp workflow shown in fig. 2 there are many mappers that receive different inputs and produce corresponding outputs. Requests of the same task type are therefore aggregated, the tasks are exposed through a RESTful API, and identical task images are packed into the same function to process the requests, so that resource consumption is reduced by multiplexing the task functions.
The specific steps are shown in fig. 3, and include:
1. the workflow system allocates a pair of input and output keys to each task according to the input and output specified by the user;
2. the workflow system writes the sets K_input and K_output of input files and output files into the input key and output key corresponding to each task;
3. the workflow system downloads each task image, deploys the corresponding function in the cluster from the image, and retains the exposed RESTful API interface.
Secondly, requesting to execute a function:
Each request corresponds to a specific function instance; that is, each request, with its input and output, is processed by its own function instance.
1. Before each request is executed, the workflow system sends the corresponding input key and output key to the request;
2. when the request is executed, the function fetches data from the specified Redis database according to the input key it received;
3. the function processes the received data and produces the corresponding output;
4. the output is written back to the specified Redis database under the output key the request received.
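A minimal sketch of this request-execution protocol, assuming a Redis instance reachable by the function and a trivial process() placeholder standing in for the task's real computation; the service address, key handling and function names are assumptions of the sketch rather than the patent's actual interface.

```python
import redis  # redis-py client

r = redis.Redis(host="redis.workflow.svc", port=6379)  # assumed service address

def process(payload: bytes) -> bytes:
    """Placeholder for the task's real computation on the input data."""
    return payload.upper()

def handle_request(input_key: str, output_key: str) -> None:
    # 1-2. the workflow system sent input_key/output_key with the request;
    #      fetch the input data from Redis under input_key
    payload = r.get(input_key)
    if payload is None:
        raise KeyError(f"no input found under {input_key}")
    # 3. compute the corresponding output
    result = process(payload)
    # 4. write the output back to Redis under output_key
    r.set(output_key, result)
```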
Thirdly, determining the resource amount required by the unit task:
Define τ_i,k as the kth task of the ith workflow.
The workflow system allocates the resources of the whole cluster to all running tasks in proportion according to the total function instance number of all the tasks running in the workflow pool and the residual running time of the workflow.
The invention uses the following algorithm to determine the amount of CPU and memory resources required by a single instance of each task:
1. for each task to be run, 1 instance of the task is started in a container, the CPU and memory resources consumed by the container are sampled at fixed time intervals, and the samples form two time series, denoted d_cpu = {d_cpu,1, ..., d_cpu,n} and d_mem = {d_mem,1, ..., d_mem,n};
2. the sequence number of the current time interval is denoted i, the collected CPU data is denoted d_cpu,i, and the memory data is denoted d_mem,i; if, in some time interval i, two stability conditions on d_cpu,i and d_mem,i (given as formula images in the original publication) are satisfied simultaneously, the resource consumption of the current task is considered stable, and d_cpu,i and d_mem,i are used as the task resource consumption in the inter-task resource allocation algorithm;
3. if at a later time interval j (j > i) the collected CPU data d_cpu,j and memory data d_mem,j also satisfy the above two conditions, the maximum of d_cpu,i and d_cpu,j is used as the amount of CPU resources for scheduling; the memory resource data is processed in the same way;
4. if no time interval i satisfies the two conditions, d_cpu and d_mem are taken as the amount of resources required for a single instance of the task.
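The two stopping conditions in step 2 are given only as formula images in the original publication; the sketch below therefore assumes a simple relative-change test between consecutive samples as a stand-in for those conditions and otherwise follows steps 1 to 4: sample CPU and memory each interval, record the first stable sample, prefer the maximum if a later interval is also stable, and fall back to the maximum of the whole series (an assumption of this sketch) if stability is never reached.

```python
from typing import Callable, List, Tuple

def stable(samples: List[float], i: int, eps: float = 0.05) -> bool:
    """Assumed stand-in for the patent's image-only stability condition:
    the sample at interval i differs from the previous one by less than eps (relative)."""
    return i > 0 and abs(samples[i] - samples[i - 1]) <= eps * max(samples[i - 1], 1e-9)

def profile_task(sample_cpu: Callable[[], float],
                 sample_mem: Callable[[], float],
                 n_intervals: int) -> Tuple[float, float]:
    d_cpu: List[float] = []
    d_mem: List[float] = []
    chosen_cpu = chosen_mem = None
    for i in range(n_intervals):
        d_cpu.append(sample_cpu())                  # step 1: sample CPU and memory each interval
        d_mem.append(sample_mem())
        if stable(d_cpu, i) and stable(d_mem, i):   # step 2: both conditions hold at interval i
            if chosen_cpu is None:
                chosen_cpu, chosen_mem = d_cpu[i], d_mem[i]
            else:                                   # step 3: later stable interval j -> keep the maximum
                chosen_cpu = max(chosen_cpu, d_cpu[i])
                chosen_mem = max(chosen_mem, d_mem[i])
    if chosen_cpu is None:                          # step 4: never stable -> fall back to the series
        chosen_cpu, chosen_mem = max(d_cpu), max(d_mem)   # (maximum is this sketch's assumption)
    return chosen_cpu, chosen_mem
```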
Fourthly, adjusting the total resource amount of a single task:
1. from the resources required by a single task instance determined above, determine the amount of CPU resources d_cpu,i,k and the amount of memory resources d_mem,i,k of a single container of the kth task τ_i,k currently running in the workflow;
2. count the total available CPU and memory resources, cpu_Tot and mem_Tot, over all nodes of the whole cluster;
3. for each workflow with running tasks, the first task starts executing at time t_i, the current time is t_cur, and the current workflow has deadline D_i; the time factor Δt_i of the workflow is given by a formula (rendered as an image in the original publication) in terms of t_i, t_cur and D_i;
4. the new resource amount of the kth task T_i,k that workflow i is running is calculated according to two formulas (rendered as images in the original publication), where n denotes the number of workflows with running tasks and the remaining symbols (also image-rendered in the original) denote the number of instances of task k at its current level being run by the ith and by the jth scientific workflow, respectively.
The resource allocation process is performed periodically (every 10 seconds in this algorithm).
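The allocation formulas themselves appear only as images in the original publication; the sketch below therefore assumes the proportional rule the surrounding text describes, namely that each running task receives a share of the cluster's CPU and memory proportional to its instance count weighted by its workflow's time factor, which is passed in precomputed. All identifiers are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class RunningTask:
    workflow_id: str
    task_id: str
    instances: int        # F_{i,k}: instances of this task at its current level
    delta_t: float        # time factor of the owning workflow (formula is an image in the patent)

def allocate(cluster_cpu: float, cluster_mem: float,
             tasks: List[RunningTask]) -> Dict[str, Dict[str, float]]:
    """Assumed proportional split of cluster resources across running tasks,
    weighted by instance count and workflow time factor."""
    total_weight = sum(t.instances * t.delta_t for t in tasks) or 1.0
    shares: Dict[str, Dict[str, float]] = {}
    for t in tasks:
        w = (t.instances * t.delta_t) / total_weight
        shares[f"{t.workflow_id}/{t.task_id}"] = {
            "cpu": cluster_cpu * w,     # this task's CPU budget
            "mem": cluster_mem * w,     # this task's memory budget
        }
    return shares

# The workflow system would invoke allocate() on a timer, e.g. every 10 seconds.
```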
Fifthly, determining the number of function instances of a single task, and comprising the following steps:
1. according to the incoming request data such as request links and request parameters, requests are put into the request cache pool according to the instance number F_i set for the task;
2. according to the amount of resources required by a single task instance determined in the third step and the new amount of resources held by the current task calculated by the algorithm in the fourth step, the monitoring thread calculates the new number of function instances (the formula is rendered as an image in the original publication);
3. the number a_i of currently running function instances is compared with the target instance number a'_i: if a_i < a'_i, a'_i - a_i additional function instances are started; if a_i > a'_i, a_i - a'_i running functions are terminated after their execution time ends, and the terminated functions are reconstructed and put back into the request pool;
4. after all functions corresponding to requests in the request pool have completed successfully, the replies received by all requests are recorded and a message indicating completion of the current task is returned to the workflow control thread.
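A sketch of the instance adjustment in this fifth step, under the assumption that the target count a'_i is the number of instances that fit within the task's resource share given the per-instance demand (the exact formula is an image in the original); the scale-up and scale-down branches follow the text, while the function and parameter names are illustrative.

```python
import math
from typing import Tuple

def target_instances(task_cpu_budget: float, task_mem_budget: float,
                     cpu_per_instance: float, mem_per_instance: float) -> int:
    """Assumed form of a'_i: how many instances fit in the task's budget
    (the patent gives the actual formula only as an image)."""
    by_cpu = task_cpu_budget / cpu_per_instance
    by_mem = task_mem_budget / mem_per_instance
    return max(1, math.floor(min(by_cpu, by_mem)))

def scaling_action(running: int, target: int) -> Tuple[str, int]:
    """Compare a_i (running) with a'_i (target) as in step 3 above."""
    if running < target:
        return ("start", target - running)       # start a'_i - a_i new function instances
    if running > target:
        return ("terminate", running - target)   # terminate a_i - a'_i after they finish,
                                                 # then rebuild their requests into the pool
    return ("keep", 0)
```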
The invention also provides a completion time optimization-oriented scientific workflow scheduling system, which comprises:
the function conversion module: and converting the scientific workflow tasks into server-free functions and deploying the server-free functions into corresponding clusters.
A DAG conversion module: a given scientific workflow is converted into a corresponding directed acyclic graph.
A resource allocation module: and for each layer of task in the directed acyclic graph, allocating resources to the task and running according to a plurality of parameters.
A dynamic adjustment module: in the process of task operation, the monitoring of the cluster is kept, and the resource allocation of each task is dynamically adjusted
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules and units provided by the present invention as pure computer-readable program code, the same system, devices, modules and units can be implemented entirely in hardware by logically programming the method steps, for example with logic gates, switches, application-specific integrated circuits, programmable logic controllers and embedded microcontrollers. Therefore, the system and its various devices, modules and units provided by the present invention can be regarded as a hardware component, and the devices, modules and units included in it for realizing various functions can be regarded as structures within that hardware component; the devices, modules and units for performing the various functions can also be regarded both as software modules for performing the method and as structures within the hardware component.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A completion time optimization-oriented scientific workflow scheduling method is characterized by comprising the following steps:
a DAG conversion step: converting a given scientific workflow configuration into a corresponding directed acyclic graph;
function conversion step: converting each layer of tasks in the scientific workflow into serverless functions and deploying them into the corresponding cluster;
resource allocation step: for each layer of tasks in the directed acyclic graph, allocating resources to the tasks and running the tasks according to parameter configuration;
and a dynamic adjustment step: and in the process of task operation, the cluster is kept monitored, and the resource allocation of each task is dynamically adjusted according to the residual operation time of the task and the available resources of the system.
2. The completion time optimization-oriented scientific workflow scheduling method of claim 1, wherein the DAG conversion step comprises:
S101, extracting each task in the workflow configuration, creating a graph node in the corresponding directed acyclic graph, and specifying the maximum number of running instances for each task;
S102, adding corresponding edges in the created directed acyclic graph according to the dependency relationships among the workflow tasks;
S103, generating an ID corresponding to the directed acyclic graph based on the name of the workflow;
S104, marking a root node in the created directed acyclic graph;
S105, packaging the elements including the ID, the deadline and the directed acyclic graph into a workflow object.
3. The completion time optimization-oriented scientific workflow scheduling method of claim 1, wherein in the function conversion step, the tasks with the same function but different inputs in one scientific workflow are summarized, and are exposed through Restful API, and are packaged into the same serverless function by using the same task image.
4. The completion time optimization-oriented scientific workflow scheduling method according to claim 2, wherein the function conversion step comprises:
S201, extracting the task that currently needs to be converted according to the index;
S202, extracting the name and the image address of the task that currently needs to be converted;
S203, packing information including the task name, image address, namespace and autoscaling rules to generate the corresponding serverless function;
S204, the workflow system allocates a pair of input and output keys to each serverless function instance according to the input and output specified by the task;
S205, the workflow system writes the sets K_input and K_output of input files and output files into the input key and output key corresponding to each serverless function instance;
S206, when a serverless function is to be executed, the workflow system downloads the function image, deploys the corresponding serverless function in the cluster from that image, creates a function running instance, and retains the RESTful API interface exposed by the instance.
5. The completion time optimization-oriented scientific workflow scheduling method of claim 1, wherein said resource allocation step comprises determining the amount of resources required for a unit task:
the workflow system distributes the resources of the whole cluster to all running tasks in proportion according to the total function instance number of all the tasks running in the workflow pool, the residual running time of the workflow and the available resources of the current cluster;
the following method is adopted to determine the amount of CPU resources and memory resources required by a single instance of each task:
for each task to be run, 1 instance of the task is started in a container, the CPU and memory resources consumed by the container are sampled at fixed time intervals, and the samples form two time series, denoted d_cpu = {d_cpu,1, ..., d_cpu,n} and d_mem = {d_mem,1, ..., d_mem,n};
the sequence number of the current time interval is denoted i, the collected CPU data is denoted d_cpu,i, and the memory data is denoted d_mem,i; if, in some time interval i, two stability conditions on d_cpu,i and d_mem,i (given as formula images in the original publication) are satisfied simultaneously, the resource consumption of the current task is considered stable, and d_cpu,i and d_mem,i are used as the task resource consumption in the inter-task resource allocation algorithm;
if at a later time interval j (j > i) the collected CPU data d_cpu,j and memory data d_mem,j also satisfy the above two conditions, the maximum of d_cpu,i and d_cpu,j is used as the amount of CPU resources for scheduling; the memory resource data is processed in the same way;
if no time interval i satisfies the two conditions, d_cpu and d_mem are taken as the amount of resources required for a single instance of the task.
6. The completion time optimization-oriented scientific workflow scheduling method of claim 5, wherein the total resource allocation method of a single task comprises:
determining, from the resources required by a single instance of the task, the amount of CPU resources d_cpu,i,k and the amount of memory resources d_mem,i,k of a single container of the kth task τ_i,k currently running in the workflow;
counting the total available CPU and memory resources, cpu_Tot and mem_Tot, over all nodes of the whole cluster;
for each scientific workflow with running tasks, the first task starts executing at time t_i, the current time is t_cur, and the current workflow has deadline D_i; the time factor Δt_i of the workflow is given by a formula (rendered as an image in the original publication) in terms of t_i, t_cur and D_i;
the new resource amount of the kth running task T_i,k of the ith workflow is calculated according to two formulas (rendered as images in the original publication), where n denotes the number of scientific workflows that have running tasks and the remaining symbols (also image-rendered in the original) denote the number of instances of task k at its current level being run by the ith and by the jth scientific workflow, respectively.
7. The completion time optimization-oriented scientific workflow scheduling method of claim 6, wherein the method for determining the number of function instances of a single task comprises:
according to the data of the incoming requests, requests are put into the request cache pool according to the instance number F_i set for the task;
according to the amount of resources required by a single task instance and the amount of resources held by the current task, the monitoring thread calculates the new number of function instances (the formula is rendered as an image in the original publication);
the number a_i of currently running function instances is compared with the new number a'_i: if a_i < a'_i, a'_i - a_i additional function instances are started; if a_i > a'_i, a_i - a'_i running functions are terminated after their execution time ends, and the terminated functions are reconstructed and put back into the request pool;
after all functions corresponding to requests in the request pool have completed successfully, the replies received by all requests are recorded and a message indicating completion of the current task is returned to the workflow control thread.
8. A completion time optimization-oriented scientific workflow scheduling system, comprising:
a DAG conversion module: converting a given scientific workflow configuration into a corresponding directed acyclic graph;
the function conversion module: converting each layer of tasks in the scientific workflow into serverless functions and deploying them into the corresponding cluster;
a resource allocation module: for each layer of tasks in the directed acyclic graph, allocating resources to the tasks and running the tasks according to parameter configuration;
a dynamic adjustment module: and in the process of task operation, the cluster is kept monitored, and the resource allocation of each task is dynamically adjusted according to the residual operation time of the task and the available resources of the system.
9. The completion time optimization-oriented scientific workflow scheduling system of claim 8 wherein said DAG conversion module comprises:
extracting each task in the workflow configuration, creating a graph node in a corresponding directed acyclic graph, and specifying the maximum running instance number of each task;
adding corresponding edges in the created directed acyclic graph according to the dependency relationship among the workflow tasks;
generating an ID corresponding to the directed acyclic graph based on the name of the workflow;
marking a root node in the created directed acyclic graph;
and packaging the elements including the ID, the deadline and the directed acyclic graph into a workflow object.
10. The completion time optimization-oriented scientific workflow scheduling system of claim 9 wherein said function transformation module comprises:
extracting the task that currently needs to be converted according to the index;
extracting the name and the image address of the task that currently needs to be converted;
packing information including the task name, image address, namespace and autoscaling rules to generate the corresponding serverless function;
the workflow system allocates a pair of input and output keys to each serverless function instance according to the input and output specified by the task;
the workflow system writes the sets K_input and K_output of input files and output files into the input key and output key corresponding to each serverless function instance;
when a serverless function is to be executed, the workflow system downloads the function image, deploys the corresponding serverless function in the cluster from that image, creates a function running instance, and retains the RESTful API interface exposed by the instance.
CN202010732161.5A 2020-07-27 2020-07-27 Completion time optimization-oriented scientific workflow scheduling method and system Active CN111861412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010732161.5A CN111861412B (en) 2020-07-27 2020-07-27 Completion time optimization-oriented scientific workflow scheduling method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010732161.5A CN111861412B (en) 2020-07-27 2020-07-27 Completion time optimization-oriented scientific workflow scheduling method and system

Publications (2)

Publication Number Publication Date
CN111861412A true CN111861412A (en) 2020-10-30
CN111861412B CN111861412B (en) 2024-03-15

Family

ID=72947259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010732161.5A Active CN111861412B (en) 2020-07-27 2020-07-27 Completion time optimization-oriented scientific workflow scheduling method and system

Country Status (1)

Country Link
CN (1) CN111861412B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416476A (en) * 2020-11-25 2021-02-26 武汉联影医疗科技有限公司 Workflow execution method and device, computer equipment and storage medium
CN112748997A (en) * 2021-01-20 2021-05-04 北京明略昭辉科技有限公司 Workflow scheduling method and system
CN113225269A (en) * 2021-04-16 2021-08-06 鹏城实验室 Container-based workflow scheduling method, device and system and storage medium
CN113537721A (en) * 2021-06-21 2021-10-22 华南师范大学 Control method, system, and medium for business workflow local time constraint adjustment
CN114844843A (en) * 2022-03-24 2022-08-02 清华大学 Method and device for adjusting number of application instances
WO2023202006A1 (en) * 2022-04-20 2023-10-26 Zhejiang Dahua Technology Co., Ltd. Systems and methods for task execution
WO2024109005A1 (en) * 2022-11-22 2024-05-30 华为云计算技术有限公司 Execution method for serverless application, and related device
CN118193598A (en) * 2024-05-16 2024-06-14 南京赛宁信息技术有限公司 Cache-based target range application node cold start acceleration method and system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110072436A1 (en) * 2009-09-23 2011-03-24 International Business Machines Corporation Resource optimization for real-time task assignment in multi-process environments
CN102799957A (en) * 2012-05-30 2012-11-28 武汉理工大学 Scientific work flow scheduling method with safe perception under cloud calculation environment
CN107015856A (en) * 2017-03-30 2017-08-04 青海大学 Task scheduling approach generation method and device under cloud environment in scientific workflow
CN108154317A (en) * 2018-01-25 2018-06-12 福建师范大学 The workflow group scheduling method that Case-based Reasoning self-adjusted block is integrated under cloudy environment
CN108665157A (en) * 2018-05-02 2018-10-16 中山大学 A method of realizing cloud Workflow system flow instance balance dispatching
CN109634742A (en) * 2018-11-15 2019-04-16 华南理工大学 A kind of time-constrain scientific workflow optimization method based on ant group algorithm
US20190179678A1 (en) * 2017-12-07 2019-06-13 International Business Machines Corporation Computer server application execution scheduling latency reduction
CN110058950A (en) * 2019-04-17 2019-07-26 上海沄界信息科技有限公司 Distributed cloud computing method and equipment based on serverless backup framework
US20190286486A1 (en) * 2016-09-21 2019-09-19 Accenture Global Solutions Limited Dynamic resource allocation for application containers
CN111427681A (en) * 2020-02-19 2020-07-17 上海交通大学 Real-time task matching scheduling system and method based on resource monitoring in edge computing

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110072436A1 (en) * 2009-09-23 2011-03-24 International Business Machines Corporation Resource optimization for real-time task assignment in multi-process environments
CN102799957A (en) * 2012-05-30 2012-11-28 武汉理工大学 Scientific work flow scheduling method with safe perception under cloud calculation environment
US20190286486A1 (en) * 2016-09-21 2019-09-19 Accenture Global Solutions Limited Dynamic resource allocation for application containers
CN107015856A (en) * 2017-03-30 2017-08-04 青海大学 Task scheduling approach generation method and device under cloud environment in scientific workflow
US20190179678A1 (en) * 2017-12-07 2019-06-13 International Business Machines Corporation Computer server application execution scheduling latency reduction
CN108154317A (en) * 2018-01-25 2018-06-12 福建师范大学 The workflow group scheduling method that Case-based Reasoning self-adjusted block is integrated under cloudy environment
CN108665157A (en) * 2018-05-02 2018-10-16 中山大学 A method of realizing cloud Workflow system flow instance balance dispatching
CN109634742A (en) * 2018-11-15 2019-04-16 华南理工大学 A kind of time-constrain scientific workflow optimization method based on ant group algorithm
CN110058950A (en) * 2019-04-17 2019-07-26 上海沄界信息科技有限公司 Distributed cloud computing method and equipment based on serverless backup framework
CN111427681A (en) * 2020-02-19 2020-07-17 上海交通大学 Real-time task matching scheduling system and method based on resource monitoring in edge computing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAMID ARABNEJAD等: "Fairness resource sharing for dynamic workflow scheduling on Heterogeneous Systems", 《2012 10TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS》 *
ZHU Yadong; LI Zhong; YAN Li; CHEN Xiangjun: "Balanced scheduling algorithm for scientific workflows in a cloud environment", Research and Exploration in Laboratory, no. 05
DONG Weihang: "Research on the scheduling optimization problem of scientific workflows", China Master's Theses Full-text Database, Economics and Management Sciences

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416476A (en) * 2020-11-25 2021-02-26 武汉联影医疗科技有限公司 Workflow execution method and device, computer equipment and storage medium
CN112416476B (en) * 2020-11-25 2023-03-24 武汉联影医疗科技有限公司 Workflow execution method and device, computer equipment and storage medium
CN112748997A (en) * 2021-01-20 2021-05-04 北京明略昭辉科技有限公司 Workflow scheduling method and system
CN113225269A (en) * 2021-04-16 2021-08-06 鹏城实验室 Container-based workflow scheduling method, device and system and storage medium
CN113537721A (en) * 2021-06-21 2021-10-22 华南师范大学 Control method, system, and medium for business workflow local time constraint adjustment
CN114844843A (en) * 2022-03-24 2022-08-02 清华大学 Method and device for adjusting number of application instances
WO2023202006A1 (en) * 2022-04-20 2023-10-26 Zhejiang Dahua Technology Co., Ltd. Systems and methods for task execution
WO2024109005A1 (en) * 2022-11-22 2024-05-30 华为云计算技术有限公司 Execution method for serverless application, and related device
CN118193598A (en) * 2024-05-16 2024-06-14 南京赛宁信息技术有限公司 Cache-based target range application node cold start acceleration method and system
CN118193598B (en) * 2024-05-16 2024-10-01 南京赛宁信息技术有限公司 Cache-based target range application node cold start acceleration method and system

Also Published As

Publication number Publication date
CN111861412B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
CN111861412B (en) Completion time optimization-oriented scientific workflow scheduling method and system
CN107580023B (en) Stream processing job scheduling method and system for dynamically adjusting task allocation
Fohler Joint scheduling of distributed complex periodic and hard aperiodic tasks in statically scheduled systems
Zhu et al. A cost-effective scheduling algorithm for scientific workflows in clouds
CN108154317B (en) Workflow group scheduling method based on example self-adaptive distribution integration in multi-cloud environment
CN112685153A (en) Micro-service scheduling method and device and electronic equipment
CN111798113B (en) Resource allocation method, device, storage medium and electronic equipment
CN111026519B (en) Distributed task priority scheduling method and system and storage medium
Soner et al. Integer programming based heterogeneous cpu–gpu cluster schedulers for slurm resource manager
JP2006244479A (en) System and method for scheduling executable program
CN106201701A (en) A kind of workflow schedule algorithm of band task duplication
CN116302519A (en) Micro-service workflow elastic scheduling method, system and equipment based on container cloud platform
US8028291B2 (en) Method and computer program product for job selection and resource allocation of a massively parallel processor
CN109634714B (en) Intelligent scheduling method and device
CN114816753A (en) Data cluster computing node scaling method, device, equipment and medium
George et al. Job vs. portioned partitioning for the earliest deadline first semi-partitioned scheduling
Isovic et al. Handling mixed sets of tasks in combined offline and online scheduled real-time systems
CN115934362A (en) Deep learning-oriented server non-perception computing cluster scheduling method and product
CN115858667A (en) Method, apparatus, device and storage medium for synchronizing data
WO2022266263A1 (en) Allocating of computing resources for applications
CN114675953A (en) Resource dynamic scheduling method, device, equipment and computer readable storage medium
Chen et al. DeepBoot: Dynamic Scheduling System for Training and Inference Deep Learning Tasks in GPU Cluster
CN116483546B (en) Distributed training task scheduling method, device, equipment and storage medium
Heath et al. Development, analysis, and verification of a parallel hybrid dataflow computer architectural framework and associated load-balancing strategies and algorithms via parallel simulation
Chin et al. Adaptive service scheduling for workflow applications in service-oriented grid

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant