CN111861412A - Completion time optimization-oriented scientific workflow scheduling method and system - Google Patents

Completion time optimization-oriented scientific workflow scheduling method and system Download PDF

Info

Publication number
CN111861412A
CN111861412A
Authority
CN
China
Prior art keywords
task
workflow
function
tasks
cpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010732161.5A
Other languages
Chinese (zh)
Other versions
CN111861412B (en)
Inventor
钱诗友
周杰
薛广涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010732161.5A priority Critical patent/CN111861412B/en
Publication of CN111861412A publication Critical patent/CN111861412A/en
Application granted granted Critical
Publication of CN111861412B publication Critical patent/CN111861412B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a completion time optimization-oriented scientific workflow scheduling method and system, comprising the following steps: converting the scientific workflow tasks into serverless functions and deploying them into the corresponding cluster; converting a given scientific workflow into a corresponding directed acyclic graph; for each layer of tasks in the directed acyclic graph, allocating resources to the tasks and running them according to the parameter configuration; and, while the tasks are running, keeping the cluster monitored and dynamically adjusting the resource allocation of each task. Compared with the prior art, the invention achieves shorter overall completion time and better cluster performance by fully exploiting the elastic scaling capability provided by the serverless architecture.

Description

Completion time optimization-oriented scientific workflow scheduling method and system
Technical Field
The invention relates to the technical field of computers, and in particular to a completion time optimization-oriented scientific workflow scheduling method and system based on a serverless architecture.
Background
In scientific computing, scientific workflows are a widely used abstraction.
A workflow is typically represented by a directed acyclic graph (DAG) comprising a series of tasks and the data flowing into, out of, and between those tasks. Structuring a scientific application as a workflow provides a convenient abstraction for representing scientific problems, delegating the complexity of parallel scheduling to the workflow scheduling system.
For scientific workflows, reproducibility is crucial. A scientific workflow task may have multiple dependencies or multiple configurations, and because of these complex dependencies and configurations, dependency conflicts may arise between different workflow tasks. Handling these dependency problems well is one of the hallmarks of a mature scientific workflow system.
To this end, some existing scientific workflow systems package tasks into containers and run them in container-based clusters. This solves the workflow task dependency problem, but efficiently scheduling the containers that execute workflow tasks within a cluster remains a challenging problem.
In addition, workflow tasks come in different types and are generally categorized by duration into two classes: long tasks and short tasks. Most known workflow systems place workflow tasks into preset virtual machines, containers, or customized workflow runtimes for execution, but cannot adjust their resources while the tasks are running. As a result, when several workflow tasks run simultaneously, the resources allocated to them may become unbalanced, lengthening the schedule of one or more workflows. For scientific workflows with heterogeneous tasks, existing systems lack an efficient way to dynamically optimize the resource allocation of long and short tasks.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a scientific workflow scheduling method and system for completion time optimization.
The scientific workflow scheduling method for completion time optimization provided by the invention comprises the following steps:
a DAG conversion step: converting a given scientific workflow configuration into a corresponding directed acyclic graph;
function conversion step: converting each layer of tasks in the scientific workflow into serverless functions and deploying them into the corresponding cluster;
resource allocation step: for each layer of tasks in the directed acyclic graph, allocating resources to the tasks and running the tasks according to parameter configuration;
and a dynamic adjustment step: and in the process of task operation, the cluster is kept monitored, and the resource allocation of each task is dynamically adjusted according to the residual operation time of the task and the available resources of the system.
Preferably, the DAG converting step comprises:
S101, extracting each task in the workflow configuration, creating a graph node in the corresponding directed acyclic graph, and specifying the maximum number of running instances for each task;
S102, adding corresponding edges in the created directed acyclic graph according to the dependency relationships among the workflow tasks;
S103, generating an ID corresponding to the directed acyclic graph based on the name of the workflow;
S104, marking a root node in the created directed acyclic graph;
S105, packaging the elements including the ID, the deadline and the directed acyclic graph into a workflow object.
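By way of illustration only, the following Python sketch shows one possible shape of the DAG conversion step S101 to S105; the class and field names (TaskSpec, Workflow, max_instances and so on) are assumptions of this sketch and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class TaskSpec:                     # one task from the workflow configuration
    name: str
    image: str
    depends_on: List[str]
    max_instances: int              # S101: maximum number of running instances

@dataclass
class Workflow:                     # S105: the packaged workflow object
    id: str
    deadline: float
    nodes: Set[str]
    edges: Dict[str, List[str]]     # parent -> children
    roots: List[str]

def convert_to_dag(name: str, deadline: float, tasks: List[TaskSpec]) -> Workflow:
    nodes = {t.name for t in tasks}                        # S101: one graph node per task
    edges: Dict[str, List[str]] = {n: [] for n in nodes}
    for t in tasks:                                        # S102: add dependency edges
        for parent in t.depends_on:
            edges[parent].append(t.name)
    wf_id = f"wf-{name}"                                   # S103: ID derived from the workflow name
    roots = [t.name for t in tasks if not t.depends_on]    # S104: mark root nodes
    return Workflow(id=wf_id, deadline=deadline,
                    nodes=nodes, edges=edges, roots=roots) # S105: package into a workflow object
```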
Preferably, in the function conversion step, tasks with the same function but different inputs in a scientific workflow are aggregated, exposed through a RESTful API, and packaged into the same serverless function using the same task image.
Preferably, the function conversion step includes:
S201, extracting the task that currently needs to be converted according to the index;
S202, extracting the name and the image address of the task that currently needs to be converted;
S203, packing information including the task name, image address, namespace and autoscaling rules to generate the corresponding serverless function;
S204, the workflow system allocates a pair of input and output keys to each serverless function instance according to the input and output specified by the task;
S205, the workflow system writes the sets K_input and K_output of input files and output files into the input key and output key corresponding to each serverless function instance;
S206, when a serverless function is to be executed, the workflow system downloads the function image, deploys the corresponding serverless function in the cluster from that image, creates a function running instance, and retains the RESTful API interface exposed by the instance.
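Purely as an illustrative sketch, the fragment below mirrors steps S201 to S205: it packs the task name, image address, namespace and autoscaling rule into a function specification and assigns each function instance a pair of input and output keys. The actual deployment (S206) would go through the cluster's serverless platform, for example Knative, whose API is not reproduced here; all identifiers in the sketch (FunctionSpec, task_to_function, assign_io_keys, KEY_STORE) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple
import uuid

@dataclass
class FunctionSpec:                 # S203: information packed into a serverless function
    name: str
    image: str
    namespace: str
    autoscale_rule: Dict[str, int]  # e.g. {"min_instances": 0, "max_instances": 20}

def task_to_function(task_name: str, image: str, namespace: str,
                     max_instances: int) -> FunctionSpec:
    # S201/S202: the task to convert and its name/image address are assumed
    # to have been extracted from the workflow configuration by the caller.
    return FunctionSpec(name=task_name, image=image, namespace=namespace,
                        autoscale_rule={"min_instances": 0,
                                        "max_instances": max_instances})

KEY_STORE: Dict[str, List[str]] = {}   # stand-in for the real key/value store (e.g. Redis)

def assign_io_keys(spec: FunctionSpec,
                   k_input: List[str], k_output: List[str]) -> Tuple[str, str]:
    # S204: one input key and one output key per function instance
    instance_id = uuid.uuid4().hex[:8]
    input_key = f"{spec.name}:{instance_id}:in"    # illustrative key layout
    output_key = f"{spec.name}:{instance_id}:out"
    # S205: record the input/output file sets K_input and K_output under the keys
    KEY_STORE[input_key] = k_input
    KEY_STORE[output_key] = k_output
    return input_key, output_key
```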
Preferably, the resource allocation step includes determining an amount of resources required for a unit task:
the workflow system distributes the resources of the whole cluster to all running tasks in proportion according to the total function instance number of all the tasks running in the workflow pool, the residual running time of the workflow and the available resources of the current cluster;
the following method is adopted to determine the amount of CPU resources and memory resources required by a single instance of each task:
for each task to be run, 1 instance of the task is started in a container, the CPU and memory resources consumed by the container are sampled at fixed time intervals, and the samples form two time series, denoted d_cpu = {d_cpu,1, ..., d_cpu,n} and d_mem = {d_mem,1, ..., d_mem,n};
the sequence number of the current time interval is denoted i, the collected CPU data is denoted d_cpu,i, and the memory data is denoted d_mem,i; if, in some time interval i, two stability conditions on d_cpu,i and d_mem,i (given as formula images in the original publication) are satisfied simultaneously, the resource consumption of the current task is considered stable, and d_cpu,i and d_mem,i are used as the task resource consumption in the inter-task resource allocation algorithm;
if at a later time interval j (j > i) the collected CPU data d_cpu,j and memory data d_mem,j also satisfy the above two conditions, the maximum of d_cpu,i and d_cpu,j is used as the amount of CPU resources for scheduling; the memory resource data is processed in the same way;
if no time interval i satisfies the two conditions, d_cpu and d_mem are taken as the amount of resources required for a single instance of the task.
Preferably, the total resource allocation method for a single task comprises the following steps:
determining, from the resources required by a single instance of the task, the amount of CPU resources d_cpu,i,k and the amount of memory resources d_mem,i,k of a single container of the kth task τ_i,k currently running in the workflow;
counting the total available CPU and memory resources, cpu_Tot and mem_Tot, over all nodes of the whole cluster;
for each scientific workflow with running tasks, the first task starts executing at time t_i, the current time is t_cur, and the current workflow has deadline D_i; the time factor Δt_i of the workflow is given by a formula (rendered as an image in the original publication) in terms of t_i, t_cur and D_i;
the new resource amount of the kth running task T_i,k of the ith workflow is calculated according to two formulas (rendered as images in the original publication), where n denotes the number of scientific workflows that have running tasks and the remaining symbols (also image-rendered in the original) denote the number of instances of task k at its current level being run by the ith and by the jth scientific workflow, respectively.
Preferably, the method for determining the number of function instances of a single task comprises:
according to the data of the incoming requests, requests are put into the request cache pool according to the instance number F_i set for the task;
according to the amount of resources required by a single task instance and the amount of resources held by the current task, the monitoring thread calculates the new number of function instances (the formula is rendered as an image in the original publication);
the number a_i of currently running function instances is compared with the new number a'_i: if a_i < a'_i, a'_i - a_i additional function instances are started; if a_i > a'_i, a_i - a'_i running functions are terminated after their execution time ends, and the terminated functions are reconstructed and put back into the request pool;
after all functions corresponding to requests in the request pool have completed successfully, the replies received by all requests are recorded and a message indicating completion of the current task is returned to the workflow control thread.
The invention provides a completion time optimization-oriented scientific workflow scheduling system, which comprises:
a DAG conversion module: converting a given scientific workflow configuration into a corresponding directed acyclic graph;
the function conversion module: converting each layer of tasks in the scientific workflow into serverless functions and deploying them into the corresponding cluster;
a resource allocation module: for each layer of tasks in the directed acyclic graph, allocating resources to the tasks and running the tasks according to parameter configuration;
a dynamic adjustment module: and in the process of task operation, the cluster is kept monitored, and the resource allocation of each task is dynamically adjusted according to the residual operation time of the task and the available resources of the system.
Preferably, the DAG conversion module comprises:
extracting each task in the workflow configuration, creating a graph node in a corresponding directed acyclic graph, and specifying the maximum running instance number of each task;
adding corresponding edges in the created directed acyclic graph according to the dependency relationship among the workflow tasks;
generating an ID corresponding to the directed acyclic graph based on the name of the workflow;
marking a root node in the created directed acyclic graph;
and packaging the elements including the ID, the deadline and the directed acyclic graph into a workflow object.
Preferably, the function conversion module includes:
extracting the task that currently needs to be converted according to the index;
extracting the name and the image address of the task that currently needs to be converted;
packing information including the task name, image address, namespace and autoscaling rules to generate the corresponding serverless function;
the workflow system allocates a pair of input and output keys to each serverless function instance according to the input and output specified by the task;
the workflow system writes the sets K_input and K_output of input files and output files into the input key and output key corresponding to each serverless function instance;
when a serverless function is to be executed, the workflow system downloads the function image, deploys the corresponding serverless function in the cluster from that image, creates a function running instance, and retains the RESTful API interface exposed by the instance.
Compared with the prior art, the invention has the following beneficial effects:
1) with the DAG conversion module, when submitting a workflow the user only needs to specify its inputs, outputs and images to run it, without specifying the amount of resources the workflow requires, which reduces the user's burden and improves the user experience;
2) the function conversion module automates the conversion from workflow tasks to serverless functions, improving the usability of the serverless architecture and the degree of automation of workflow deployment;
3) by adjusting the number of function instances, the system automatically adapts the resources and instance counts occupied by the functions to the remaining time of the scientific workflow tasks and the resource availability of the cluster, thereby optimizing workflow completion time while improving the cluster's resource utilization, achieving dual-objective optimization;
4) the system adapts to a variety of cloud environments; because its underlying structure is self-contained and it does not depend on any specific services, it can be conveniently deployed on various clusters and therefore has good generality.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is an architecture diagram of a workflow system;
FIG. 2 is a schematic diagram of the Shrimp workflow;
FIG. 3 is a diagram of the operation process of the workflow system.
Detailed Description
The present invention will be described in detail with reference to specific embodiments. The following examples will assist those skilled in the art in further understanding the invention, but do not limit it in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention; all of these fall within the scope of the present invention.
The present application provides a technical solution for running scientific workflows on a serverless architecture, in which workflows are scheduled under the Kubernetes and Knative frameworks using container technology, achieving shorter overall completion time and better cluster performance. The solution can therefore serve as an execution platform for scientific workflows and execute a variety of scientific workflows containing heterogeneous tasks.
In this solution, to solve the dependency problem of workflow tasks and to integrate them with the container orchestration system Kubernetes (k8s), the tasks of all workflows are packaged into Docker images, and communication and data transfer are performed through RESTful APIs. One difficulty of this solution is how to balance the resources used by different workflow tasks. The present application dynamically adjusts the number of function instances of the different tasks running on the platform by monitoring the CPU and memory utilization of all currently running tasks and the deadline requirements of the workflows to which they belong, so as to optimize workflow completion time and cluster utilization at the same time.
A scientific workflow can be defined as a directed acyclic graph (DAG) G = {T, E, Data, D, F}, where T = {τ_1, τ_2, ..., τ_n} is the set of tasks and E is the set of edges representing data dependencies between tasks. A dependency ensures that a subtask will not run until all of its parent tasks have finished executing and transmitted their output data. Data represents one or more files required to perform a given task. All tasks in a DAG can be grouped into levels according to the path length from the starting point to the current node. D is the deadline of the current workflow. For each task τ_i in a DAG, the number of occurrences of that task at the same level is denoted F_i.
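As a concrete reading of this definition, the sketch below encodes G = {T, E, Data, D, F} as a small Python structure and derives each task's level as the longest path length from a root, which is how tasks are grouped into layers above; the field names and the level computation are illustrative and not taken verbatim from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DAG:
    tasks: List[str]                      # T = {tau_1, ..., tau_n}
    edges: Dict[str, List[str]]           # E: parent -> children (data dependencies)
    data: Dict[str, List[str]]            # Data: files required by each task
    deadline: float                       # D: deadline of the workflow
    instances: Dict[str, int]             # F: occurrences of each task at its level

def levels(dag: DAG) -> Dict[str, int]:
    """Level of a task = longest path length from any root to that task."""
    parents: Dict[str, List[str]] = {t: [] for t in dag.tasks}
    for p, children in dag.edges.items():
        for c in children:
            parents[c].append(p)
    level: Dict[str, int] = {}
    def lvl(t: str) -> int:               # acyclicity guarantees this recursion terminates
        if t not in level:
            level[t] = 0 if not parents[t] else 1 + max(lvl(p) for p in parents[t])
        return level[t]
    for t in dag.tasks:
        lvl(t)
    return level
```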
The architecture of the workflow system of the present invention is shown in fig. 1.
In the invention, to run a workflow the user submits to the workflow system a YAML file defining the workflow tasks, dependencies and data, together with the initial input files; after receiving these files, the workflow system completes the following processing steps:
1. converting the scientific workflow tasks into serverless functions and deploying them into the corresponding cluster;
2. converting the workflow given by the user into a corresponding DAG according to the workflow;
3. for each layer of task in the DAG, the workflow allocates a proper amount of resources to the task and runs the task according to a plurality of parameters;
4. in the process of task operation, the workflow system can keep monitoring the cluster resources and dynamically adjust the resource allocation of each task.
The conversion of scientific workflow tasks to serverless functions comprises the following steps:
A scientific workflow may contain many tasks with the same function but different inputs; for example, in the Shrimp workflow shown in fig. 2 there are many mappers that receive different inputs and produce corresponding outputs. Requests of the same task type are therefore aggregated, the tasks are exposed through a RESTful API, and identical task images are packed into the same function to process the requests, so that resource consumption is reduced by multiplexing the task functions.
The specific steps are shown in fig. 3, and include:
1. the workflow system allocates a pair of input and output keys to each task according to the input and output specified by the user;
2. the workflow system writes the sets K_input and K_output of input files and output files into the input key and output key corresponding to each task;
3. the workflow system downloads each task image, deploys the corresponding function in the cluster from the image, and retains the exposed RESTful API interface.
Secondly, requesting to execute a function:
Each request corresponds to a specific function instance; that is, each request, with its input and output, is processed by its own function instance.
1. Before each request is executed, the workflow system sends the corresponding input key and output key to the request;
2. when the request is executed, the function fetches data from the specified Redis database according to the input key it received;
3. the function processes the received data and produces the corresponding output;
4. the output is written back to the specified Redis database under the output key the request received.
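A minimal sketch of this request-execution protocol, assuming a Redis instance reachable by the function and a trivial process() placeholder standing in for the task's real computation; the service address, key handling and function names are assumptions of the sketch rather than the patent's actual interface.

```python
import redis  # redis-py client

r = redis.Redis(host="redis.workflow.svc", port=6379)  # assumed service address

def process(payload: bytes) -> bytes:
    """Placeholder for the task's real computation on the input data."""
    return payload.upper()

def handle_request(input_key: str, output_key: str) -> None:
    # 1-2. the workflow system sent input_key/output_key with the request;
    #      fetch the input data from Redis under input_key
    payload = r.get(input_key)
    if payload is None:
        raise KeyError(f"no input found under {input_key}")
    # 3. compute the corresponding output
    result = process(payload)
    # 4. write the output back to Redis under output_key
    r.set(output_key, result)
```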
Thirdly, determining the resource amount required by the unit task:
Define τ_i,k as the kth task of the ith workflow.
The workflow system allocates the resources of the whole cluster to all running tasks in proportion according to the total function instance number of all the tasks running in the workflow pool and the residual running time of the workflow.
The invention uses the following algorithm to determine the amount of CPU and memory resources required by a single instance of each task:
1. for each task to be run, 1 instance of the task is started in a container, the CPU and memory resources consumed by the container are sampled at fixed time intervals, and the samples form two time series, denoted d_cpu = {d_cpu,1, ..., d_cpu,n} and d_mem = {d_mem,1, ..., d_mem,n};
2. the sequence number of the current time interval is denoted i, the collected CPU data is denoted d_cpu,i, and the memory data is denoted d_mem,i; if, in some time interval i, two stability conditions on d_cpu,i and d_mem,i (given as formula images in the original publication) are satisfied simultaneously, the resource consumption of the current task is considered stable, and d_cpu,i and d_mem,i are used as the task resource consumption in the inter-task resource allocation algorithm;
3. if at a later time interval j (j > i) the collected CPU data d_cpu,j and memory data d_mem,j also satisfy the above two conditions, the maximum of d_cpu,i and d_cpu,j is used as the amount of CPU resources for scheduling; the memory resource data is processed in the same way;
4. if no time interval i satisfies the two conditions, d_cpu and d_mem are taken as the amount of resources required for a single instance of the task.
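The two stopping conditions in step 2 are given only as formula images in the original publication; the sketch below therefore assumes a simple relative-change test between consecutive samples as a stand-in for those conditions and otherwise follows steps 1 to 4: sample CPU and memory each interval, record the first stable sample, prefer the maximum if a later interval is also stable, and fall back to the maximum of the whole series (an assumption of this sketch) if stability is never reached.

```python
from typing import Callable, List, Tuple

def stable(samples: List[float], i: int, eps: float = 0.05) -> bool:
    """Assumed stand-in for the patent's image-only stability condition:
    the sample at interval i differs from the previous one by less than eps (relative)."""
    return i > 0 and abs(samples[i] - samples[i - 1]) <= eps * max(samples[i - 1], 1e-9)

def profile_task(sample_cpu: Callable[[], float],
                 sample_mem: Callable[[], float],
                 n_intervals: int) -> Tuple[float, float]:
    d_cpu: List[float] = []
    d_mem: List[float] = []
    chosen_cpu = chosen_mem = None
    for i in range(n_intervals):
        d_cpu.append(sample_cpu())                  # step 1: sample CPU and memory each interval
        d_mem.append(sample_mem())
        if stable(d_cpu, i) and stable(d_mem, i):   # step 2: both conditions hold at interval i
            if chosen_cpu is None:
                chosen_cpu, chosen_mem = d_cpu[i], d_mem[i]
            else:                                   # step 3: later stable interval j -> keep the maximum
                chosen_cpu = max(chosen_cpu, d_cpu[i])
                chosen_mem = max(chosen_mem, d_mem[i])
    if chosen_cpu is None:                          # step 4: never stable -> fall back to the series
        chosen_cpu, chosen_mem = max(d_cpu), max(d_mem)   # (maximum is this sketch's assumption)
    return chosen_cpu, chosen_mem
```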
Fourthly, adjusting the total resource amount of a single task:
1. from the resources required by a single task instance determined above, determine the amount of CPU resources d_cpu,i,k and the amount of memory resources d_mem,i,k of a single container of the kth task τ_i,k currently running in the workflow;
2. count the total available CPU and memory resources, cpu_Tot and mem_Tot, over all nodes of the whole cluster;
3. for each workflow with running tasks, the first task starts executing at time t_i, the current time is t_cur, and the current workflow has deadline D_i; the time factor Δt_i of the workflow is given by a formula (rendered as an image in the original publication) in terms of t_i, t_cur and D_i;
4. the new resource amount of the kth task T_i,k that workflow i is running is calculated according to two formulas (rendered as images in the original publication), where n denotes the number of workflows with running tasks and the remaining symbols (also image-rendered in the original) denote the number of instances of task k at its current level being run by the ith and by the jth scientific workflow, respectively.
The resource allocation process is performed periodically (every 10 seconds in this algorithm).
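The allocation formulas themselves appear only as images in the original publication; the sketch below therefore assumes the proportional rule the surrounding text describes, namely that each running task receives a share of the cluster's CPU and memory proportional to its instance count weighted by its workflow's time factor, which is passed in precomputed. All identifiers are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class RunningTask:
    workflow_id: str
    task_id: str
    instances: int        # F_{i,k}: instances of this task at its current level
    delta_t: float        # time factor of the owning workflow (formula is an image in the patent)

def allocate(cluster_cpu: float, cluster_mem: float,
             tasks: List[RunningTask]) -> Dict[str, Dict[str, float]]:
    """Assumed proportional split of cluster resources across running tasks,
    weighted by instance count and workflow time factor."""
    total_weight = sum(t.instances * t.delta_t for t in tasks) or 1.0
    shares: Dict[str, Dict[str, float]] = {}
    for t in tasks:
        w = (t.instances * t.delta_t) / total_weight
        shares[f"{t.workflow_id}/{t.task_id}"] = {
            "cpu": cluster_cpu * w,     # this task's CPU budget
            "mem": cluster_mem * w,     # this task's memory budget
        }
    return shares

# The workflow system would invoke allocate() on a timer, e.g. every 10 seconds.
```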
Fifthly, determining the number of function instances of a single task, and comprising the following steps:
1. according to the incoming request data such as request links and request parameters, requests are put into the request cache pool according to the instance number F_i set for the task;
2. according to the amount of resources required by a single task instance determined in the third step and the new amount of resources held by the current task calculated by the algorithm in the fourth step, the monitoring thread calculates the new number of function instances (the formula is rendered as an image in the original publication);
3. the number a_i of currently running function instances is compared with the target instance number a'_i: if a_i < a'_i, a'_i - a_i additional function instances are started; if a_i > a'_i, a_i - a'_i running functions are terminated after their execution time ends, and the terminated functions are reconstructed and put back into the request pool;
4. after all functions corresponding to requests in the request pool have completed successfully, the replies received by all requests are recorded and a message indicating completion of the current task is returned to the workflow control thread.
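A sketch of the instance adjustment in this fifth step, under the assumption that the target count a'_i is the number of instances that fit within the task's resource share given the per-instance demand (the exact formula is an image in the original); the scale-up and scale-down branches follow the text, while the function and parameter names are illustrative.

```python
import math
from typing import Tuple

def target_instances(task_cpu_budget: float, task_mem_budget: float,
                     cpu_per_instance: float, mem_per_instance: float) -> int:
    """Assumed form of a'_i: how many instances fit in the task's budget
    (the patent gives the actual formula only as an image)."""
    by_cpu = task_cpu_budget / cpu_per_instance
    by_mem = task_mem_budget / mem_per_instance
    return max(1, math.floor(min(by_cpu, by_mem)))

def scaling_action(running: int, target: int) -> Tuple[str, int]:
    """Compare a_i (running) with a'_i (target) as in step 3 above."""
    if running < target:
        return ("start", target - running)       # start a'_i - a_i new function instances
    if running > target:
        return ("terminate", running - target)   # terminate a_i - a'_i after they finish,
                                                 # then rebuild their requests into the pool
    return ("keep", 0)
```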
The invention also provides a completion time optimization-oriented scientific workflow scheduling system, which comprises:
the function conversion module: and converting the scientific workflow tasks into server-free functions and deploying the server-free functions into corresponding clusters.
A DAG conversion module: a given scientific workflow is converted into a corresponding directed acyclic graph.
A resource allocation module: and for each layer of task in the directed acyclic graph, allocating resources to the task and running according to a plurality of parameters.
A dynamic adjustment module: in the process of task operation, the monitoring of the cluster is kept, and the resource allocation of each task is dynamically adjusted
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules and units provided by the present invention as pure computer-readable program code, the same system, devices, modules and units can be implemented entirely in hardware by logically programming the method steps, for example with logic gates, switches, application-specific integrated circuits, programmable logic controllers and embedded microcontrollers. Therefore, the system and its various devices, modules and units provided by the present invention can be regarded as a hardware component, and the devices, modules and units included in it for realizing various functions can be regarded as structures within that hardware component; the devices, modules and units for performing the various functions can also be regarded both as software modules for performing the method and as structures within the hardware component.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A completion time optimization-oriented scientific workflow scheduling method is characterized by comprising the following steps:
a DAG conversion step: converting a given scientific workflow configuration into a corresponding directed acyclic graph;
function conversion step: converting each layer of tasks in the scientific workflow into serverless functions and deploying them into the corresponding cluster;
resource allocation step: for each layer of tasks in the directed acyclic graph, allocating resources to the tasks and running the tasks according to parameter configuration;
and a dynamic adjustment step: and in the process of task operation, the cluster is kept monitored, and the resource allocation of each task is dynamically adjusted according to the residual operation time of the task and the available resources of the system.
2. The completion time optimization-oriented scientific workflow scheduling method of claim 1, wherein the DAG conversion step comprises:
S101, extracting each task in the workflow configuration, creating a graph node in the corresponding directed acyclic graph, and specifying the maximum number of running instances for each task;
S102, adding corresponding edges in the created directed acyclic graph according to the dependency relationships among the workflow tasks;
S103, generating an ID corresponding to the directed acyclic graph based on the name of the workflow;
S104, marking a root node in the created directed acyclic graph;
S105, packaging the elements including the ID, the deadline and the directed acyclic graph into a workflow object.
3. The completion time optimization-oriented scientific workflow scheduling method of claim 1, wherein in the function conversion step, the tasks with the same function but different inputs in one scientific workflow are summarized, and are exposed through Restful API, and are packaged into the same serverless function by using the same task image.
4. The completion time optimization-oriented scientific workflow scheduling method according to claim 2, wherein the function conversion step comprises:
S201, extracting the task that currently needs to be converted according to the index;
S202, extracting the name and the image address of the task that currently needs to be converted;
S203, packing information including the task name, image address, namespace and autoscaling rules to generate the corresponding serverless function;
S204, the workflow system allocates a pair of input and output keys to each serverless function instance according to the input and output specified by the task;
S205, the workflow system writes the sets K_input and K_output of input files and output files into the input key and output key corresponding to each serverless function instance;
S206, when a serverless function is to be executed, the workflow system downloads the function image, deploys the corresponding serverless function in the cluster from that image, creates a function running instance, and retains the RESTful API interface exposed by the instance.
5. The completion time optimization-oriented scientific workflow scheduling method of claim 1, wherein said resource allocation step comprises determining the amount of resources required for a unit task:
the workflow system distributes the resources of the whole cluster to all running tasks in proportion according to the total function instance number of all the tasks running in the workflow pool, the residual running time of the workflow and the available resources of the current cluster;
the following method is adopted to determine the amount of CPU resources and memory resources required by a single instance of each task:
for each task to be run, 1 instance of the task is started in a container, the CPU and memory resources consumed by the container are sampled at fixed time intervals, and the samples form two time series, denoted d_cpu = {d_cpu,1, ..., d_cpu,n} and d_mem = {d_mem,1, ..., d_mem,n};
the sequence number of the current time interval is denoted i, the collected CPU data is denoted d_cpu,i, and the memory data is denoted d_mem,i; if, in some time interval i, two stability conditions on d_cpu,i and d_mem,i (given as formula images in the original publication) are satisfied simultaneously, the resource consumption of the current task is considered stable, and d_cpu,i and d_mem,i are used as the task resource consumption in the inter-task resource allocation algorithm;
if at a later time interval j (j > i) the collected CPU data d_cpu,j and memory data d_mem,j also satisfy the above two conditions, the maximum of d_cpu,i and d_cpu,j is used as the amount of CPU resources for scheduling; the memory resource data is processed in the same way;
if no time interval i satisfies the two conditions, d_cpu and d_mem are taken as the amount of resources required for a single instance of the task.
6. The completion time optimization-oriented scientific workflow scheduling method of claim 5, wherein the total resource allocation method of a single task comprises:
determining, from the resources required by a single instance of the task, the amount of CPU resources d_cpu,i,k and the amount of memory resources d_mem,i,k of a single container of the kth task τ_i,k currently running in the workflow;
counting the total available CPU and memory resources, cpu_Tot and mem_Tot, over all nodes of the whole cluster;
for each scientific workflow with running tasks, the first task starts executing at time t_i, the current time is t_cur, and the current workflow has deadline D_i; the time factor Δt_i of the workflow is given by a formula (rendered as an image in the original publication) in terms of t_i, t_cur and D_i;
the new resource amount of the kth running task T_i,k of the ith workflow is calculated according to two formulas (rendered as images in the original publication), where n denotes the number of scientific workflows that have running tasks and the remaining symbols (also image-rendered in the original) denote the number of instances of task k at its current level being run by the ith and by the jth scientific workflow, respectively.
7. The completion time optimization-oriented scientific workflow scheduling method of claim 6, wherein the method for determining the number of function instances of a single task comprises:
according to the data of the incoming requests, requests are put into the request cache pool according to the instance number F_i set for the task;
according to the amount of resources required by a single task instance and the amount of resources held by the current task, the monitoring thread calculates the new number of function instances (the formula is rendered as an image in the original publication);
the number a_i of currently running function instances is compared with the new number a'_i: if a_i < a'_i, a'_i - a_i additional function instances are started; if a_i > a'_i, a_i - a'_i running functions are terminated after their execution time ends, and the terminated functions are reconstructed and put back into the request pool;
after all functions corresponding to requests in the request pool have completed successfully, the replies received by all requests are recorded and a message indicating completion of the current task is returned to the workflow control thread.
8. A completion time optimization-oriented scientific workflow scheduling system, comprising:
a DAG conversion module: converting a given scientific workflow configuration into a corresponding directed acyclic graph;
the function conversion module: converting each layer of tasks in the scientific workflow into serverless functions and deploying them into the corresponding cluster;
a resource allocation module: for each layer of tasks in the directed acyclic graph, allocating resources to the tasks and running the tasks according to parameter configuration;
a dynamic adjustment module: and in the process of task operation, the cluster is kept monitored, and the resource allocation of each task is dynamically adjusted according to the residual operation time of the task and the available resources of the system.
9. The completion time optimization-oriented scientific workflow scheduling system of claim 8 wherein said DAG conversion module comprises:
extracting each task in the workflow configuration, creating a graph node in a corresponding directed acyclic graph, and specifying the maximum running instance number of each task;
adding corresponding edges in the created directed acyclic graph according to the dependency relationship among the workflow tasks;
generating an ID corresponding to the directed acyclic graph based on the name of the workflow;
marking a root node in the created directed acyclic graph;
and packaging the elements including the ID, the deadline and the directed acyclic graph into a workflow object.
10. The completion time optimization-oriented scientific workflow scheduling system of claim 9 wherein said function transformation module comprises:
extracting the task that currently needs to be converted according to the index;
extracting the name and the image address of the task that currently needs to be converted;
packing information including the task name, image address, namespace and autoscaling rules to generate the corresponding serverless function;
the workflow system allocates a pair of input and output keys to each serverless function instance according to the input and output specified by the task;
the workflow system writes the sets K_input and K_output of input files and output files into the input key and output key corresponding to each serverless function instance;
when a serverless function is to be executed, the workflow system downloads the function image, deploys the corresponding serverless function in the cluster from that image, creates a function running instance, and retains the RESTful API interface exposed by the instance.
CN202010732161.5A 2020-07-27 2020-07-27 Completion time optimization-oriented scientific workflow scheduling method and system Active CN111861412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010732161.5A CN111861412B (en) 2020-07-27 2020-07-27 Completion time optimization-oriented scientific workflow scheduling method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010732161.5A CN111861412B (en) 2020-07-27 2020-07-27 Completion time optimization-oriented scientific workflow scheduling method and system

Publications (2)

Publication Number Publication Date
CN111861412A true CN111861412A (en) 2020-10-30
CN111861412B CN111861412B (en) 2024-03-15

Family

ID=72947259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010732161.5A Active CN111861412B (en) 2020-07-27 2020-07-27 Completion time optimization-oriented scientific workflow scheduling method and system

Country Status (1)

Country Link
CN (1) CN111861412B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416476A (en) * 2020-11-25 2021-02-26 武汉联影医疗科技有限公司 Workflow execution method and device, computer equipment and storage medium
CN112748997A (en) * 2021-01-20 2021-05-04 北京明略昭辉科技有限公司 Workflow scheduling method and system
CN113225269A (en) * 2021-04-16 2021-08-06 鹏城实验室 Container-based workflow scheduling method, device and system and storage medium
CN113537721A (en) * 2021-06-21 2021-10-22 华南师范大学 Control method, system, and medium for business workflow local time constraint adjustment
CN114844843A (en) * 2022-03-24 2022-08-02 清华大学 Method and device for adjusting number of application instances
WO2023202006A1 (en) * 2022-04-20 2023-10-26 Zhejiang Dahua Technology Co., Ltd. Systems and methods for task execution
WO2024109005A1 (en) * 2022-11-22 2024-05-30 华为云计算技术有限公司 Execution method for serverless application, and related device
CN118193598A (en) * 2024-05-16 2024-06-14 南京赛宁信息技术有限公司 Cache-based target range application node cold start acceleration method and system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110072436A1 (en) * 2009-09-23 2011-03-24 International Business Machines Corporation Resource optimization for real-time task assignment in multi-process environments
CN102799957A (en) * 2012-05-30 2012-11-28 武汉理工大学 Scientific work flow scheduling method with safe perception under cloud calculation environment
CN107015856A (en) * 2017-03-30 2017-08-04 青海大学 Task scheduling approach generation method and device under cloud environment in scientific workflow
CN108154317A (en) * 2018-01-25 2018-06-12 福建师范大学 The workflow group scheduling method that Case-based Reasoning self-adjusted block is integrated under cloudy environment
CN108665157A (en) * 2018-05-02 2018-10-16 中山大学 A method of realizing cloud Workflow system flow instance balance dispatching
CN109634742A (en) * 2018-11-15 2019-04-16 华南理工大学 A kind of time-constrain scientific workflow optimization method based on ant group algorithm
US20190179678A1 (en) * 2017-12-07 2019-06-13 International Business Machines Corporation Computer server application execution scheduling latency reduction
CN110058950A (en) * 2019-04-17 2019-07-26 上海沄界信息科技有限公司 Distributed cloud computing method and equipment based on serverless backup framework
US20190286486A1 (en) * 2016-09-21 2019-09-19 Accenture Global Solutions Limited Dynamic resource allocation for application containers
CN111427681A (en) * 2020-02-19 2020-07-17 上海交通大学 Real-time task matching scheduling system and method based on resource monitoring in edge computing

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110072436A1 (en) * 2009-09-23 2011-03-24 International Business Machines Corporation Resource optimization for real-time task assignment in multi-process environments
CN102799957A (en) * 2012-05-30 2012-11-28 武汉理工大学 Scientific work flow scheduling method with safe perception under cloud calculation environment
US20190286486A1 (en) * 2016-09-21 2019-09-19 Accenture Global Solutions Limited Dynamic resource allocation for application containers
CN107015856A (en) * 2017-03-30 2017-08-04 青海大学 Task scheduling approach generation method and device under cloud environment in scientific workflow
US20190179678A1 (en) * 2017-12-07 2019-06-13 International Business Machines Corporation Computer server application execution scheduling latency reduction
CN108154317A (en) * 2018-01-25 2018-06-12 福建师范大学 The workflow group scheduling method that Case-based Reasoning self-adjusted block is integrated under cloudy environment
CN108665157A (en) * 2018-05-02 2018-10-16 中山大学 A method of realizing cloud Workflow system flow instance balance dispatching
CN109634742A (en) * 2018-11-15 2019-04-16 华南理工大学 A kind of time-constrain scientific workflow optimization method based on ant group algorithm
CN110058950A (en) * 2019-04-17 2019-07-26 上海沄界信息科技有限公司 Distributed cloud computing method and equipment based on serverless backup framework
CN111427681A (en) * 2020-02-19 2020-07-17 上海交通大学 Real-time task matching scheduling system and method based on resource monitoring in edge computing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAMID ARABNEJAD等: "Fairness resource sharing for dynamic workflow scheduling on Heterogeneous Systems", 《2012 10TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS》 *
ZHU Yadong; LI Zhong; YAN Li; CHEN Xiangjun: "Balanced scheduling algorithm for scientific workflows in a cloud environment", Research and Exploration in Laboratory, no. 05
DONG Weihang: "Research on the scheduling optimization problem of scientific workflows", China Master's Theses Full-text Database, Economics and Management Sciences

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416476A (en) * 2020-11-25 2021-02-26 武汉联影医疗科技有限公司 Workflow execution method and device, computer equipment and storage medium
CN112416476B (en) * 2020-11-25 2023-03-24 武汉联影医疗科技有限公司 Workflow execution method and device, computer equipment and storage medium
CN112748997A (en) * 2021-01-20 2021-05-04 北京明略昭辉科技有限公司 Workflow scheduling method and system
CN113225269A (en) * 2021-04-16 2021-08-06 鹏城实验室 Container-based workflow scheduling method, device and system and storage medium
CN113537721A (en) * 2021-06-21 2021-10-22 华南师范大学 Control method, system, and medium for business workflow local time constraint adjustment
CN114844843A (en) * 2022-03-24 2022-08-02 清华大学 Method and device for adjusting number of application instances
WO2023202006A1 (en) * 2022-04-20 2023-10-26 Zhejiang Dahua Technology Co., Ltd. Systems and methods for task execution
WO2024109005A1 (en) * 2022-11-22 2024-05-30 华为云计算技术有限公司 Execution method for serverless application, and related device
CN118193598A (en) * 2024-05-16 2024-06-14 南京赛宁信息技术有限公司 Cache-based target range application node cold start acceleration method and system
CN118193598B (en) * 2024-05-16 2024-10-01 南京赛宁信息技术有限公司 Cache-based target range application node cold start acceleration method and system

Also Published As

Publication number Publication date
CN111861412B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
CN111861412B (en) Completion time optimization-oriented scientific workflow scheduling method and system
CN107580023B (en) Stream processing job scheduling method and system for dynamically adjusting task allocation
Fohler Joint scheduling of distributed complex periodic and hard aperiodic tasks in statically scheduled systems
Zhu et al. A cost-effective scheduling algorithm for scientific workflows in clouds
CN108154317B (en) Workflow group scheduling method based on example self-adaptive distribution integration in multi-cloud environment
CN112685153A (en) Micro-service scheduling method and device and electronic equipment
CN111798113B (en) Resource allocation method, device, storage medium and electronic equipment
CN111026519B (en) Distributed task priority scheduling method and system and storage medium
Soner et al. Integer programming based heterogeneous cpu–gpu cluster schedulers for slurm resource manager
JP2006244479A (en) System and method for scheduling executable program
CN106201701A (en) A kind of workflow schedule algorithm of band task duplication
CN116302519A (en) Micro-service workflow elastic scheduling method, system and equipment based on container cloud platform
US8028291B2 (en) Method and computer program product for job selection and resource allocation of a massively parallel processor
CN109634714B (en) Intelligent scheduling method and device
CN114816753A (en) Data cluster computing node scaling method, device, equipment and medium
George et al. Job vs. portioned partitioning for the earliest deadline first semi-partitioned scheduling
Isovic et al. Handling mixed sets of tasks in combined offline and online scheduled real-time systems
CN115934362A (en) Deep learning-oriented server non-perception computing cluster scheduling method and product
CN115858667A (en) Method, apparatus, device and storage medium for synchronizing data
WO2022266263A1 (en) Allocating of computing resources for applications
CN114675953A (en) Resource dynamic scheduling method, device, equipment and computer readable storage medium
Chen et al. DeepBoot: Dynamic Scheduling System for Training and Inference Deep Learning Tasks in GPU Cluster
CN116483546B (en) Distributed training task scheduling method, device, equipment and storage medium
Heath et al. Development, analysis, and verification of a parallel hybrid dataflow computer architectural framework and associated load-balancing strategies and algorithms via parallel simulation
Chin et al. Adaptive service scheduling for workflow applications in service-oriented grid

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant