
CN108920261A - A two-level adaptive scheduling method suitable for large-scale parallel data processing tasks - Google Patents


Info

Publication number
CN108920261A
CN108920261A (application CN201810502295.0A)
Authority
CN
China
Prior art keywords
task
executor
pool
list
subtask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810502295.0A
Other languages
Chinese (zh)
Other versions
CN108920261B (en)
Inventor
顾升高
刘瑞
齐俊鹏
胡泉
杨越
孙毅方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy Of Aerospace Systems Science And Engineering
Original Assignee
China Academy Of Aerospace Systems Science And Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy Of Aerospace Systems Science And Engineering filed Critical China Academy Of Aerospace Systems Science And Engineering
Priority to CN201810502295.0A priority Critical patent/CN108920261B/en
Publication of CN108920261A publication Critical patent/CN108920261A/en
Application granted granted Critical
Publication of CN108920261B publication Critical patent/CN108920261B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/48 - Program initiating; program switching, e.g. by interrupt
    • G06F9/4806 - Task transfer initiation or dispatching
    • G06F9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a two-level adaptive scheduling method suitable for large-scale parallel data processing tasks, in which tasks are scheduled at two levels: the task level and the subtask level. The method effectively solves the parallel-scheduling difficulties caused by complex dependencies between tasks, improves the degree of parallelism, achieves orderly and efficient parallel processing of large-scale data processing tasks, shortens task waiting and execution times, and reduces the overall execution time. In addition, executor runtime statistics are fed back to the scheduler, which adaptively adjusts the size of each executor pool and the type assigned to each task, continuously optimizing scheduling and improving the utilization of system resources.

Description

A two-level adaptive scheduling method suitable for large-scale parallel data processing tasks
Technical field
The present invention relates to a two-level adaptive scheduling method suitable for large-scale parallel data processing tasks, and belongs to the field of data processing task scheduling.
Background art
As Internet technology continues to develop, the demand in every field for large-scale mass data storage and processing keeps growing, and the requirements on working efficiency and processing cost keep rising. How to reasonably allocate large-scale data processing tasks across a multi-computer system, improve execution efficiency, and pursue the smallest overall execution time has become a problem in urgent need of a solution.
Traditional general-purpose task scheduling algorithms, such as first-come-first-served (FCFS) scheduling, highest-priority-first scheduling, and time-slice round-robin scheduling, all have certain limitations and are not suitable for large-scale data processing task scheduling. For example, FCFS scheduling is simple to implement, but tasks with high real-time requirements that arrive late cannot be scheduled in time. Highest-priority-first scheduling guarantees that urgent tasks run first, but can leave low-priority tasks waiting too long. Time-slice round-robin scheduling treats tasks fairly by switching task contexts, but as the task scale grows, frequent context switching increases system overhead and degrades overall scheduling performance. Moreover, these general-purpose algorithms fail to fully consider the business particularities of data processing tasks and the strong data dependencies between different tasks; they therefore lack specificity and cannot satisfy the timeliness requirements of large-scale parallel data processing tasks.
Building on conventional scheduling algorithms, a large amount of research has been carried out at home and abroad. Chinese Patent Publication No. CN107196874A, published on September 22, 2017, entitled "A queue scheduling algorithm and system", discloses an efficient queue scheduling algorithm with delay guarantees: it introduces a virtual delay to provide a delay guarantee for high-priority tasks, and uses a weight-adjustment mechanism to schedule backlogged tasks in time. However, that method does not separate the execution of long tasks from short tasks. Once a long-running task is scheduled, it can occupy system resources for a long time, so short tasks of higher priority must wait in line and cannot execute promptly, which increases the overall waiting time and reduces the system's responsiveness.
Summary of the invention
The technical problem solved by the invention is to overcome the deficiencies of the prior art and provide a two-level adaptive scheduling method suitable for large-scale parallel data processing tasks. Tasks are handled at two levels, the task level and the subtask level, which improves the degree of parallelism, effectively solves the parallel-scheduling difficulties caused by complex dependencies between tasks, achieves orderly and efficient parallel processing of large-scale data processing tasks, and reduces the overall execution time.
The technical solution of the invention is as follows. A two-level adaptive scheduling method suitable for large-scale parallel data processing tasks comprises the following steps:
Step 1: at the task level, perform the following work:
(1.1) tasks submitted by the upper-layer application are placed into the corresponding task waiting queue according to task type; the task waiting queues include a long-task waiting queue, a short-task waiting queue, and an instant-task waiting queue;
(1.2) according to the dependencies between tasks and the number of currently idle executors, the scheduler takes the tasks that are currently able to execute out of the task waiting queues and distributes them to the corresponding executor pools; the executor pools include a long-task executor pool, a short-task executor pool, and an instant-task executor pool;
(1.3) each type of executor pool distributes the received tasks to currently idle executors;
Step 2: at the subtask level, perform the following work:
(2.1) the executor splits a received task into subtasks, which are assigned in order to multiple threads for parallel processing;
(2.2) when a task finishes executing, its status is marked as completed and it is placed into the completed-task database.
When submitting a task, the upper-layer application must also attach the task's associated attributes, including the task name, task type, business type, predecessor task list, task context, and subtask two-level linked list.
Task types comprise three classes: long tasks, short tasks, and instant tasks. A long task is one whose execution time, as estimated by the upper-layer application, exceeds the execution-time division threshold; a short task is one whose estimated execution time is below that threshold.
The business type refers to the business function of the task.
The predecessor task list enumerates all the predecessor tasks on which the current task directly depends; the current task can execute only after all of its predecessor tasks have finished.
The subtask two-level linked list records the two-dimensional topology of the subtask execution order. It contains m layers of linked lists; each layer contains multiple nodes, and each node holds one subtask. The subtasks on one layer execute in parallel, and only after all subtasks of the previous layer have executed does processing move to the next layer; m is a natural number greater than or equal to 1.
The task context refers to the environment in which the task runs, including the data resources and computing resources required at run time.
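The task attributes described above can be sketched as a simple data structure. This is an illustrative model only, not the patent's actual implementation; all field names are assumptions, and the two-level linked list is represented here as a list of layers, each layer being a list of subtasks.

```python
from dataclasses import dataclass, field

LONG, SHORT, INSTANT = "long", "short", "instant"

@dataclass
class Task:
    name: str
    task_type: str                                      # LONG, SHORT, or INSTANT
    business_type: str                                  # e.g. "data-analysis", used for statistics
    predecessors: list = field(default_factory=list)    # names of directly depended-on tasks
    context: dict = field(default_factory=dict)         # data/compute resources needed at run time
    # Two-level list: layers[i] holds the subtasks of layer i.
    # Subtasks within a layer run in parallel; layers run strictly in order.
    layers: list = field(default_factory=list)
    status: str = "ready"
```

For example, a 1 TB analysis task with one predecessor and the layer topology of Fig. 4 could be declared as `Task("analysis-1T", LONG, "data-analysis", predecessors=["collect-raw"], layers=[["s1", "s2", "s3"], ["s4", "s6"], ["s5"], ["s7", "s8", "s9"]])`.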
Step (1.2) is implemented as follows:
(3.1) as soon as a task enters the instant-task queue, the scheduler immediately removes it and places it into the instant-task executor pool;
(3.2) for non-instant tasks, the scheduler cyclically queries the number of idle executors in the executor pool for each task type, obtains qualified tasks from the task waiting queue using a sliding-window scheduling scheme, and submits them to the corresponding executor pool.
Step (3.2) is implemented as follows:
Step 1: query the number of idle executors in the executor pool for the given task type; assuming the pool has N idle executors, set the scheduling window size W to N.
Step 2: place the starting position of the scheduling window at the task position pointed to by the task selection pointer P.
Step 3: if W is greater than 0 and there are tasks inside the scheduling window, take the task pointed to by the selection pointer P, look up its predecessor task list, and check the state of each predecessor task in the list; otherwise (W is 0 or the window contains no tasks), jump to Step 6.
Step 4: judge whether all predecessor tasks have finished executing. If so, remove the task from the task waiting queue, mark its status as "scheduled for execution", adjust the window size to W = W - 1, and go to Step 5; if not, go to Step 5.
Step 5: slide the task selection pointer P, together with the scheduling window, to the next task position in the queue, and jump back to Step 3.
Step 6: submit all the tasks removed in Step 4 to the executor pool for execution, then go to Step 7.
Step 7: judge whether the timer t has reached the pointer backtracking time T; the timer t starts counting from 0 when the scheduler starts. If T has been reached, point the task selection pointer P back to the start of the task waiting queue, reset t to 0, restart timing, and jump to Step 1; if not, jump to Step 1 directly. The pointer backtracking time T is greater than 5 times the execution-time division threshold.
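One round of the sliding-window procedure above can be sketched as follows. This is a minimal single-round model under simplifying assumptions: the queue is a plain list of (name, predecessors) pairs, the predecessor check is an injected predicate, and the backtracking timer of Step 7 is omitted. It is not the patent's code.

```python
def schedule_round(queue, idle_executors, is_finished, pointer):
    """One round of sliding-window scheduling.

    queue          : list of (task_name, predecessor_names) in waiting order (mutated in place)
    idle_executors : N, the number of idle executors; the window size W starts at N
    is_finished    : predicate telling whether a named predecessor has completed
    pointer        : index of the task selection pointer P into `queue`

    Returns (selected_tasks, new_pointer); selected tasks are removed from `queue`.
    """
    w = idle_executors                     # Step 1: W = N
    p = pointer                            # Step 2: window starts at P
    selected = []
    while w > 0 and p < len(queue):        # Step 3: tasks remain inside the window
        name, preds = queue[p]
        if all(is_finished(d) for d in preds):  # Step 4: all predecessors finished?
            selected.append(name)          # mark "scheduled for execution"
            del queue[p]                   # remove from the waiting queue; the next
            w -= 1                         # task now occupies index p (window slides)
        else:
            p += 1                         # Step 5: slide window past an ineligible task
    return selected, p                     # Step 6: hand `selected` to the executor pool
```

Run against the six-task queue of the Fig. 7 walkthrough (task 1 depends on task 3, task 4 depends on task 2, nothing finished yet, N = 3), this selects tasks 2, 3, and 5 and leaves the pointer at task 6, matching the example.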
The method for checking a predecessor task's state in Step 3 is as follows: according to the predecessor task's type, first search the corresponding task waiting queue. If the task is found there, it has not yet executed. If it is not found, query the completed-task database; if the task is not in the database either, the upper-layer application has not yet submitted it or it is still executing, i.e. the predecessor task is in the not-executed state. If it is found in the database, the predecessor task has finished executing.
The detailed process of step (2.1) is as follows:
Step 1: on receiving a task, the executor extracts the task's subtask two-level linked list and takes the first-layer list pointed to by the pointer field in the list head as the current list.
Step 2: judge whether the current list is null. If it is, the process ends; if not, take the node pointed to by the pointer field in the current list's head as the current node and dispatch that node's subtask to an independent thread for execution.
Step 3: take the next node pointed to by the pointer field of the current node. If it is null, wait until all subtasks of the current list have finished executing, then go to Step 4; if not, make that next node the current node, dispatch its subtask to an independent thread, and repeat Step 3.
Step 4: take the next list pointed to by the current list's pointer field. If it is null, the task has finished executing and the process ends; if not, make it the current list and jump back to Step 2.
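The layer-by-layer execution above can be sketched with a thread pool: every subtask on a layer runs on its own thread, and the next layer starts only after the whole layer returns. This is an illustrative sketch in which subtasks are plain callables, not the patent's executor code.

```python
from concurrent.futures import ThreadPoolExecutor

def execute_layers(layers):
    """Run a subtask two-level list: each inner list is one layer.

    Subtasks on a layer run on independent threads in parallel; processing
    moves to the next layer only after every subtask of the current layer
    has finished. Returns the subtask results, grouped by layer.
    """
    results = []
    for layer in layers:                    # Steps 2-4: walk the layers in order
        # One independent thread per subtask on this layer.
        with ThreadPoolExecutor(max_workers=max(1, len(layer))) as pool:
            results.append(list(pool.map(lambda f: f(), layer)))
        # Leaving the `with` block waits for the whole layer to complete,
        # so the next iteration starts only after this layer is done.
    return results
```

For the Fig. 4 topology, the layers would be [subtask 1, subtask 2, subtask 3], then [subtask 4, subtask 6], then [subtask 5], then [subtask 7, subtask 8, subtask 9].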
While scheduling and processing proceed, executor execution data is collected in real time and, after aggregation, fed back to the scheduler, which uses it to adjust the waiting queue a task belongs to and the number of executors in each executor pool, thereby realizing adaptive adjustment.
The method for adjusting the waiting queue a task belongs to is as follows: collect the time spent executing each task in the long-task and short-task executor pools, compute the average execution duration of each business type, and feed the statistics back to the scheduler. The scheduler judges from the feedback whether the task-type division is appropriate; if not, it modifies the task type and moves the tasks into the new task waiting queue.
The method for adjusting the number of executors in each pool is as follows: periodically count the idle executors in the long-task and short-task executor pools and feed the counts back to the scheduler in time. The scheduler then queries the task backlog in each waiting queue and, based on the query results, redistributes the executors between the two pools, releasing idle executors to the pool that is short of executors.
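The pool-rebalancing step can be sketched as a simple policy: pools that have idle executors and no backlog donate them to pools whose backlog exceeds their idle capacity. The patent states the goal (release idle executors to the pool that is short of them) but not an exact rule, so this particular policy is an assumption.

```python
def rebalance(pools, backlogs):
    """Move idle executors from over-provisioned pools to starved ones.

    pools    : dict pool_name -> (total_executors, idle_executors)
    backlogs : dict pool_name -> number of tasks waiting in that pool's queue

    Returns the new total executor count per pool.
    """
    # Donors: pools with idle executors and nothing waiting.
    donors = {n: idle for n, (_, idle) in pools.items()
              if idle > 0 and backlogs.get(n, 0) == 0}
    new_totals = {n: total for n, (total, _) in pools.items()}
    for name, (_, idle) in pools.items():
        need = backlogs.get(name, 0) - idle        # shortfall of this pool
        while need > 0 and donors:
            donor = next(iter(donors))
            give = min(need, donors[donor])        # take as many as the donor can spare
            new_totals[donor] -= give
            new_totals[name] += give
            need -= give
            donors[donor] -= give
            if donors[donor] == 0:
                del donors[donor]
    return new_totals
```

For example, a long-task pool of 4 executors with 3 idle and an empty queue would hand its 3 idle executors to a short-task pool of 4 busy executors with 5 tasks waiting.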
Compared with the prior art, the present invention has the following advantages:
(1) The invention schedules and processes tasks at two levels, the task level and the subtask level, which further refines the scheduling granularity, makes fuller use of system resources, improves the degree of parallelism, effectively solves the parallel-scheduling difficulties caused by complex inter-task dependencies, achieves orderly and efficient parallel processing of large-scale data processing tasks, and reduces the overall execution time.
(2) During scheduling and processing, the invention fully considers the dependencies and business-logic order between tasks, cleanly solving problems such as enforcing the execution order of dependent tasks and avoiding conflicts between subtasks running simultaneously. It schedules tasks in a rational and orderly way, avoids chaotic resource usage, and improves task processing efficiency.
(3) The invention uses result statistics and a feedback mechanism to adjust itself adaptively and optimize scheduling, realizing dynamic and reasonable allocation of system resources; it can respond to business changes in time and has high flexibility.
Brief description of the drawings
Fig. 1 is the overall architecture diagram of the invention;
Fig. 2 is a schematic diagram of task associated attributes;
Fig. 3 is a flow chart of the sliding-window scheduling algorithm;
Fig. 4 is a schematic diagram of the two-level linked list structure;
Fig. 5 is a flow chart of an executor executing a task;
Fig. 6 is a schematic diagram of short-task execution result statistics and feedback;
Fig. 7 is a schematic diagram of the task scheduling process.
Specific embodiment
To make large-scale data processing tasks execute in an orderly, efficient, and parallel manner, the method of the present invention handles tasks at two levels, aiming to maximize parallelism and reduce task waiting and execution times. At the first level, the task level, each task declares the predecessor tasks it depends on; the scheduler builds a topology from these declarations to ensure that tasks execute in dependency order, while tasks with no dependency relationship can execute in parallel. At the second level, the subtask level, a task is divided by action or function into a series of subtasks; the data and resources each subtask needs are loaded by the task level. The purpose of dividing subtasks is to further increase parallelism: subtasks with no resource conflicts and no execution-order relationship are distributed to multiple threads and executed simultaneously.
Data processing tasks refer to tasks such as data acquisition, data transformation, data fusion, and data analysis. Their typical characteristics are large task scale, long execution times, and complex interdependencies between tasks. According to the length of task execution, an execution-time division threshold is set and tasks of different business types are divided into three classes: long tasks, short tasks, and instant tasks. For example, tasks such as mass data acquisition, deep data analysis, and data transformation can be classed as long tasks; tasks such as control, communication, and interaction are classed as short tasks; and tasks with high real-time requirements are classed as instant tasks. To prevent an overly long task execution from delaying short or instant tasks, the invention establishes separate task waiting queues and executor pools for long, short, and instant tasks, so that the three classes are scheduled and processed independently. To further improve the read/write efficiency of big-data tasks and cope with tasks at massive scale, the tasks in each waiting queue are stored in a Redis in-memory database, and all completed tasks are finally stored in an HBase database for convenient later query.
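The three-way division described above can be sketched as a small classification rule. The threshold value and function names here are illustrative assumptions; the patent defines only the idea of an execution-time division threshold, not its value.

```python
THRESHOLD = 600.0   # hypothetical execution-time division threshold, in seconds

def classify_task(estimated_seconds, realtime=False):
    """Assign a task to one of the three waiting queues.

    Tasks with high real-time requirements become instant tasks; the rest
    are split into long/short by the estimated execution time versus the
    execution-time division threshold.
    """
    if realtime:
        return "instant"
    return "long" if estimated_seconds > THRESHOLD else "short"
```

Under these assumed values, a two-hour data-acquisition job classifies as long, a two-second control command as short, and a real-time alarm as instant.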
Fig. 1 shows the overall architecture of the invention. The invention first places data processing tasks, by type, into the long, short, and instant task waiting queues; the scheduler then dispatches them rationally to executors according to the dependencies between tasks. When an executor runs a task, it splits the task into subtasks according to the business logic and assigns the subtasks, in order, to multiple threads for parallel processing. While scheduling and processing proceed, a mechanism that collects, aggregates, and feeds back runtime information performs adaptive resource adjustment. The method is described in detail as follows:
(1) Task receiving
When submitting a task, the upper-layer application attaches the task's associated attributes, which constitute the basic information shown in Fig. 2. This basic information mainly includes the task name, task type, business type, predecessor task list, task context, and subtask two-level linked list. The task type is one of three classes assigned by the upper-layer application according to the estimated execution time: long task, short task, or instant task. The business type classifies the task by business function, e.g. data acquisition task, data analysis task, or data transformation task, and is mainly used for statistics. The predecessor task list enumerates all predecessor tasks on which the current task directly depends; the current task can execute only after its predecessors have finished.
The task context refers to the environment in which the task runs, including the data resources and computing resources required at run time.
Each node in the subtask two-level linked list contains one subtask, and the two-level list structure records the two-dimensional topology of the subtask execution order. Subtasks on the same layer can execute in parallel, and the subtasks of the next layer can execute only after all subtasks of the previous layer have finished. In the situation shown in Fig. 4, the task originally runs subtask 1, subtask 2, and so on through subtask 9 in order; but subtask 1, subtask 2, and subtask 3 have no resource-access conflicts and no execution-order relationship, so they are placed on the same layer and executed in parallel. Subtask 5 depends on the result of subtask 4, so the two are placed on different layers, while subtask 6 depends on neither subtask 4 nor subtask 5 and is therefore moved up to the same layer as subtask 4.
After receiving a task, the method creates a globally unique identifier for it, then places the submitted task into the corresponding task waiting queue by task type to wait for scheduling.
For example, when submitting a 1 TB big-data analysis task, the upper-layer application must mark the task as a long task of the data-analysis business type, and list its predecessor task list, subtask two-level linked list, and task context. The method creates a unique identifier for the task, places it at the tail of the long-task waiting queue, and sets its state to "ready".
(2) Task scheduling
Task scheduling is completed by the scheduler, which applies a different scheduling strategy to each task waiting queue. For non-instant tasks (long and short tasks), the number of tasks taken from the waiting queue each time depends on the number of currently idle executors in the corresponding executor pool. For instant tasks, to guarantee real-time execution, each task is taken out as soon as it arrives and placed directly into the instant-task executor pool.
Specifically, for non-instant tasks, the scheduler cyclically queries the number of idle executors in each executor pool, obtains qualified tasks from the task waiting queue using the sliding-window scheduling scheme, and submits them to the executor pool. As shown in Fig. 3, the basic scheduling procedure is:
Step 1: query the number of idle executors in the executor pool for the given task type; assuming the pool has N idle executors, set the scheduling window size W to N.
Step 2: place the starting position of the scheduling window at the task position pointed to by the task selection pointer P.
Step 3: if W is greater than 0 and there are tasks inside the scheduling window, take the task pointed to by the selection pointer P, look up its predecessor task list, and check the state of each predecessor task in the list; otherwise (W is 0 or the window contains no tasks), jump to Step 6.
Step 4: judge whether all predecessor tasks have finished executing. If so, remove the task from the task waiting queue, mark its status as "scheduled for execution", adjust the window size to W = W - 1, and go to Step 5; if not, leave the window size unchanged and go to Step 5.
Step 5: slide the task selection pointer P, together with the scheduling window, to the next task position in the queue, and jump back to Step 3.
Step 6: submit all the tasks removed in Step 4 to the executor pool; once they have been handed over for execution, go to Step 7.
Step 7: judge whether the timer t has reached the pointer backtracking time T; the timer t starts counting from 0 when the scheduler starts. If T has been reached, point the task selection pointer P back to the start of the task waiting queue, reset t to 0, restart timing, and jump to Step 1; if not, jump to Step 1 directly. The pointer backtracking time T is greater than 5 times the execution-time division threshold.
When the scheduling algorithm initializes, the task selection pointer P points to the start of the queue. The pointer backtracking time T is defined to ensure that tasks earlier in the queue that previously lacked the conditions for execution are rescheduled within a reasonable period: after the timer t reaches the backtracking time T, the pointer P points back to the head of the queue and the timer t is reset to 0.
The method for checking a predecessor task's state in Step 3 is: according to the predecessor task's type, first search the corresponding task waiting queue by the task's unique identifier. If the task is found, the predecessor has not yet executed. If the queue does not contain the task, continue by querying the completed-task database in HBase. If the task is not found in the HBase database either, the upper-layer application has not yet submitted it or it is still executing, i.e. the predecessor task is in the not-executed state; if it is found, the predecessor task has finished executing.
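The lookup order above can be sketched with in-memory stand-ins: a dict of sets standing in for the Redis-backed waiting queues and a set standing in for the HBase completed-task table. The function name and parameters are illustrative assumptions.

```python
def predecessor_finished(task_id, task_type, waiting_queues, completed_db):
    """Check a predecessor task's state.

    waiting_queues : dict task_type -> set of waiting task ids
                     (stand-in for the Redis-backed waiting queues)
    completed_db   : set of completed task ids
                     (stand-in for the HBase completed-task database)

    Returns True only if the predecessor has finished; a task that is
    waiting, still running, or not yet submitted counts as not-executed.
    """
    if task_id in waiting_queues.get(task_type, set()):
        return False            # found in its waiting queue: not yet executed
    if task_id in completed_db:
        return True             # found among completed tasks: finished
    return False                # not submitted yet, or currently executing
```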
Instant tasks are scheduled with an event-trigger mechanism to guarantee timely scheduling. When a task is placed into the instant-task waiting queue, an event fires that sends a take-task message to the scheduler; the scheduler removes the task and submits it to the instant-task executor pool. The size of the instant-task executor pool is unrestricted, so one executor can be allocated to each task.
For the situation shown in Fig. 7, the scheduling process is as follows:
Step 1: query the number of idle executors, N = 3; the task selection pointer P points to task 1.
Step 2: set the scheduling window W = 3 and place the window at the position pointed to by P.
Step 3: obtain the predecessor task list of task 1; its predecessor is task 3. Check the state of task 3 and find it is in the ready state, so task 1 lacks the conditions for execution. Slide the window back one position; P points to task 2.
Step 4: obtain the predecessor task list of task 2 and find it has no predecessors. Remove task 2, mark its status as "scheduled for execution", adjust the window size to W = 2, and slide the window back one position; P points to task 3.
Step 5: obtain the predecessor task list of task 3 and find it has no predecessors. Remove task 3, mark its status as "scheduled for execution", adjust the window size to W = 1, and slide the window back one position; P points to task 4.
Step 6: obtain the predecessor task list of task 4; its predecessor is task 2. Check the state of task 2 and find it is in the "scheduled for execution" state, so task 4 lacks the conditions for execution. Slide the window back one position; P points to task 5.
Step 7: obtain the predecessor task list of task 5 and find it has no predecessors. Remove task 5, mark its status as "scheduled for execution", and adjust the window size to W = 0; P points to task 6.
Step 8: the window size is now 0, meaning 3 tasks have been removed. Submit these three tasks to the executor pool for execution; this round of scheduling is finished, and the next round continues from task 6, where P now points.
(3) Task execution
Three executor pools are established, for long, short, and instant tasks, each owning a certain number of executors. The number of executors in the instant-task pool is unrestricted: each time a batch of tasks is received, a corresponding number of executors is created to process them. The numbers of executors in the non-instant pools are preset and can be adjusted adaptively during operation. The executor pool receives the tasks dispatched by the scheduler, and each executor is responsible for the concrete execution of a task. To improve execution efficiency, executors work in a multi-threaded parallel fashion, with the subtask as the smallest unit of processing.
After obtaining a task, the executor follows the order specified by the task's subtask two-level linked list: it takes out one layer of the list at a time, allocates one thread to each subtask on that layer for parallel processing, waits until all subtasks of the layer have finished executing, and then processes the next layer. Fig. 4 is a schematic diagram of the two-level list structure.
For the two-level linked list shown in Fig. 4, the concrete execution flow of a task is shown in Fig. 5:
Step 1: take out the list chain1 pointed to by the sub pointer in the two-level list head head, take out all subtasks on chain1 (subtask 1, subtask 2, subtask 3), allocate an independent thread to each subtask, and record the current time T1.
Step 2: if subtask 1 fails, retry it immediately; if repeated retries fail, terminate the task's execution and place the task back at the tail of its task waiting queue. Otherwise, after subtask 1, subtask 2, and subtask 3 have all finished, take out chain1's next layer chain2, take out all subtasks on chain2, and allocate an independent thread to each subtask.
Step 3: after subtask 4 and subtask 6 have both finished, take out chain2's next layer chain3, take out all subtasks on chain3, and allocate an independent thread to each subtask.
Step 4: after subtask 5 has finished, take out chain3's next layer chain4, take out all subtasks on chain4, and allocate an independent thread to each subtask.
Step 5: after subtask 7, subtask 8, and subtask 9 have finished, chain4 is found to have no next layer, meaning the task has finished executing. Mark the task status as completed and store it in the HBase database. Record the current time T2 to obtain the task's execution time.
This method applies the retry strategy at subtask granularity to achieve fine-grained control: when a subtask is found to have failed, it is retried; if repeated retries are still unsuccessful, the task's execution is terminated and the task is placed back at the tail of the corresponding task waiting queue to await rescheduling.
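The subtask-level retry can be sketched as follows. The retry count is an assumed parameter (the patent says only "repeated retries"); raising an exception on exhaustion models the executor terminating the task and re-queuing it.

```python
def run_subtask_with_retry(subtask, max_retries=3):
    """Run one subtask, retrying immediately on failure.

    Returns (True, result) on success. After the first attempt plus
    `max_retries` consecutive failures, raises RuntimeError, which the
    executor would catch to terminate the task and place it back at the
    tail of its waiting queue.
    """
    last_error = None
    for _ in range(1 + max_retries):       # first attempt plus the retries
        try:
            return True, subtask()
        except Exception as exc:           # illustrative catch-all
            last_error = exc               # retry immediately
    raise RuntimeError(
        f"subtask failed after {max_retries} retries") from last_error
```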
(4) adaptive adjustment
In order to it is abundant, rationally utilize system resource, avoid in actuator pond that the distribution of actuator quantity is uneven, task type is drawn It is improper to divide, and improves operational efficiency to greatest extent, this method acquisition actuator executive condition is simultaneously for statistical analysis, by statistical information Scheduler is fed back to adaptively to be adjusted.Fig. 6 illustrates the process adaptively adjusted by taking short task as an example.Adaptive adjustment master It is divided into two aspects:
First, to prevent the upper-layer application from dividing task types incorrectly, the method monitors the time spent by each long task and short task executing in the actuator pools, then computes the average execution duration of each business type and feeds it back to the scheduler in time. The scheduler judges from the statistics whether the task classification is appropriate and, if not, adjusts the task waiting queue in which the tasks are placed. For example, suppose the upper-layer application classifies data analysis tasks as short tasks, but statistics show their average execution duration is 2 hours, exceeding the execution-time division threshold; the tasks should therefore be long tasks. The scheduler then moves all data-analysis-type tasks in the short task waiting queue to the tail of the long task waiting queue.
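The classification check can be sketched as follows; the function name, the input layout, and the example numbers are assumptions for illustration, not the patent's implementation:

```python
def reclassify(samples, divide_threshold):
    """For each business type, `samples` maps the type to a pair
    (current_class, observed_durations). A type marked 'short' whose
    average duration exceeds the execution-time division threshold
    should move to the long task waiting queue, and vice versa;
    return the types to move together with their corrected class."""
    moves = {}
    for btype, (current_class, durations) in samples.items():
        avg = sum(durations) / len(durations)
        correct_class = "long" if avg > divide_threshold else "short"
        if correct_class != current_class:
            moves[btype] = correct_class
    return moves

# With an assumed 1-hour threshold (3600 s), a "short" data analysis
# type averaging 2 hours (7200 s) is flagged for the long task queue:
# reclassify({"data_analysis": ("short", [7200, 7200])}, 3600)
```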
Second, the number of idle actuators in the long-task and short-task actuator pools is counted periodically and fed back to the scheduler in time. The scheduler then queries the current state of each task waiting queue, redistributes the actuator quantities of the two actuator pools according to the statistics, and releases idle actuators to the pool that is short of actuators.
For the situation shown in Fig. 6, the result statistics and feedback mechanism work as follows. Suppose there are currently three business types A, B and C, and the upper-layer application sets business types A and B as short tasks and business type C as a long task. The scheduler takes the qualified tasks out of the short task waiting queue and hands them to actuators for execution. The execution time of each task is monitored, and the average execution times Ta and Tb of business types A and B are computed. It is found that Ta < T_divide-threshold while Tb > T_divide-threshold, indicating that the task type of business type B is incorrect. This is fed back to the scheduler, which moves all tasks of business type B in the short task waiting queue to the tail of the long task waiting queue and marks them as long tasks.
Meanwhile idle actuator number in the long task of periodic statistics and short task performer pond, feed back to scheduler.With For the long excessive situation of task free time actuator:The idle actuator number F of long task1, task quantity in task waiting list For Q1, the idle actuator number F of short task2, task quantity is Q in task waiting list2.Defining idleness is idle execute The ratio of device number F and task quantity Q in task waiting list, if the idleness difference of long task and short task is more than threshold value, And Q1、Q2Simultaneously it is not 0, then illustrates current actuator quantity unreasonable distribution, in order to realize load balancing, by long task execution N actuator is adjusted to short task performer pond in device pond, and n size is calculated by following formula:
In formula, floor () is downward bracket function, and requires Q1+Q2≠0。
In addition, the present invention provides friendly external interface services through which execution results and statistics can be submitted on demand to the upper-layer application for its reference.
Content not described in detail in this specification belongs to techniques well known to those skilled in the art.

Claims (9)

1. A two-stage adaptive scheduling method suitable for large-scale parallel data processing tasks, characterized by comprising the following steps:
Step 1, at the task level, perform the following work:
(1.1) tasks submitted by the upper-layer application are placed, by task type, into the corresponding task waiting queue, the task waiting queues comprising a long task waiting queue, a short task waiting queue and an instant task waiting queue;
(1.2) according to the dependencies between tasks and the number of currently idle actuators, the scheduler takes the currently executable tasks out of the task waiting queues and distributes them to the corresponding actuator pools, the actuator pools comprising a long-task actuator pool, a short-task actuator pool and an instant-task actuator pool;
(1.3) each type of actuator pool distributes the received tasks to currently idle actuators;
Step 2, at the subtask level, perform the following work:
(2.1) the actuator splits the received task into subtasks and assigns the subtasks in order to multiple threads for parallel processing;
(2.2) after the task finishes executing, the task status is marked as execution completed and the task is placed into the completed-task database.
2. The two-stage adaptive scheduling method suitable for large-scale parallel data processing tasks according to claim 1, characterized in that: when submitting a task, the upper-layer application also attaches the relevant attributes of the task, including the task name, task type, business type, predecessor task list, task context and subtask second-level linked list;
wherein the task types comprise three classes: long tasks, short tasks and instant tasks; a long task is a task whose execution time, as estimated by the upper-layer application, is greater than the execution-time division threshold, and a short task is a task whose estimated execution time is less than the execution-time division threshold;
the business type refers to the business function of the task;
the predecessor task list lists all predecessor tasks on which the current task directly depends, and the current task can execute only after its predecessor tasks have finished executing;
the subtask second-level linked list records the two-dimensional topology of the subtask execution order and comprises m layers of linked lists; each layer comprises multiple nodes, each node containing one subtask; the subtasks on each layer execute in parallel, and execution proceeds to the next layer only after all subtasks of the previous layer have finished, m being a natural number greater than or equal to 1;
the task context refers to the environment in which the task runs, including the data resources and computing resources required at run time.
3. The two-stage adaptive scheduling method suitable for large-scale parallel data processing tasks according to claim 1, characterized in that the step (1.2) is implemented as follows:
(3.1) as soon as a task enters the instant task queue, the scheduler takes it out and puts it into the instant-task actuator pool;
(3.2) for non-instant tasks, the scheduler cyclically queries the number of idle actuators in the actuator pools by task type, obtains qualified tasks from the task waiting queues using a sliding-window scheduling mode, and submits them to the corresponding actuator pool.
4. The two-stage adaptive scheduling method suitable for large-scale parallel data processing tasks according to claim 3, characterized in that the step (3.2) is implemented as follows:
Step 1: query the current number of idle actuators in the actuator pool according to the task type; assuming the pool has N idle actuators, set the scheduling window size W to N;
Step 2: place the initial position of the scheduling window at the task position pointed to by the task selection pointer P;
Step 3: if W is greater than 0 and there are tasks in the scheduling window, obtain the task pointed to by the task selection pointer P, query its predecessor task list, and check the state of each predecessor task in the list; otherwise, if W is 0 or there are no tasks in the window, jump to step 6;
Step 4: judge whether all predecessor tasks have finished executing; if so, take the task out of the task waiting queue, mark its status as "planned for execution", adjust the scheduling window size to W = W - 1, and go to step 5; if not, go to step 5;
Step 5: slide the task selection pointer P and the scheduling window together to the next task position in the queue, and jump back to step 3;
Step 6: submit all tasks taken out in step 4 to the actuator pool for execution, then go to step 7;
Step 7: judge whether the timing time t has reached the pointer backtracking time T, where t starts counting from 0 when the scheduler starts; if reached, point the task selection pointer P back to the initial position of the task waiting queue, reset t to 0, restart timing, and jump back to step 1; if not reached, jump back to step 1; the pointer backtracking time T is greater than 5 times the execution-time division threshold.
5. The two-stage adaptive scheduling method suitable for large-scale parallel data processing tasks according to claim 4, characterized in that the method for checking predecessor task states in step 3 is as follows: according to the task type of the predecessor task, first scan the corresponding task waiting queue; if the task is found there, the predecessor task is in the not-executed state; if it is not found, query the completed-task database; if the task is not found in the database either, the upper-layer application has not yet submitted the task or the task is still executing, i.e. the predecessor task is in the not-executed state; if it is found there, the predecessor task is in the finished state.
6. The two-stage adaptive scheduling method suitable for large-scale parallel data processing tasks according to claim 1, characterized in that the detailed process of the step (2.1) is as follows:
Step 1: on receiving a task, the actuator extracts the corresponding subtask second-level linked list and takes out the first-layer linked list pointed to by the pointer field in the head of the second-level linked list as the current linked list;
Step 2: judge whether the current linked list is null; if null, end the process; if not null, take the node pointed to by the pointer field in the current list head as the current node, and assign the subtask of that node to an independent thread for execution;
Step 3: take the next node pointed to by the pointer field of the current node; if null, wait until all subtasks of the current linked list have finished executing, then execute step 4; if not null, take that next node as the current node, assign its subtask to an independent thread for execution, and continue with step 3;
Step 4: take out the next linked list pointed to by the pointer field of the current linked list; if null, the task has finished executing, so end; if not null, take it as the current linked list and jump to step 2 to continue.
7. The two-stage adaptive scheduling method suitable for large-scale parallel data processing tasks according to claim 1, characterized in that: while scheduling and processing, actuator execution data is collected in real time and, after statistical analysis, fed back to the scheduler; the scheduler accordingly adjusts the task waiting queues in which tasks are placed and the actuator quantities of the actuator pools, realizing adaptive adjustment.
8. The two-stage adaptive scheduling method suitable for large-scale parallel data processing tasks according to claim 7, characterized in that the method for adjusting the task waiting queue in which a task is placed is as follows:
collect the time spent by each task executing in the long-task actuator pool and the short-task actuator pool, then compute the average execution duration of the tasks of each business type and feed it back to the scheduler; the scheduler judges from the feedback whether the task type division is appropriate and, if not, modifies the task type and moves the tasks to the new task waiting queue.
9. The two-stage adaptive scheduling method suitable for large-scale parallel data processing tasks according to claim 7, characterized in that the method for adjusting the actuator quantities of the actuator pools is as follows:
periodically count the number of idle actuators in the long-task and short-task actuator pools and feed it back to the scheduler in time; the scheduler then queries the task situation in each task waiting queue, redistributes the actuator quantities of the two actuator pools according to the query results, and releases idle actuators to the pool that is short of actuators.
CN201810502295.0A 2018-05-23 2018-05-23 Two-stage adaptive scheduling method suitable for massive parallel data processing tasks Active CN108920261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810502295.0A CN108920261B (en) 2018-05-23 2018-05-23 Two-stage adaptive scheduling method suitable for massive parallel data processing tasks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810502295.0A CN108920261B (en) 2018-05-23 2018-05-23 Two-stage adaptive scheduling method suitable for massive parallel data processing tasks

Publications (2)

Publication Number Publication Date
CN108920261A true CN108920261A (en) 2018-11-30
CN108920261B CN108920261B (en) 2020-03-24

Family

ID=64404443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810502295.0A Active CN108920261B (en) 2018-05-23 2018-05-23 Two-stage adaptive scheduling method suitable for massive parallel data processing tasks

Country Status (1)

Country Link
CN (1) CN108920261B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050028160A1 (en) * 2003-08-01 2005-02-03 Honeywell International Inc. Adaptive scheduler for anytime tasks
CN101038559A (en) * 2006-09-11 2007-09-19 中国工商银行股份有限公司 Batch task scheduling engine and dispatching method
CN101957780A (en) * 2010-08-17 2011-01-26 中国电子科技集团公司第二十八研究所 Resource state information-based grid task scheduling processor and grid task scheduling processing method
CN102508639A (en) * 2011-10-10 2012-06-20 北京邮电大学 Distributed parallel processing method based on satellite remote sensing data characteristics
CN103473134A (en) * 2013-09-23 2013-12-25 哈尔滨工程大学 Dependent task scheduling method of heterogeneous multi-core processor
CN105389209A (en) * 2015-12-25 2016-03-09 中国建设银行股份有限公司 Asynchronous batch task processing method and system
CN107038070A (en) * 2017-04-10 2017-08-11 郑州轻工业学院 The Parallel Task Scheduling method that reliability is perceived is performed under a kind of cloud environment

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784647A (en) * 2018-12-14 2019-05-21 兰州空间技术物理研究所 A kind of method for scheduling task of the active potential control system for space station
CN109901926A (en) * 2019-01-25 2019-06-18 平安科技(深圳)有限公司 Method, server and storage medium based on big data behavior scheduling application task
CN109933611A (en) * 2019-02-22 2019-06-25 深圳达普信科技有限公司 A kind of adaptive collecting method and system
CN110196763A (en) * 2019-05-09 2019-09-03 中国科学技术大学苏州研究院 A kind of efficient multi-task planning method of time domain continuous type space crowdsourcing
CN110175072A (en) * 2019-05-28 2019-08-27 广州小鹏汽车科技有限公司 Task executing method, system and vehicle
CN110362392A (en) * 2019-07-15 2019-10-22 深圳乐信软件技术有限公司 A kind of ETL method for scheduling task, system, equipment and storage medium
CN110597606A (en) * 2019-08-13 2019-12-20 中国电子科技集团公司第二十八研究所 Cache-friendly user-level thread scheduling method
CN110597606B (en) * 2019-08-13 2022-02-18 中国电子科技集团公司第二十八研究所 Cache-friendly user-level thread scheduling method
CN111176805A (en) * 2019-12-02 2020-05-19 西安万像电子科技有限公司 Task scheduling method and device
CN115004170A (en) * 2020-01-13 2022-09-02 谷歌有限责任公司 Optimized query scheduling according to data freshness requirements
CN113282381A (en) * 2020-02-19 2021-08-20 中科寒武纪科技股份有限公司 Task scheduling method and device, computer equipment and storage medium
CN113408567A (en) * 2020-03-17 2021-09-17 华为技术有限公司 Data analysis method based on multiple analysis tasks and electronic equipment
WO2021212965A1 (en) * 2020-04-21 2021-10-28 华为技术有限公司 Resource scheduling method and related device
CN112181640A (en) * 2020-09-14 2021-01-05 北京幻想纵横网络技术有限公司 Task processing method and device
CN111934808A (en) * 2020-09-17 2020-11-13 中国航空制造技术研究院 Multi-actuator coordination control system and method based on high-precision time service network
CN111934808B (en) * 2020-09-17 2021-02-02 中国航空制造技术研究院 Multi-actuator coordination control system and method based on high-precision time service network
CN112561326A (en) * 2020-12-15 2021-03-26 青岛海尔科技有限公司 Task execution method and device, storage medium and electronic device
CN112667381A (en) * 2020-12-30 2021-04-16 联想(北京)有限公司 Data access method and device
CN112667901B (en) * 2020-12-31 2024-04-26 中国电子信息产业集团有限公司第六研究所 Social media data acquisition method and system
CN112667901A (en) * 2020-12-31 2021-04-16 中国电子信息产业集团有限公司第六研究所 Social media data acquisition method and system
CN112559161A (en) * 2021-02-19 2021-03-26 北京搜狐新媒体信息技术有限公司 Task scheduling method and system
CN113010289A (en) * 2021-03-17 2021-06-22 杭州遥望网络科技有限公司 Task scheduling method, device and system
CN113283692A (en) * 2021-03-19 2021-08-20 东南大学 Intelligent man-machine cooperation scheduling method and system for monitoring resource allocation of bulk commodity trading market
CN113283692B (en) * 2021-03-19 2024-04-16 东南大学 Intelligent man-machine cooperation scheduling method and system for supervising resource allocation in bulk commodity trade market
CN113296846A (en) * 2021-06-04 2021-08-24 烽火通信科技股份有限公司 Chip port configuration method and device based on task scheduling
CN113296846B (en) * 2021-06-04 2023-04-18 烽火通信科技股份有限公司 Chip port configuration method and device based on task scheduling
CN113467908B (en) * 2021-06-23 2024-02-20 深圳市蘑菇财富技术有限公司 Task execution method, device, computer readable storage medium and terminal equipment
CN113467908A (en) * 2021-06-23 2021-10-01 深圳市蘑菇财富技术有限公司 Task execution method and device, computer readable storage medium and terminal equipment
WO2023283767A1 (en) * 2021-07-12 2023-01-19 华为技术有限公司 Task scheduling method and apparatus
CN113694536A (en) * 2021-09-07 2021-11-26 北京蔚领时代科技有限公司 Scene management method, device, equipment and medium for cloud game
CN114756629A (en) * 2022-06-16 2022-07-15 之江实验室 Multi-source heterogeneous data interaction analysis engine and method based on SQL

Also Published As

Publication number Publication date
CN108920261B (en) 2020-03-24

Similar Documents

Publication Publication Date Title
CN108920261A (en) A kind of two-stage self-adapting dispatching method suitable for large-scale parallel data processing task
CN104298550B (en) A kind of dynamic dispatching method towards Hadoop
CN102254246B (en) Workflow managing method and system
CN100576177C (en) Bidirectional grade gridding resource scheduling method based on the QoS constraint
CN101166208B (en) A method and system for maintaining work automation
CN102387173A (en) MapReduce system and method and device for scheduling tasks thereof
CN104915407A (en) Resource scheduling method under Hadoop-based multi-job environment
CN103838627B (en) Workflow dispatching method based on workflow throughput maximization
CN109857535B (en) Spark JDBC-oriented task priority control implementation method and device
CN111026519B (en) Distributed task priority scheduling method and system and storage medium
CN111651864B (en) Event centralized emission type multi-heterogeneous time queue optimization simulation execution method and system
CN112559159A (en) Task scheduling method based on distributed deployment
CN102495722A (en) XML (extensible markup language) parallel parsing method for multi-core fragmentation
CN109299180A (en) A kind of data warehouse ETL operating system
CN107316124B (en) Extensive affairs type job scheduling and processing general-purpose system under big data environment
CN117707759A (en) Multi-tenant GPU cluster elastic quota scheduling method and system
CN110084507B (en) Scientific workflow scheduling optimization method based on hierarchical perception in cloud computing environment
CN116010064A (en) DAG job scheduling and cluster management method, system and device
Shu-Jun et al. Optimization and research of hadoop platform based on fifo scheduler
CN114816694A (en) Multi-process cooperative RPA task scheduling method and device
CN111506407B (en) Resource management and job scheduling method and system combining Pull mode and Push mode
KR100590764B1 (en) Method for mass data processing through scheduler in multi processor system
Khalil et al. Survey of Apache Spark optimized job scheduling in Big Data
CN116755888A (en) High-performance computing cloud platform-oriented job scheduling device and method
Li et al. Mrsch: Multi-resource scheduling for hpc

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant