CN110287245A - Method and system for scheduling and executing distributed ETL (extract transform load) tasks - Google Patents
- Publication number
- CN110287245A (application number CN201910401322.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer And Data Communications (AREA)
- Multi Processors (AREA)
Abstract
Embodiments of the invention provide a method and a system for scheduling and executing distributed ETL tasks. For each ETL task to be scheduled, the associations between entities and attached tables, the associations between entities and dimension tables, and the one-to-many associations between entities involved in the task are extracted from the target tables contained in the task. The scheduling priority of the ETL task is determined from the preset weight of each kind of association and the number of associations of each kind in the task, and the ETL tasks are distributed to the execution nodes in descending order of scheduling priority. In this technical scheme, ETL tasks are distributed to execution nodes with different weights according to factors such as the complexity of the service corresponding to each task and the importance of the service data to be integrated, so that core data is loaded in a timely manner, load is balanced among the nodes, and both the efficiency of data integration and the utilization of resources are improved.
Description
Technical field
The present invention relates to data warehouses, and more particularly to a method and system for scheduling and executing ETL tasks.
Background art
Extract-Transform-Load (ETL) technology is currently one of the key steps in building a data warehouse in a big data environment. It is the process of integrating dispersed, heterogeneous data into a unified standard database through extraction, transformation, and loading. The extraction, transformation, and loading steps can be combined into a schedulable ETL script job (also called an ETL task). In a big data environment it is often necessary to execute dozens or even tens of thousands of ETL tasks, and scheduling these tasks efficiently is an important part of building a data warehouse. At present, ETL task scheduling mainly relies on distributed cluster scheduling schemes, which distribute ETL tasks to the execution nodes in a cluster using scheduling algorithms such as round-robin, first-come-first-served, or Min-Min. However, because ETL tasks differ in execution time and in the amount of data they contain, and because execution nodes differ in current load, such schemes easily lead to unbalanced cluster load and low resource utilization, and therefore to inefficient data integration.
Summary of the invention
The inventors found that, in data integration, the businesses and associated business data involved in different ETL tasks differ in importance. If an ETL task that integrates data related to core business waits too long to be scheduled and executed, the efficiency of data integration suffers directly. Existing ETL task scheduling methods do not consider the complexity of the business corresponding to an ETL task or the importance of the business data to be integrated. The embodiments of the present invention therefore aim to overcome these defects of the prior art by providing a new method and system for scheduling and executing distributed ETL tasks.
The above object is achieved through the following technical solutions:
According to a first aspect of the embodiments of the present invention, a method for scheduling and executing distributed ETL tasks is provided. The method comprises: for each acquired ETL task to be scheduled, extracting, based on the target tables into which data is loaded in the ETL task, the associations between entities and attached tables, the associations between entities and dimension tables, and the one-to-many associations between entities involved in the task; determining the scheduling priority of the ETL task based on the weight preset for each kind of association and the number of associations of each kind in the task; and distributing the ETL tasks to the execution nodes in descending order of scheduling priority.
In some embodiments of the invention, the method may further comprise, before distributing the ETL tasks, querying the performance indicators of each execution node; determining the current load of each execution node from the acquired performance indicators; and selecting execution nodes in order of current load from low to high when distributing the ETL tasks.
In some embodiments of the invention, the scheduling priority of an ETL task can be calculated by a formula in which Wl1 denotes the weight of an association between an entity and an attached table; Wl2 denotes the weight of an association between an entity and a dimension table; Wl3 denotes the weight of an association between entities; and ni denotes the number of associations of the i-th kind that occur in the ETL task.
In some embodiments of the invention, distributing the ETL tasks to the execution nodes may include:
a) counting the data volume of each ETL task to be scheduled;
b) counting the total data volume of all ETL tasks on each execution node;
c) selecting, from the ETL tasks to be scheduled, the ETL task with the largest data volume;
d) selecting the execution node that has the smallest total data volume and has not yet been assigned an ETL task in the current round;
e) distributing the selected ETL task to the selected execution node, and marking that execution node as assigned;
f) repeating steps c) to e) until all ETL tasks to be scheduled have been assigned or until all execution nodes are marked as assigned;
g) checking whether any ETL tasks to be scheduled remain; if so, re-marking all execution nodes as unassigned and repeating steps c) to g) until all ETL tasks to be scheduled have been assigned.
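As an illustration only, steps a) to g) can be sketched in Python as follows. The function name `assign_tasks`, the dictionary-based inputs, and the choice to add each assigned task's data volume to the node's running total are assumptions made for this sketch; the text does not state whether node totals are updated between rounds.

```python
def assign_tasks(task_volumes, node_totals):
    """Greedy assignment following steps a) to g).

    task_volumes: dict mapping task id -> data volume (step a).
    node_totals:  dict mapping node id -> total data volume already on the node (step b).
    Returns a dict mapping node id -> list of task ids assigned to it.
    """
    if not node_totals:
        raise ValueError("at least one execution node is required")

    # Largest data volume first, so popping from the front implements step c).
    pending = sorted(task_volumes, key=task_volumes.get, reverse=True)
    totals = dict(node_totals)  # working copy; assumed to grow as tasks are assigned
    assignment = {node: [] for node in totals}

    while pending:                      # step g): another round while tasks remain
        unmarked = set(totals)          # all nodes start the round as "unassigned"
        while pending and unmarked:     # step f): stop when tasks or nodes run out
            task = pending.pop(0)       # step c): task with the largest data volume
            # step d): unmarked node with the smallest total (ties broken by node id)
            node = min(unmarked, key=lambda n: (totals[n], n))
            assignment[node].append(task)       # step e): assign ...
            totals[node] += task_volumes[task]
            unmarked.discard(node)              # ... and mark the node as assigned
    return assignment


result = assign_tasks({"t1": 50, "t2": 30, "t3": 20, "t4": 10}, {"n1": 0, "n2": 40})
print(result)
```

In this sketch each node receives at most one task per round, which is what marking nodes as assigned achieves in steps e) and f).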
In some embodiments of the invention, the method may further include: in response to an execution node receiving a new ETL task, storing the pending ETL task in a task buffer queue and recording the arrival time of the ETL task; estimating the execution time of the ETL task based on the data volume in the task; in response to the current task on the execution node finishing, determining, for each pending ETL task, the execution priority of the task according to its waiting time and its estimated execution time; and selecting the ETL task with the highest execution priority from the pending ETL tasks for execution.
In some embodiments of the invention, estimating the execution time of an ETL task based on the data volume in the task may include: determining the data volume in the ETL task; selecting, from the ETL tasks that the execution node has completed within a recent period, a batch of ETL tasks whose data volume is similar to that of the pending ETL task; and averaging the execution times of this batch of ETL tasks, the resulting average being used as the estimated execution time of the ETL task.
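The estimation described above can be sketched as follows. This is a minimal illustration: the function name, the `(data_volume, seconds)` history format, and the 20% relative tolerance used to decide what counts as a "similar" data volume are all assumptions, since the text does not define similarity.

```python
from statistics import mean


def estimate_execution_time(data_volume, history, tolerance=0.2):
    """Estimate execution time from recently completed tasks of similar size.

    history: list of (data_volume, execution_seconds) pairs for tasks the node
             completed within a recent period.
    tolerance: assumed relative difference in data volume still treated as similar.
    Returns the average execution time of the similar tasks, or None if none match.
    """
    similar = [
        seconds
        for volume, seconds in history
        if abs(volume - data_volume) <= tolerance * data_volume
    ]
    return mean(similar) if similar else None


recent = [(100, 10.0), (110, 12.0), (500, 60.0)]
print(estimate_execution_time(105, recent))
```

Returning `None` when no similar task exists leaves the fallback policy to the caller; the text does not say what should happen in that case.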
In some embodiments of the invention, the execution priority of an ETL task can be determined by a formula in which EPi denotes the execution priority of the i-th ETL task ei; Tei denotes the execution time of ETL task ei; and Twi denotes the waiting time of ETL task ei, which equals the current time minus the time at which the task arrived at the execution node.
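The execution priority formula itself is not reproduced in this text. One well-known rule that depends on exactly these two quantities is the response ratio used in highest-response-ratio-next scheduling, EP = (Tw + Te) / Te. The sketch below assumes that form purely for illustration; the formula actually used in the patent may differ, and the names `execution_priority` and `pick_next` as well as the queue record layout are hypothetical.

```python
def execution_priority(wait_time, exec_time):
    """Assumed response-ratio form: longer waits and shorter tasks rank higher."""
    if exec_time <= 0:
        raise ValueError("estimated execution time must be positive")
    return (wait_time + exec_time) / exec_time


def pick_next(queue, now):
    """Select the pending task with the highest execution priority.

    queue: list of dicts with keys 'id', 'arrival' (arrival time at the node)
           and 'estimated' (estimated execution time), all in the same time unit.
    now:   current time; the waiting time Tw is now - arrival.
    """
    return max(
        queue,
        key=lambda task: execution_priority(now - task["arrival"], task["estimated"]),
    )


queue = [
    {"id": "a", "arrival": 0, "estimated": 100},
    {"id": "b", "arrival": 50, "estimated": 10},
]
print(pick_next(queue, now=60)["id"])
```

Under this assumed form, a short task is preferred early on, while a long task's priority keeps rising as it waits, so it is not starved indefinitely.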
According to a second aspect of the embodiments of the present invention, a system for scheduling and executing distributed ETL tasks is also provided. The system includes a scheduler and multiple executors: the scheduler distributes one or more ETL tasks to be scheduled to the executors, and each executor executes the ETL tasks it receives. The scheduler includes a relationship analysis module, a priority determination module, and a scheduling module. The relationship analysis module, for each acquired ETL task to be scheduled, extracts, based on the target tables into which data is loaded in the ETL task, the associations between entities and attached tables, the associations between entities and dimension tables, and the one-to-many associations between entities involved in the task. The priority determination module determines the scheduling priority of the ETL task based on the weight preset for each kind of association and the number of associations of each kind in the task. The scheduling module distributes the ETL tasks to the executors in descending order of scheduling priority.
In some embodiments of the invention, the scheduler may further include a load monitoring module for querying the performance indicators of each executor and determining the current load of each executor from the acquired performance indicators; the scheduling module is further configured to select executors in order of current load from low to high when distributing the ETL tasks.
In some embodiments of the invention, the executor may be configured to: in response to receiving a new ETL task, store the pending ETL task in a task buffer queue and record the arrival time of the ETL task; estimate the execution time of the ETL task based on the data volume in the task; in response to the current task finishing, determine, for each pending ETL task, the execution priority of the task according to its waiting time and its estimated execution time; and select the ETL task with the highest execution priority from the pending ETL tasks for execution.
The technical solutions of the embodiments of the present invention can provide the following benefits:
ETL tasks are distributed among the nodes according to factors such as the complexity of the business corresponding to each ETL task, the importance of the business data to be integrated, and node performance. In addition, the execution order of ETL tasks on each execution node can be adjusted according to task execution time, the amount of data to be processed, and similar factors. This satisfies both the timeliness of core data loading and the load balance among execution nodes, and improves the overall efficiency of data integration and the utilization of resources.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present invention.
Brief description of the drawings
The drawings herein are incorporated into and form part of this specification, show embodiments consistent with the present invention, and together with the specification serve to explain the principles of the invention. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 shows a flow diagram of a method for scheduling and executing distributed ETL tasks according to an embodiment of the invention.
Fig. 2 shows a schematic diagram of the process of determining an ETL task weight according to an embodiment of the invention.
Fig. 3 shows a schematic diagram of the ETL task execution process on an execution node according to an embodiment of the invention.
Fig. 4 shows a structural diagram of a system for scheduling and executing distributed ETL tasks according to an embodiment of the invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is described in more detail below through specific embodiments with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In addition, the described features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the invention. However, those skilled in the art will appreciate that the technical solutions of the invention can be practiced without one or more of these specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of the invention.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities can be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are merely illustrative; they need not include all contents and operations/steps, nor must the steps be executed in the order described. For example, some operations/steps can be decomposed and others can be merged or partly merged, so the actual execution order may change according to the situation.
Fig. 1 shows a flow diagram of a method for scheduling and executing distributed ETL tasks according to an embodiment of the invention. As shown in Fig. 1, the method mainly includes: step S101) for each acquired ETL task to be scheduled, extracting, based on the target tables into which data is loaded in the ETL task, the associations between entities and attached tables, the associations between entities and dimension tables, and the associations between entities involved in the task; step S102) determining the scheduling priority of the ETL task based on the weight preset for each kind of association and the number of associations of each kind in the task; and step S103) distributing the ETL tasks, in descending order of scheduling priority, to the execution nodes that execute ETL tasks.
More specifically, in step S101), multiple ETL tasks to be scheduled may first be obtained from the ETL task repository. After an ETL task has been built, its related information is usually stored in the ETL task repository in the form of metadata. This metadata includes descriptive information such as the name, file name, directory, state, description, and extended description of the ETL task. The state of an ETL task can indicate whether the task has already been scheduled for execution; its specific value can be set or changed according to the actual scheduling situation. For example, the state of an ETL task that has been scheduled may be set to 1, and the state of an ETL task that has not yet been scheduled may be set to 0. In one embodiment, the ETL tasks to be scheduled can be obtained from the ETL task repository according to the state and creation time of each ETL task. The state shows whether an ETL task is waiting to be scheduled, and the creation time gives the waiting time of the task. In this way, each scheduling pass can select from the repository, by length of waiting time, a batch of ETL tasks that have not yet been scheduled. ETL tasks can be acquired through a request-response mechanism or periodically. For example, the ETL task repository can be read periodically and a batch of ETL tasks to be scheduled extracted from it. The period can be set or changed according to the actual situation, for example to 2 hours, 1 hour, 0.5 hour, or 10 minutes.
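A minimal sketch of this selection step is given below. The `TaskMeta` record, its field names, and the `batch_size` parameter are hypothetical; only the state values (0 for not yet scheduled, 1 for scheduled) and the use of creation time to derive waiting time come from the description above.

```python
from dataclasses import dataclass


@dataclass
class TaskMeta:
    name: str
    state: int         # 0 = not yet scheduled, 1 = already scheduled
    created_at: float  # creation timestamp in seconds


def select_pending(repository, now, batch_size):
    """Pick a batch of unscheduled ETL tasks, longest-waiting first."""
    pending = [task for task in repository if task.state == 0]
    # Waiting time is the current time minus the creation time.
    pending.sort(key=lambda task: now - task.created_at, reverse=True)
    return pending[:batch_size]


repository = [
    TaskMeta("load_a", state=0, created_at=100.0),
    TaskMeta("load_b", state=1, created_at=0.0),
    TaskMeta("load_c", state=0, created_at=50.0),
]
batch = select_pending(repository, now=200.0, batch_size=2)
print([task.name for task in batch])
```

A periodic scheduler would call a function like this once per period (for example every 10 minutes) and then set the state of the selected tasks to 1.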
In a data warehouse, the descriptions of entities and the relationships between entities are usually embodied in various tables. When ETL tasks are used to extract, transform, and load data, the required data is extracted from the distributed data sources, transformed, and then loaded into the configured target tables. Each ETL task usually contains one or more target tables, for example target tables that describe an entity and its attributes, target tables that describe one-to-many relationships between entities, and target tables that describe many-to-many relationships between entities (also called attached tables). In addition, all possible values of an entity's attributes are generally kept in dimension tables in the data warehouse, so when an ETL task loads the data of a given entity, one or more dimension tables associated with that entity are generally also set as target tables and loaded. ETL tasks related to core business tend to involve more kinds of entities, with more complex and varied relationships between them. In embodiments of the invention, the entities involved in an ETL task and the various associations among them are used to measure the importance of the business to which the task corresponds, and the scheduling priority (also called weight) of the ETL task is set accordingly.
In step S101), after the ETL tasks to be scheduled have been obtained, the associations between entities and attached tables, the associations between entities and dimension tables, and the one-to-many associations between entities involved in each ETL task can be extracted based on the target tables into which data is loaded in that task, and the number of associations of each kind in the task can be counted. For example, by traversing the target tables in an ETL task, the entities involved in the task can be identified and the relationships between them (both one-to-many and many-to-many) determined. For two entities in a many-to-many relationship, the correspondence between them is usually stored as data records in an attached table, and both entities are associated with that attached table; when counting associations between entities and attached tables, each of the two entities is counted once. For two entities in a one-to-many relationship, it can be determined directly that the two are associated with each other; when counting associations between entities, the pair is counted only once. Each entity may also have several attributes, and dimension tables store all possible values of each attribute, so the dimension table or tables with which an entity is associated can be determined from the entity attributes that appear in the target tables; when counting associations between entities and dimension tables, each dimension table is counted once.
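The counting rules above can be sketched as follows. The input representation (lists of entity pairs and a mapping from entity to dimension tables) and the function name are assumptions for illustration; in practice these would be derived by traversing the target tables of the ETL task.

```python
def count_associations(many_to_many, one_to_many, dimension_tables):
    """Count the three kinds of association in one ETL task.

    many_to_many:     list of (entity, entity) pairs linked through an attached table.
    one_to_many:      list of (entity, entity) pairs in a one-to-many relationship.
    dimension_tables: dict mapping entity -> list of its dimension tables.
    Returns (n1, n2, n3): entity/attached-table, entity/dimension-table and
    entity/entity association counts.
    """
    # Both entities of a many-to-many pair are associated with the attached table,
    # and each entity is counted once.
    n1 = 2 * len(many_to_many)
    # Each dimension table associated with an entity is counted once.
    n2 = sum(len(tables) for tables in dimension_tables.values())
    # A one-to-many pair is counted only once.
    n3 = len(one_to_many)
    return n1, n2, n3


counts = count_associations(
    many_to_many=[("A", "B")],
    one_to_many=[("A", "C")],
    dimension_tables={"A": ["dim_a1", "dim_a2"], "B": ["dim_b"]},
)
print(counts)
```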
After the numbers of associations between entities and attached tables, between entities and dimension tables, and between entities involved in each ETL task have been determined, step S102) determines, for each ETL task to be scheduled, the scheduling priority or weight of the task based on the weight preset for each kind of association and the number of associations of each kind in the task. For example, it can be calculated by formula (1), in which Wl1 denotes the weight of an association between an entity and an attached table; Wl2 denotes the weight of an association between an entity and a dimension table; Wl3 denotes the weight of an association between entities; and ni denotes the number of associations of the i-th kind that occur in the ETL task, where i is a natural number that also indexes the weights Wli in the formula. Wl1, Wl2, and Wl3 are weights preset according to specific business requirements; they usually take values between 2 and 10 and can be adjusted as the business changes.
The ETL task weight is illustrated below using a science and technology management data integration service as an example. Fig. 2 shows a schematic diagram of the process of determining the ETL task weight according to an embodiment of the invention. As shown in Fig. 2, the ETL task involves four entities: project, topic, unit, and personnel. In the target tables set for loading when the ETL task was built, project and topic are in a one-to-many relationship (indicated by "1.n" in the figure): a project can have several topics, but each topic corresponds to exactly one project and cannot belong to two projects at once. The relationships between project and personnel, project and unit, topic and unit, and topic and personnel are all many-to-many; for example, the same person can take part in several projects and several topics at the same time, and the same unit can correspond to several projects and several topics. A dimension table is also set to be loaded for each entity in the ETL task. From Fig. 2 it can be counted that the task contains 8 associations between entities and attached tables, 4 associations between entities and dimension tables, and 1 association between entities. Assuming Wl1, Wl2, and Wl3 are set to 6, 5, and 10 respectively, the weight of the ETL task then follows from formula (1).
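Formula (1) and the resulting value are not reproduced in this text. The sketch below assumes that formula (1) is the weighted sum of the association counts, Wl1*n1 + Wl2*n2 + Wl3*n3, which is consistent with the description of a priority determined by a preset weight for each kind of association and the number of associations of each kind. The function name is hypothetical, and the printed result holds only under that assumption.

```python
def scheduling_priority(counts, weights):
    """Assumed form of formula (1): sum of weight times count over association kinds."""
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    return sum(weight * count for weight, count in zip(weights, counts))


# Fig. 2 example: 8 entity/attached-table, 4 entity/dimension-table and
# 1 entity/entity association, with Wl1, Wl2, Wl3 set to 6, 5, 10.
weight = scheduling_priority(counts=(8, 4, 1), weights=(6, 5, 10))
print(weight)  # 6*8 + 5*4 + 10*1 = 78 under the weighted-sum assumption
```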
Continuing with Fig. 1, step S103) sorts the ETL tasks to be scheduled according to the scheduling priority of each ETL task determined in step S102). For example, if the weights of a new batch of ETL tasks are {2,6,8,4,10,3,9}, the sorted ETL task sequence is {10,9,8,6,4,3,2}. Sorting yields a sequence of ETL tasks arranged by weight from largest to smallest, and the ETL tasks are then distributed in that order to the execution nodes in the distributed environment for execution.
In this embodiment, the associations between entities and attached tables, between entities and dimension tables, and between entities are extracted from the target tables contained in an ETL task. This provides an effective quantitative evaluation of the complexity of the business corresponding to the ETL task and of the importance of the business data to be integrated, and produces a task sequence ordered by weight in which tasks are expected to be scheduled. The approach meets the timeliness requirements for loading core business data and improves the efficiency of data integration.
In another embodiment, step S103) may further include obtaining the performance indicators of each execution node and distributing the ETL tasks to be scheduled to the execution nodes based on those indicators. When ETL tasks are dispatched to execution nodes in a distributed environment, the number of tasks running on each node and the data volume of those tasks differ, so the performance and current load of the execution nodes differ at any given moment. If the number of tasks distributed to each execution node is controlled according to the performance of that node, the load of the whole distributed environment can be balanced and the overall efficiency of task execution improved. Therefore, before ETL tasks are distributed in step S103), the performance indicators of each execution node may first be queried, the current load of each node graded according to the indicators obtained, and execution nodes selected for ETL task distribution in order of current load from low to high. The current load of each execution node may be determined from its performance indicators. For example, taking CPU usage and memory usage as the performance indicators, the current load of an execution node may be determined by formula (2), where C is the CPU usage of the execution node, R is its memory usage, and L indicates its current load: the larger L is, the smaller the current load of the execution node, and the smaller L is, the larger the current load. The execution nodes can therefore be sorted by L in descending order to obtain their allocation priority sequence.
In yet another embodiment, the current load of an execution node may instead be determined as a weighted average of the performance indicators, for example L = w1*C + w2*R, where w1 and w2 are the weights set for the indicators C and R, each with a value between 0 and 1. In this case, the larger L is, the larger the current load of the execution node, and the smaller L is, the smaller the current load. The execution nodes can therefore be sorted by L in ascending order to obtain their allocation priority sequence. It should be understood that determining the current load of a node from CPU usage and memory usage is merely an example and not a limitation; those skilled in the art may adjust or modify it according to actual needs.
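By way of illustration only, the weighted-average variant L = w1*C + w2*R can be sketched in Python as follows. The weight values 0.6 and 0.4, the node names and the `NodeMetrics` type are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeMetrics:
    name: str
    cpu: float     # CPU usage C, as a fraction in [0, 1]
    memory: float  # memory usage R, as a fraction in [0, 1]


def current_load(m: NodeMetrics, w1: float = 0.6, w2: float = 0.4) -> float:
    """L = w1*C + w2*R; a larger L means a heavier current load."""
    return w1 * m.cpu + w2 * m.memory


def allocation_order(nodes: List[NodeMetrics]) -> List[str]:
    """Node names sorted by current load, lightest first."""
    return [n.name for n in sorted(nodes, key=current_load)]


nodes = [
    NodeMetrics("node-a", cpu=0.9, memory=0.5),
    NodeMetrics("node-b", cpu=0.2, memory=0.3),
    NodeMetrics("node-c", cpu=0.5, memory=0.8),
]
order = allocation_order(nodes)
```

With these sample values the lightest node is `node-b`, so it would be the first candidate for the next ETL task.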
In yet another embodiment, the execution nodes may also be classified according to their determined current load, for example by dividing them into high-load nodes, medium-load nodes and low-load nodes according to the L determined by formula (2) above. That is, the execution nodes in the distributed cluster environment are divided into three groups, each consisting of zero or more nodes, and the nodes in the same group have similar loads. Execution nodes in the low-load group carry a low load and currently have the greatest capacity to accept further tasks, so ETL tasks are preferentially dispatched to them. If the low-load group is empty, ETL tasks are distributed to the medium-load group, and so on. If both the low-load and medium-load groups are empty, the current load of all execution nodes in the distributed environment is high. If all execution nodes remain in the high-load group for a long time, an alarm mechanism is needed to indicate that the distributed environment has been under high load for a long time, prompting the system administrator to upgrade the performance of the distributed environment or increase the number of execution nodes, and thereby raise the load capacity of the whole distributed environment.
In the scheme of the above embodiment, execution nodes are selected for ETL task distribution in order from low load to high load, so that ETL tasks with a high scheduling priority are preferentially assigned to execution nodes with a low current load. This not only helps balance the load among the execution nodes but can also improve the execution efficiency of ETL tasks.
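By way of illustration only, the grouping and group-selection logic can be sketched in Python as follows. The sketch assumes a load value for which larger means heavier (as in the weighted-average variant); the thresholds 0.4 and 0.7 and all names are hypothetical, since the classification thresholds are not given in the text.

```python
from typing import Dict, List

LOW_MAX = 0.4   # hypothetical: loads below this are "low"
HIGH_MIN = 0.7  # hypothetical: loads at or above this are "high"


def group_nodes(loads: Dict[str, float]) -> Dict[str, List[str]]:
    """Split nodes into low / medium / high load groups."""
    groups: Dict[str, List[str]] = {"low": [], "medium": [], "high": []}
    for name, load in loads.items():
        if load < LOW_MAX:
            groups["low"].append(name)
        elif load < HIGH_MIN:
            groups["medium"].append(name)
        else:
            groups["high"].append(name)
    return groups


def candidate_nodes(groups: Dict[str, List[str]]) -> List[str]:
    """Nodes of the lightest non-empty group (low, then medium, then high)."""
    for level in ("low", "medium", "high"):
        if groups[level]:
            return groups[level]
    return []


def all_high(groups: Dict[str, List[str]]) -> bool:
    """True when only high-load nodes exist, the condition that may warrant an alarm."""
    return not groups["low"] and not groups["medium"] and bool(groups["high"])


mixed = group_nodes({"a": 0.2, "b": 0.5, "c": 0.9})
busy = group_nodes({"x": 0.8, "y": 0.95})
```

A monitoring loop could raise the alarm only after `all_high` has held for some configured duration.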
In yet another embodiment, step S103) may distribute the ETL tasks to be scheduled to the execution nodes based on the data volume of the ETL tasks. Different ETL tasks involve different total data volumes, and their execution times differ accordingly. If several ETL tasks with large data volumes are concentrated on one or a few execution nodes, the waiting time of ETL tasks on those nodes lengthens and the resources of the execution nodes are not used in a balanced way. This embodiment therefore introduces the data volume of the ETL task as a reference factor for distribution and uses a greedy balanced algorithm to distribute ETL tasks. Assume that the execution nodes in the distributed cluster have identical capabilities and that each node can work independently, that is, without assistance from other nodes. Let E = {e1, e2, e3, ..., en} denote a newly obtained batch of mutually independent ETL tasks to be scheduled, n tasks in total, where ei is the i-th task; let D = {d1, d2, d3, ..., dn} denote the set of data volumes of the n ETL tasks, where di is the data volume of the i-th task ei; and let N = {n1, n2, n3, ..., nj} denote the set of execution nodes in the distributed cluster, j nodes in total, where ni is the i-th execution node. $dn_i^{pre}$ denotes the data volume of the ETL tasks already on the i-th execution node ni, and $dn_i^{aft}$ denotes the data volume of all ETL tasks on the i-th execution node ni after task distribution. The total data volume of the ETL tasks to be executed on all execution nodes is computed from these quantities, and the optimal expected task data volume $Opt_i$ to be allocated to the i-th execution node is given by formula (4).
The data load index of an execution node is expressed as a variance of the data volume, so the data load index $\mu_i$ of the i-th execution node ni may be expressed as:
$$\mu_i = \left(dn_i^{aft} - dn_i^{pre} - Opt_i\right)^2 \tag{5}$$
The overall data load index μ of the execution nodes in the distributed cluster is given by formula (6).
During ETL task distribution, the data load of the cluster resources should be kept as balanced as possible, that is, μ should be kept relatively small. A threshold δ may be defined to limit the maximum value of μ during task distribution: if the index of a node exceeds δ, the node is considered heavily loaded with data and cannot accept new tasks. The value of μ is thus calculated in real time during task distribution, and each time only nodes with μi < δ are selected for task allocation, so as to keep the cluster load balanced. In one example, distributing ETL tasks based on the greedy balanced algorithm mainly includes the following steps:
(1) Initialize the ETL task set E = {e1, e2, e3, ..., en}, the set D = {d1, d2, d3, ..., dn} of data volumes of the ETL tasks, and the execution node set N = {n1, n2, n3, ..., nj};
(2) Sort the ETL tasks by data volume in descending order and store them in a queue Q = {q1, q2, q3, q4, ..., qn}, where q1 is (e1, d1), q2 is (e2, d2), ..., qn is (en, dn), and d1 ≥ d2 ≥ ... ≥ dn;
(3) Calculate in real time the data load indices μ1, μ2, μ3, ..., μj of all execution nodes in the execution node set, and reorder the nodes by data load index in ascending order, so that if μ1 < μ2 < μ3 < ... < μj the node order is n1, n2, n3, ..., nj;
(4) Assign the number of nodes with μi < δ to a variable K, which indicates the number of nodes to which tasks can be distributed in this round; if K = 0, the load of the distributed environment is too high at this point, and either task distribution must be suspended or new execution nodes added;
(5) For the n tasks in Q: if n > K, take out K tasks, assign them in turn to the K nodes, and set n = n - K; otherwise, if 0 < n ≤ K, take out all the tasks and distribute them in turn to the first n execution nodes, for example e1 to n1 and e2 to n2. If n ≤ 0, all tasks in this batch have been distributed and the algorithm ends; otherwise return to step (3).
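By way of illustration only, a greedy balanced distribution can be sketched in Python as follows. The sketch follows the round-based procedure spelled out in claim 4 (the largest remaining task goes to the node with the smallest total data volume that has not yet received a task in the current round) and leaves out the threshold δ and the load index μ; the function name, task names and node names are hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_assign(
    task_sizes: Dict[str, float],
    node_totals: Dict[str, float],
) -> List[Tuple[str, str]]:
    """Return (task, node) pairs in the order in which they are assigned."""
    if not node_totals:
        raise ValueError("at least one execution node is required")
    totals = dict(node_totals)  # running total data volume per node
    # Largest data volume first.
    pending = sorted(task_sizes, key=lambda t: task_sizes[t], reverse=True)
    assignments: List[Tuple[str, str]] = []
    while pending:
        unmarked = set(totals)  # nodes not yet given a task in this round
        while pending and unmarked:
            task = pending.pop(0)
            # Node with the smallest total; the name breaks ties deterministically.
            node = min(unmarked, key=lambda n: (totals[n], n))
            assignments.append((task, node))
            totals[node] += task_sizes[task]
            unmarked.remove(node)  # mark the node as allocated for this round
    return assignments


result = greedy_assign(
    {"t1": 50, "t2": 30, "t3": 20, "t4": 10},
    {"n1": 40, "n2": 0},
)
```

In this example the two nodes end with total data volumes of 80 and 70, compared with 40 and 0 before distribution.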
In yet another embodiment, step S103) may include: a) obtaining the performance indicators of each execution node and determining the current load of each execution node from them; b) dividing the execution nodes in the distributed environment into three groups based on their current load: a high-load node group, a medium-load node group and a low-load node group; c) distributing tasks first within the low-load node group, by counting the data volume of each ETL task to be scheduled and the data volume of the ETL tasks already on each execution node in the low-load node group, and distributing tasks to the execution nodes in the low-load node group using the greedy balanced algorithm described above. If the low-load node group is empty and ETL tasks remain to be distributed, the greedy balanced algorithm described above is used to distribute the remaining ETL tasks to the execution nodes in the medium-load node group, and so on. If both the low-load and medium-load node groups are empty, the current load of all execution nodes in the distributed environment is high; an alarm mechanism may also be provided to indicate that the distributed environment has been under high load for a long time, prompting the system administrator to upgrade the performance of the distributed environment or increase the number of execution nodes, and thereby raise the load capacity of the whole distributed environment. In yet another embodiment, when the distribution of a task fails and the failure is caused by the destination execution node, task execution requests may be withheld from that execution node for a period of time (the penalty time). This reduces the failure rate of task distribution to some extent.
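By way of illustration only, the penalty-time mechanism can be sketched in Python as follows. The `PenaltyBox` class, its method names and the 30-second penalty are hypothetical; the text only states that a failing destination node receives no distribution requests for a period of time.

```python
import time
from typing import Dict, Iterable, List, Optional


class PenaltyBox:
    """Tracks nodes that are temporarily excluded from task distribution."""

    def __init__(self, penalty_seconds: float) -> None:
        self.penalty_seconds = penalty_seconds
        self._blocked_until: Dict[str, float] = {}

    def report_failure(self, node: str, now: Optional[float] = None) -> None:
        """Record a distribution failure caused by the given node."""
        now = time.monotonic() if now is None else now
        self._blocked_until[node] = now + self.penalty_seconds

    def eligible(self, nodes: Iterable[str], now: Optional[float] = None) -> List[str]:
        """Nodes whose penalty time, if any, has expired."""
        now = time.monotonic() if now is None else now
        return [
            n for n in nodes
            if self._blocked_until.get(n, float("-inf")) <= now
        ]


box = PenaltyBox(penalty_seconds=30)
box.report_failure("n2", now=100.0)
during_penalty = box.eligible(["n1", "n2"], now=110.0)
after_penalty = box.eligible(["n1", "n2"], now=130.0)
```

The scheduler would filter its candidate nodes through `eligible` before each distribution round.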
After ETL tasks are distributed to the execution nodes, each execution node has an execution queue that stores the tasks, and each task in the queue occupies one thread resource. Because ETL tasks contain different data volumes, their execution times differ. In yet another embodiment, the execution efficiency of ETL tasks is improved by balancing their execution time and waiting time, which indirectly improves the efficiency of data integration in the whole distributed environment. In this embodiment, the execution priority of each ETL task is set based on its execution time and waiting time, so that the execution node executes ETL tasks in descending order of execution priority; the execution priority of an ETL task is adjusted continually as its execution time and waiting time change. The execution process of ETL tasks on an execution node is described below with reference to Fig. 3.
As shown in Fig. 3, the process mainly includes the following steps. Step S301): in response to the execution node receiving a new ETL task, the ETL task is stored in a task buffer queue and its arrival time is recorded. Step S302): the execution time of the ETL task is estimated based on the data volume of the ETL task. The data volume involved in the ETL task is obtained first; then, from the ETL tasks completed on the execution node in the most recent period, a batch of ETL tasks with data volumes similar to that of the pending ETL task is selected, and the execution times of the selected tasks are used to estimate the execution time of the pending ETL task, for example by taking their average as the estimate. Step S303): in response to the current task on the execution node finishing, the execution priority of each pending ETL task is determined from its waiting time and estimated execution time. Suppose that n ETL tasks are currently waiting for execution on the execution node, that Tei denotes the execution time of the i-th ETL task ei (estimated from its data volume), and that Twi denotes the waiting time of the i-th ETL task ei; the objective function TotalTime for executing the n ETL tasks on the execution node is then given by formula (7).
The purpose of adjusting the execution order of ETL tasks by priority is to keep the time spent on the whole execution process on the execution node as small as possible (that is, to minimize TotalTime), so that Tei and Twi are balanced as far as possible. In this embodiment, for each pending ETL task in the task buffer queue, the execution priority is calculated from the execution time estimated in step S302) and the waiting time of the ETL task. For example, the execution priority EPi of the i-th ETL task ei is determined by formula (8), in which Tei may be the task execution time estimated in step S302) from the data volume of the ETL task, and the waiting time Twi of each ETL task may be calculated as Twi = Tni - Tti, where Tni is the current time and Tti is the time at which task ei arrived at the execution node; that is, the waiting time of each task equals the current time minus the time at which the ETL task arrived at the execution node. From formula (8) it can be seen that EPi is always greater than 1. When Twi is fixed, the smaller Tei is, the higher the priority EPi, which resembles the shortest-job-first algorithm; when Tei is fixed, the larger Twi is, the higher the priority EPi, which resembles the first-come, first-served algorithm. When neither Twi nor Tei is fixed, this priority setting takes into account both the execution status of the current tasks on the execution node and the waiting time of the tasks, achieving an overall relative balance between the execution time and the waiting time of ETL tasks. With continued reference to Fig. 3, in step S304) the ETL task with the highest execution priority is selected from the pending ETL tasks and executed.
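By way of illustration only, steps S302) to S304) can be sketched in Python as follows. The sketch assumes a response-ratio form EP = (Tw + Te) / Te for formula (8), which is consistent with the properties stated above (greater than 1 for any positive waiting time, higher for shorter tasks and for longer waits) but is not confirmed by the text. The 20% similarity tolerance and all names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PendingTask:
    name: str
    data_volume: float
    arrival_time: float  # Tti: when the task reached the execution node


def estimate_exec_time(
    volume: float,
    history: Sequence[Tuple[float, float]],
    tolerance: float = 0.2,
) -> float:
    """Average execution time of completed tasks with a similar data volume.

    history holds (data_volume, exec_time) pairs of recently completed tasks.
    """
    similar = [t for v, t in history if abs(v - volume) <= tolerance * volume]
    if not similar:
        raise ValueError("no completed task with a similar data volume")
    return sum(similar) / len(similar)


def execution_priority(wait: float, exec_time: float) -> float:
    """Assumed response-ratio form: EP = (Tw + Te) / Te."""
    return (wait + exec_time) / exec_time


def pick_next(
    tasks: List[PendingTask],
    history: Sequence[Tuple[float, float]],
    now: float,
) -> PendingTask:
    """Select the pending task with the highest execution priority."""
    def ep(task: PendingTask) -> float:
        te = estimate_exec_time(task.data_volume, history)
        tw = now - task.arrival_time  # Twi = Tni - Tti
        return execution_priority(tw, te)
    return max(tasks, key=ep)


history = [(100, 10.0), (110, 12.0), (1000, 100.0), (950, 90.0)]
queue = [PendingTask("big", 1000, 0.0), PendingTask("small", 100, 40.0)]
chosen = pick_next(queue, history, now=50.0)
```

Because the priority is recomputed each time the current task finishes, a long-waiting large task gradually gains priority over newly arrived small ones.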
Fig. 4 is a schematic structural diagram of a system for scheduling and executing distributed ETL tasks according to one embodiment of the present invention. As shown in Fig. 4, the system includes a scheduler 401 and multiple executors 402a-n (collectively, 402). The scheduler 401 obtains one or more ETL tasks to be scheduled from an ETL task repository and distributes them to the multiple executors in the distributed environment for execution. Each executor 402 executes the ETL tasks it receives. Although the block diagram depicts the components as functionally separate, this depiction is for illustrative purposes only. The components shown in the figure may be arbitrarily combined or divided into separate software, firmware and/or hardware components. Moreover, however such components are combined or divided, they may execute on the same computing device or on multiple computing devices, where the multiple computing devices may be connected by one or more networks.
The scheduler 401 includes a relation analysis module, a priority determination module and a scheduling module. The relation analysis module is configured to extract, for each obtained ETL task to be scheduled and based on the target table into which data is loaded in the ETL task, the associations between entities and attached tables, the associations between entities and dimension tables, and the one-to-many associations between entities involved in the ETL task. The priority determination module is configured to determine the scheduling priority of the ETL task based on a weight preset for each type of association and the number of associations of each type in the ETL task. The scheduling module is configured to distribute the ETL tasks to the executors 402 in descending order of scheduling priority.
In yet another embodiment, the scheduler 401 may further include a load monitoring module configured to query the performance indicators of each executor and to determine the current load of each executor from the indicators obtained. The scheduling module may further be configured to select executors for ETL task distribution in order of current load from low to high. In yet another embodiment, the executor 402 may be configured to: in response to receiving a new ETL task, store the pending ETL task in a task buffer queue and record its arrival time; estimate the execution time of the ETL task based on the data volume of the ETL task; in response to the current task finishing, determine, as described above, the execution priority of each pending ETL task from its waiting time and estimated execution time; and select the ETL task with the highest execution priority from the pending ETL tasks and execute it.
In yet another embodiment of the present invention, a computer-readable storage medium is also provided, on which a computer program or executable instructions are stored. When executed, the computer program or executable instructions implement the technical solutions described in the foregoing embodiments; the implementation principles are similar and are not repeated here. In embodiments of the present invention, the computer-readable storage medium may be any tangible medium that can store data and can be read by a computing device. Examples of computer-readable storage media include hard disk drives, network attached storage (NAS), read-only memory, random access memory, CD-ROM, CD-R, CD-RW, magnetic tape, and other optical or non-optical data storage devices. The computer-readable storage medium may also include computer-readable media distributed over network-coupled computer systems, so that the computer program or instructions can be stored and executed in a distributed manner.
References in this specification to "various embodiments", "some embodiments", "one embodiment" or "an embodiment" mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in various embodiments", "in some embodiments", "in one embodiment" or "in an embodiment" in various places throughout this specification do not necessarily refer to the same embodiment. Furthermore, particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. Thus, a particular feature, structure or characteristic shown or described in connection with one embodiment may be combined, in whole or in part and without limitation, with the features, structures or characteristics of one or more other embodiments, provided that the combination is not illogical or inoperative.
The terms "comprising" and "having" and terms of similar meaning in this specification are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product or device. "A" or "an" does not exclude a plurality. In addition, the elements in the drawings of this application are merely schematic and are not necessarily drawn to scale.
Although the present invention has been described by way of the above embodiments, it is not limited to the embodiments described here, and also includes various changes and variations made without departing from the scope of the present invention.
Claims (10)
1. A method for scheduling and executing distributed ETL tasks, comprising:
for each obtained ETL task to be scheduled, extracting, based on the target table into which data is loaded in the ETL task, the associations between entities and attached tables, the associations between entities and dimension tables, and the one-to-many associations between entities involved in the ETL task;
determining the scheduling priority of the ETL task based on a weight preset for each type of association and the number of associations of each type in the ETL task;
distributing the ETL tasks to the execution nodes in descending order of scheduling priority.
2. The method according to claim 1, further comprising: before distributing the ETL tasks, querying the performance indicators of each execution node; and determining the current load of each execution node from the performance indicators obtained, and selecting execution nodes for ETL task distribution in order of current load from low to high.
3. The method according to claim 1, wherein the scheduling priority of the ETL task is calculated by the following formula:
where Wl1 denotes the weight of the association between an entity and an attached table; Wl2 denotes the weight of the association between an entity and a dimension table; Wl3 denotes the weight of the association between entities; and ni denotes the number of occurrences of the i-th type of association in the ETL task.
4. The method according to claim 2, wherein distributing the ETL tasks to the execution nodes comprises:
a) counting the data volume of each ETL task to be scheduled;
b) counting the total data volume of all ETL tasks on each execution node;
c) selecting, from the ETL tasks to be scheduled, the ETL task with the largest data volume;
d) selecting the execution node that has the smallest total data volume and has not currently been allocated an ETL task;
e) distributing the selected ETL task to the selected execution node, and marking that execution node as allocated;
f) repeating steps c)-e) until all ETL tasks to be scheduled have been distributed or until all execution nodes have been marked as allocated;
g) detecting whether any ETL tasks to be scheduled remain and, if so, re-marking all execution nodes as unallocated and repeating steps c)-g) until all ETL tasks to be scheduled have been distributed.
5. The method according to claim 1, further comprising:
in response to an execution node receiving a new ETL task, storing the pending ETL task in a task buffer queue and recording the arrival time of the ETL task;
estimating the execution time of the ETL task based on the data volume of the ETL task;
in response to the current task on the execution node finishing, determining, for each pending ETL task, the execution priority of the ETL task from its waiting time and estimated execution time;
selecting the ETL task with the highest execution priority from the pending ETL tasks and executing it.
6. The method according to claim 5, wherein estimating the execution time of the ETL task based on the data volume of the ETL task comprises:
determining the data volume of the ETL task;
selecting, from the ETL tasks completed on the execution node in the most recent period, a batch of ETL tasks with data volumes similar to that of the pending ETL task;
averaging the execution times of this batch of ETL tasks, and taking the resulting average as the estimated execution time of the ETL task.
7. The method according to claim 6, wherein the execution priority of the ETL task is determined by the following formula:
where EPi denotes the execution priority of the i-th ETL task ei; Tei denotes the execution time of ETL task ei; and Twi denotes the waiting time of ETL task ei, which equals the current time minus the time at which the ETL task arrived at the execution node.
8. A system for scheduling and executing distributed ETL tasks, comprising a scheduler and multiple executors, the scheduler being configured to distribute one or more ETL tasks to be scheduled to the multiple executors, and each executor being configured to execute the ETL tasks it receives;
wherein the scheduler comprises:
a relation analysis module, configured to extract, for each obtained ETL task to be scheduled and based on the target table into which data is loaded in the ETL task, the associations between entities and attached tables, the associations between entities and dimension tables, and the one-to-many associations between entities involved in the ETL task;
a priority determination module, configured to determine the scheduling priority of the ETL task based on a weight preset for each type of association and the number of associations of each type in the ETL task;
a scheduling module, configured to distribute the ETL tasks to the executors in descending order of scheduling priority.
9. The system according to claim 8, wherein the scheduler further comprises a load monitoring module configured to query the performance indicators of each executor and to determine the current load of each executor from the performance indicators obtained; and the scheduling module is further configured to select executors for ETL task distribution in order of current load from low to high.
10. The system according to claim 8, wherein the executor is configured to:
in response to receiving a new ETL task, store the pending ETL task in a task buffer queue and record the arrival time of the ETL task;
estimate the execution time of the ETL task based on the data volume of the ETL task;
in response to the current task finishing, determine, for each pending ETL task, the execution priority of the ETL task from its waiting time and estimated execution time;
select the ETL task with the highest execution priority from the pending ETL tasks and execute it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910401322.XA CN110287245B (en) | 2019-05-15 | 2019-05-15 | Method and system for scheduling and executing distributed ETL (extract transform load) tasks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910401322.XA CN110287245B (en) | 2019-05-15 | 2019-05-15 | Method and system for scheduling and executing distributed ETL (extract transform load) tasks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110287245A true CN110287245A (en) | 2019-09-27 |
CN110287245B CN110287245B (en) | 2021-03-19 |
Family
ID=68002128
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910401322.XA Active CN110287245B (en) | 2019-05-15 | 2019-05-15 | Method and system for scheduling and executing distributed ETL (extract transform load) tasks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287245B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111063405A (en) * | 2019-12-19 | 2020-04-24 | 南京医睿科技有限公司 | Task scheduling method, device, equipment and storage medium |
CN111176810A (en) * | 2019-12-19 | 2020-05-19 | 胡友彬 | Meteorological hydrological data processing and scheduling system based on priority |
CN111176840A (en) * | 2019-12-20 | 2020-05-19 | 青岛海尔科技有限公司 | Distributed task allocation optimization method and device, storage medium and electronic device |
CN111198757A (en) * | 2020-01-06 | 2020-05-26 | 北京小米移动软件有限公司 | CPU kernel scheduling method, CPU kernel scheduling device and storage medium |
CN111399826A (en) * | 2020-03-19 | 2020-07-10 | 北京三维天地科技股份有限公司 | Online data exchange method and system for visual drag flow diagram ET L |
CN111552569A (en) * | 2020-04-28 | 2020-08-18 | 咪咕文化科技有限公司 | System resource scheduling method, device and storage medium |
CN111625414A (en) * | 2020-04-29 | 2020-09-04 | 江苏瑞中数据股份有限公司 | Method for realizing automatic scheduling monitoring system of data conversion integration software |
CN111897865A (en) * | 2020-08-13 | 2020-11-06 | 工银科技有限公司 | Dynamic adjustment method and device for ETL (extract transform load) working load |
CN112231314A (en) * | 2020-11-05 | 2021-01-15 | 深圳市丽湖软件有限公司 | Quality data evaluation method based on ETL |
CN112380024A (en) * | 2021-01-18 | 2021-02-19 | 天道金科股份有限公司 | Thread scheduling method based on distributed counting |
CN113806053A (en) * | 2021-09-24 | 2021-12-17 | 国家石油天然气管网集团有限公司华南分公司 | Task scheduling method and device and computer readable storage medium |
CN114780648A (en) * | 2022-04-19 | 2022-07-22 | 湖南长银五八消费金融股份有限公司 | Task scheduling method, device, computer equipment, storage medium and program product |
CN115145591A (en) * | 2022-08-31 | 2022-10-04 | 之江实验室 | Multi-center-based medical ETL task scheduling method, system and device |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324522A (en) * | 2013-06-20 | 2013-09-25 | 北京奇虎科技有限公司 | Method and device for scheduling tasks for capturing data from servers |
CN105593818A (en) * | 2014-10-03 | 2016-05-18 | 数据梅尔公司 | Apparatus and method for scheduling distributed workflow tasks |
CN106951315A (en) * | 2017-03-17 | 2017-07-14 | 北京搜狐新媒体信息技术有限公司 | A kind of data task dispatching method and system based on ETL |
CN107291544A (en) * | 2017-08-03 | 2017-10-24 | 山东浪潮云服务信息科技有限公司 | Method and device, the distributed task scheduling execution system of task scheduling |
CN107665144A (en) * | 2016-07-29 | 2018-02-06 | 北京京东尚科信息技术有限公司 | The balance dispatching center of distributed task scheduling, mthods, systems and devices |
CN107818407A (en) * | 2017-10-20 | 2018-03-20 | 平安科技(深圳)有限公司 | Method for allocating tasks, device, storage medium and computer equipment |
CN108255595A (en) * | 2018-01-16 | 2018-07-06 | 北京中关村科金技术有限公司 | A kind of dispatching method of data task, device, equipment and readable storage medium storing program for executing |
CN108345501A (en) * | 2017-01-24 | 2018-07-31 | 全球能源互联网研究院 | A kind of distributed resource scheduling method and system |
US20180293098A1 (en) * | 2017-04-10 | 2018-10-11 | Bank Of America Corporation | Digital Processing System for Event and/or Time Based Triggering Management, and Control of Tasks |
US20180300174A1 (en) * | 2017-04-17 | 2018-10-18 | Microsoft Technology Licensing, Llc | Efficient queue management for cluster scheduling |
CN108762905A (en) * | 2018-05-24 | 2018-11-06 | 苏州乐麟无线信息科技有限公司 | A kind for the treatment of method and apparatus of multitask event |
CN109739893A (en) * | 2018-12-28 | 2019-05-10 | 上海连尚网络科技有限公司 | A kind of metadata management method, equipment and computer-readable medium |
Non-Patent Citations (7)
Title |
---|
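JUN QING LI et al.: "Solving complex task scheduling by a hybrid genetic algorithm", 《PROCEEDING OF THE 11TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION》 * |
WEI YU et al.: "A Task Scheduling Mechanism Based on Quartz of Power Consumption Information Acquisition System", 《2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING》 * |
师金钢: "Research on key technologies of real-time data warehouses based on the MapReduce architecture", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 * |
张晓磊: "Research on scheduling algorithms for independent and dependent tasks in cloud computing", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
李磊: "ETL task cluster scheduling method", 《Computer Technology and Development》 * |
王荣丽: "Research on test task scheduling strategies based on a cloud platform", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
陈林 et al. (eds.): 《"Internet Plus" Smart Campus Technology and Engineering Implementation》, 30 September 2017, Chengdu: University of Electronic Science and Technology of China Press * |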
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111176810B (en) * | 2019-12-19 | 2023-04-07 | 胡友彬 | Meteorological hydrology data processing scheduling system based on priority |
CN111176810A (en) * | 2019-12-19 | 2020-05-19 | 胡友彬 | Meteorological hydrological data processing and scheduling system based on priority |
CN111063405A (en) * | 2019-12-19 | 2020-04-24 | 南京医睿科技有限公司 | Task scheduling method, device, equipment and storage medium |
CN111176840A (en) * | 2019-12-20 | 2020-05-19 | 青岛海尔科技有限公司 | Distributed task allocation optimization method and device, storage medium and electronic device |
CN111176840B (en) * | 2019-12-20 | 2023-11-28 | 青岛海尔科技有限公司 | Distribution optimization method and device for distributed tasks, storage medium and electronic device |
CN111198757A (en) * | 2020-01-06 | 2020-05-26 | 北京小米移动软件有限公司 | CPU kernel scheduling method, CPU kernel scheduling device and storage medium |
CN111198757B (en) * | 2020-01-06 | 2023-11-28 | 北京小米移动软件有限公司 | CPU kernel scheduling method, CPU kernel scheduling device and storage medium |
CN111399826A (en) * | 2020-03-19 | 2020-07-10 | 北京三维天地科技股份有限公司 | Online data exchange method and system for visual drag flow diagram ETL |
CN111399826B (en) * | 2020-03-19 | 2020-12-01 | 北京三维天地科技股份有限公司 | Visual dragging flow diagram ETL online data exchange method and system |
CN111552569A (en) * | 2020-04-28 | 2020-08-18 | 咪咕文化科技有限公司 | System resource scheduling method, device and storage medium |
CN111552569B (en) * | 2020-04-28 | 2023-10-20 | 咪咕文化科技有限公司 | System resource scheduling method, device and storage medium |
CN111625414A (en) * | 2020-04-29 | 2020-09-04 | 江苏瑞中数据股份有限公司 | Method for realizing automatic scheduling monitoring system of data conversion integration software |
CN111897865A (en) * | 2020-08-13 | 2020-11-06 | 工银科技有限公司 | Dynamic adjustment method and device for ETL (extract transform load) working load |
CN112231314A (en) * | 2020-11-05 | 2021-01-15 | 深圳市丽湖软件有限公司 | Quality data evaluation method based on ETL |
CN112380024A (en) * | 2021-01-18 | 2021-02-19 | 天道金科股份有限公司 | Thread scheduling method based on distributed counting |
CN112380024B (en) * | 2021-01-18 | 2021-05-25 | 天道金科股份有限公司 | Thread scheduling method based on distributed counting |
CN113806053A (en) * | 2021-09-24 | 2021-12-17 | 国家石油天然气管网集团有限公司华南分公司 | Task scheduling method and device and computer readable storage medium |
CN114780648A (en) * | 2022-04-19 | 2022-07-22 | 湖南长银五八消费金融股份有限公司 | Task scheduling method, device, computer equipment, storage medium and program product |
CN115145591A (en) * | 2022-08-31 | 2022-10-04 | 之江实验室 | Multi-center-based medical ETL task scheduling method, system and device |
US12119108B2 (en) | 2022-08-31 | 2024-10-15 | Zhejiang Lab | Medical ETL task dispatching method, system and apparatus based on multiple centers |
Also Published As
Publication number | Publication date |
---|---|
CN110287245B (en) | 2021-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110287245A (en) | Method and system for scheduling and executing distributed ETL (extract transform load) tasks | |
WO2021208546A1 (en) | Multi-dimensional resource scheduling method in kubernetes cluster architecture system | |
US10664308B2 (en) | Job distribution within a grid environment using mega-host groupings of execution hosts | |
US20200104377A1 (en) | Rules Based Scheduling and Migration of Databases Using Complexity and Weight | |
US20180198855A1 (en) | Method and apparatus for scheduling calculation tasks among clusters | |
CN104102543B (en) | Method and apparatus for load adjustment in a cloud computing environment | |
CN104298550B (en) | Dynamic scheduling method for Hadoop | |
WO2016082370A1 (en) | Distributed node intra-group task scheduling method and system | |
US20070247659A1 (en) | Print job management system | |
CN107291545A (en) | Multi-user task scheduling method and device in a computing cluster | |
US8984521B2 (en) | Computer system performance by applying rate limits to control block tenancy | |
CN110297699A (en) | Dispatching method, scheduler, storage medium and system | |
US20100083263A1 (en) | Resource information collecting device, resource information collecting method, program, and collection schedule generating device | |
Castillo et al. | On the design of online scheduling algorithms for advance reservations and QoS in grids | |
CN110347602B (en) | Method and device for executing multitasking script, electronic equipment and readable storage medium | |
CN115220916B (en) | Automatic calculation scheduling method, device and system of video intelligent analysis platform | |
CN114911613A (en) | Cross-cluster resource high-availability scheduling method and system in inter-cloud computing environment | |
CN104346220B (en) | Task scheduling method and system | |
CN105740077B (en) | Task allocation method suitable for cloud computing | |
CN106708624B (en) | Self-adaptive adjustment method for multi-working-domain computing resources | |
Kim et al. | Virtual machines placement for network isolation in clouds | |
Cheng et al. | Improving fair scheduling performance on hadoop | |
CN113590317A (en) | Scheduling method, device, medium and computing equipment of offline service | |
CN117971505B (en) | Method and device for deploying container application | |
CN112242919A (en) | Fault file processing method and system | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||