CN109408236A - Task load balancing method for ETL on a cluster - Google Patents
- Publication number
- CN109408236A (application CN201811226888.5A)
- Authority
- CN
- China
- Prior art keywords
- task
- resource
- node
- cluster
- allocated
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multi Processors (AREA)
Abstract
The present invention relates to a task load balancing method for ETL on a cluster, comprising the following steps. Step S1: calculate the predicted resource consumption of the task to be allocated. Step S2: monitor the resource usage of the nodes through the task scheduling center. Step S3: calculate the effective idle resources of the cluster nodes. Step S4: determine the type of the task to be allocated; if it is a single task, go to step S5; if it is a task group, go to step S6. Step S5: the task scheduling center allocates the single task. Step S6: the task scheduling center allocates the task group. Step S7: the task scheduling center repeatedly executes steps S4 to S6, assigning the tasks in the task queue to nodes with sufficient effective idle resources, thereby completing task allocation for the cluster. The invention achieves load balancing of task allocation in an ETL cluster.
Description
Technical field
The present invention relates to the field of ETL technology, and in particular to a task load balancing method for ETL on a cluster.
Background art
With the development of informatization, government bodies, enterprises, and public institutions have built numerous information systems. As the number of information systems grows, each isolated system generates large volumes of business data and, at the same time, a large amount of duplicated and redundant data. In the big data era, ETL plays an important role in achieving unified aggregation, sharing, and exchange of this scattered data.
ETL (Extraction-Transformation-Loading) is the process of data extraction (Extract), transformation (Transform), and loading (Load): it extracts data from the source storage, cleanses and transforms it, and then loads it into the target storage. ETL is commonly used for aggregating data into a data warehouse (big data center), exchanging data between systems or databases, data conversion processing, and so on, and is an important data processing tool in the big data era.
Data aggregation and data exchange are systematic engineering efforts that require synchronizing and aggregating dozens to tens of thousands of database tables; correspondingly, dozens to tens of thousands of ETL tasks must be executed on schedule. To ensure that a large number of tasks run reliably and stably, ETL tasks are usually executed on a cluster of nodes, which provides resource sharing, reliability, and fault tolerance for ETL operation.
Among the many ETL tasks, resource consumption and elapsed time vary widely: some tasks consume more CPU, some consume more memory, and some run longer. If ETL tasks are assigned to nodes using traditional cluster load balancing methods (random, round-robin, weighted round-robin, dynamic round-robin, fastest response, most remaining resources, etc.), then once the assigned tasks start executing, a single node may end up over-allocated because the tasks need different resources. This causes resource contention between tasks and a cascading effect, which eventually slows tasks down or locks them up, and may even make the node unresponsive.
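To see why resource-blind polling can overload one node, the Python sketch below applies plain round-robin to six hypothetical tasks whose predicted CPU needs alternate between heavy and light. The task sizes, node names, and per-node capacity are invented for illustration only.

```python
from itertools import cycle

# Hypothetical predicted CPU demand (cores) of six ETL tasks, in submission order.
tasks = {"t1": 5, "t2": 1, "t3": 5, "t4": 1, "t5": 5, "t6": 1}
nodes = ["node-a", "node-b"]
capacity = 10  # assumed cores per node; total demand (18) fits in the cluster (20)

def round_robin(tasks, nodes):
    """Classic polling: hand tasks to nodes in turn, ignoring what each task needs."""
    load = {n: 0 for n in nodes}
    for (_name, cpu), node in zip(tasks.items(), cycle(nodes)):
        load[node] += cpu
    return load

load = round_robin(tasks, nodes)
overloaded = [n for n, used in load.items() if used > capacity]
print(load)        # {'node-a': 15, 'node-b': 3}
print(overloaded)  # ['node-a']
```

A resource-aware split exists here (5 + 5 on one node, 5 + 1 + 1 + 1 on the other), but round-robin never looks at task sizes, so every heavy task lands on the same node.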
Summary of the invention
In view of this, the purpose of the present invention is to provide a task load balancing method for ETL on a cluster that solves problems in a cluster such as nodes being allocated too many or too few tasks, idle resources being wasted, uneven resource allocation, and resource contention leading to lock-up.
To achieve the above object, the present invention adopts the following technical solution:
A task load balancing method for ETL on a cluster, comprising the following steps:
Step S1: calculate the predicted resource consumption of the task to be allocated;
Step S2: monitor the resource usage of the nodes through the task scheduling center;
Step S3: calculate the effective idle resources of the cluster nodes;
Step S4: determine the type of the task to be allocated: if it is a single task, go to step S5; if it is a task group, go to step S6;
Step S5: the task scheduling center allocates the single task;
Step S6: the task scheduling center allocates the task group;
Step S7: the task scheduling center repeatedly executes steps S4 to S6, assigning the tasks in the task queue to nodes with sufficient effective idle resources, thereby completing task allocation for the cluster.
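Putting steps S4 to S7 together, the following Python sketch runs one scheduling round over a FIFO queue that mixes single tasks and task groups. It is a sketch under stated assumptions, not the patented implementation: resources are reduced to CPU cores and memory in MB, "most effective idle resources" is taken to mean most free CPU with free memory as a tie-breaker, a task fits only if both its predicted CPU and memory fit, unplaced group members are requeued as a smaller group, and all class and function names (`Res`, `Task`, `TaskGroup`, `place`, `schedule_round`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Res:
    cpu: float  # cores (assumed unit)
    mem: int    # MB (assumed unit)

    def fits_in(self, other: "Res") -> bool:
        return self.cpu <= other.cpu and self.mem <= other.mem

    def __sub__(self, other: "Res") -> "Res":
        return Res(self.cpu - other.cpu, self.mem - other.mem)

@dataclass
class Task:
    name: str
    need: Res  # predicted consumption from step S1

@dataclass
class TaskGroup:
    name: str
    tasks: list

def place(task: Task, free: dict):
    """Step S5: try the node with the most effective idle resources."""
    best = max(free, key=lambda n: (free[n].cpu, free[n].mem))
    if task.need.fits_in(free[best]):
        free[best] = free[best] - task.need
        return best
    return None

def schedule_round(queue: deque, free: dict) -> dict:
    """Steps S4-S7: one FIFO pass; anything that does not fit is requeued."""
    assignments = {}
    for _ in range(len(queue)):          # each queued item is tried once per round
        item = queue.popleft()
        if isinstance(item, Task):       # step S4: single task -> step S5
            node = place(item, free)
            if node is None:
                queue.append(item)
            else:
                assignments[item.name] = node
        else:                            # step S4: task group -> step S6
            leftover = []
            by_size = sorted(item.tasks, key=lambda x: (x.need.cpu, x.need.mem), reverse=True)
            for t in by_size:
                node = place(t, free)
                if node is None:
                    leftover.append(t)
                else:
                    assignments[t.name] = node
            if leftover:
                queue.append(TaskGroup(item.name, leftover))
    return assignments

free = {"node-a": Res(4, 8192), "node-b": Res(2, 4096)}
queue = deque([
    Task("t1", Res(3, 2048)),
    TaskGroup("g1", [Task("g1-a", Res(1, 1024)), Task("g1-b", Res(2, 2048))]),
    Task("t2", Res(4, 4096)),
])
assignments = schedule_round(queue, free)
print(assignments)                    # {'t1': 'node-a', 'g1-b': 'node-b', 'g1-a': 'node-a'}
print([item.name for item in queue])  # ['t2']
```

In a running scheduler, `schedule_round` would be called again whenever nodes report fresh resource figures (steps S2 and S3), so requeued work such as `t2` is retried once capacity frees up.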
Further, step S1 specifically comprises:
Step S11: during one execution lifecycle of the task to be allocated, record the task's resource consumption parameters, including minimum CPU, maximum CPU, overall average CPU, minimum memory, maximum memory, overall average memory, minimum running time, maximum running time, and overall average running time;
Step S12: submit the resource consumption parameters to the task scheduling center to obtain the task's normal resource consumption parameters during execution, i.e., the predicted resource consumption of the task.
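Steps S11 and S12 amount to keeping a resource profile per task. The text does not say how the normal consumption is derived from the nine recorded parameters, so the Python sketch below assumes one simple rule: the overall averages become the prediction. `RunStats`, `profile`, and `predicted_consumption` are hypothetical names, and the units (cores, MB, seconds) are assumptions.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class RunStats:
    """Step S11 (sketch): what one execution of a task reports."""
    cpu_samples: list   # CPU usage samples, in cores (assumed unit)
    mem_samples: list   # memory usage samples, in MB (assumed unit)
    duration_s: float   # running time of this execution, in seconds

def profile(runs):
    """Step S12 (sketch): reduce the recorded runs to the nine parameters."""
    cpus = [c for r in runs for c in r.cpu_samples]
    mems = [m for r in runs for m in r.mem_samples]
    times = [r.duration_s for r in runs]
    return {
        "min_cpu": min(cpus), "max_cpu": max(cpus), "avg_cpu": mean(cpus),
        "min_mem": min(mems), "max_mem": max(mems), "avg_mem": mean(mems),
        "min_time": min(times), "max_time": max(times), "avg_time": mean(times),
    }

def predicted_consumption(params):
    """Hypothetical rule: the overall averages stand in for normal consumption."""
    return {"cpu": params["avg_cpu"], "mem": params["avg_mem"], "time": params["avg_time"]}

runs = [
    RunStats(cpu_samples=[1.0, 3.0], mem_samples=[512.0, 1536.0], duration_s=60.0),
    RunStats(cpu_samples=[2.0, 2.0], mem_samples=[1024.0, 1024.0], duration_s=90.0),
]
params = profile(runs)
print(predicted_consumption(params))  # {'cpu': 2.0, 'mem': 1024.0, 'time': 75.0}
```

A more conservative scheduler could predict from the maximum values instead, trading some utilization for a lower risk of overcommitting a node.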
Further, the node resources include idle CPU, idle memory, and allocated tasks.
Further, step S3 specifically comprises: subtracting the node's resource usage from the node's total resources, and then subtracting the predicted resource consumption of the tasks pending execution; the remaining resources are the node's effective idle resources.
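Step S3 is plain subtraction per resource dimension. The Python sketch below assumes two dimensions (CPU cores and memory in MB) and reads "tasks pending execution" as tasks already assigned to the node that have not yet started consuming resources, which is why their predictions are subtracted on top of the measured usage. `Res` and `effective_idle` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Res:
    cpu: float  # cores (assumed unit)
    mem: int    # MB (assumed unit)

def effective_idle(total: Res, used: Res, pending: list) -> Res:
    """Step S3 (sketch): total resources, minus measured usage, minus the
    predicted consumption of tasks assigned to the node but not yet running."""
    cpu = total.cpu - used.cpu - sum(p.cpu for p in pending)
    mem = total.mem - used.mem - sum(p.mem for p in pending)
    return Res(cpu, mem)

node_total = Res(cpu=8, mem=16384)
node_used = Res(cpu=3, mem=6144)        # reported by the node (step S2)
pending = [Res(2, 2048), Res(1, 1024)]  # predictions from step S1
print(effective_idle(node_total, node_used, pending))  # Res(cpu=2, mem=7168)
```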
Further, step S5 specifically comprises:
Step S51: find the node with the most effective idle resources among all nodes in the cluster;
Step S52: if that node's effective idle resources are greater than the predicted resource consumption of the task to be allocated, assign the task to that node and remove it from the task queue; if that node's effective idle resources are less than the task's predicted resource consumption, the node is considered unable to run the task, so the task is returned to the task queue, and the method proceeds to step S7.
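Steps S51 and S52 can be sketched in Python as below. The sketch assumes that "most effective idle resources" means most free CPU with free memory as a tie-breaker, and treats an exact match as fitting (the text covers only "greater than" and "less than"); `Res` and `allocate_single` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Res:
    cpu: float  # cores (assumed unit)
    mem: int    # MB (assumed unit)

def allocate_single(need: Res, free: dict):
    """Step S5 (sketch). Returns the chosen node, or None if the task must be
    put back into the task queue to wait for the next round (step S7)."""
    if not free:
        return None
    # Step S51: node with the most effective idle resources
    # (assumed ordering: free CPU first, free memory as tie-breaker).
    best = max(free, key=lambda n: (free[n].cpu, free[n].mem))
    # Step S52: assign only if the predicted consumption fits on that node.
    if need.cpu <= free[best].cpu and need.mem <= free[best].mem:
        free[best] = Res(free[best].cpu - need.cpu, free[best].mem - need.mem)
        return best
    return None

free = {"node-a": Res(2, 4096), "node-b": Res(6, 2048)}
print(allocate_single(Res(4, 1024), free))  # node-b
print(free["node-b"])                       # Res(cpu=2, mem=1024)
print(allocate_single(Res(4, 1024), free))  # None (task goes back to the queue)
```

Because only the single most idle node is checked, a task may be requeued even when a node ranked lower by CPU could host it; this follows the wording of step S52 rather than searching every node.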
Further, step S6 specifically comprises:
Step S61: sort the tasks in the task group by predicted resource size, so that tasks with larger predicted resource consumption are allocated first;
Step S62: assign all tasks of the task group to nodes in the cluster; if the nodes' effective resources are insufficient to run all the tasks, return the remaining tasks to the task queue and jump to step S7.
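Step S61's largest-first ordering mirrors decreasing bin-packing heuristics; combined with placing each task on the most idle node, it resembles worst-fit decreasing. The Python sketch below assumes that "size" means predicted CPU with memory as a tie-breaker, and that every task in the group is tried once, with only the tasks that fit nowhere returned to the queue. `Res`, `allocate_group`, and the task names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Res:
    cpu: float  # cores (assumed unit)
    mem: int    # MB (assumed unit)

def allocate_group(group: dict, free: dict):
    """Step S6 (sketch). group: {task name: predicted Res}; free: {node: Res}.
    Returns (assignments, names of leftover tasks to put back in the queue)."""
    if not free:
        return {}, list(group)
    assignments, leftover = {}, []
    # Step S61: largest predicted consumption first.
    for name in sorted(group, key=lambda t: (group[t].cpu, group[t].mem), reverse=True):
        need = group[name]
        best = max(free, key=lambda n: (free[n].cpu, free[n].mem))
        if need.cpu <= free[best].cpu and need.mem <= free[best].mem:
            free[best] = Res(free[best].cpu - need.cpu, free[best].mem - need.mem)
            assignments[name] = best
        else:
            leftover.append(name)  # step S62: returned to the task queue
    return assignments, leftover

free = {"node-a": Res(4, 8192), "node-b": Res(3, 4096)}
group = {"load": Res(1, 512), "extract": Res(3, 2048),
         "transform": Res(2, 4096), "index": Res(3, 1024)}
assignments, leftover = allocate_group(group, free)
print(assignments)  # {'extract': 'node-a', 'index': 'node-b', 'load': 'node-a'}
print(leftover)     # ['transform']
```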
Compared with the prior art, the present invention has the following beneficial effects:
The present invention achieves load balancing of task allocation in an ETL cluster, and solves problems in the cluster such as nodes being allocated too many or too few tasks, idle resources being wasted, uneven resource allocation, and resource contention leading to lock-up.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a schematic diagram of the ETL cluster topology in an embodiment of the present invention;
Fig. 3 is a schematic diagram of ETL task load and status in an embodiment of the present invention.
Specific embodiment
The present invention will be further described with reference to the accompanying drawings and embodiments.
Referring to Fig. 2, the present invention provides a task scheduling center and several container nodes for task execution (referred to as nodes); several nodes form a cluster, and one task scheduling center can handle task scheduling for multiple clusters. The task scheduling center generates tasks according to the execution schedule of the ETL tasks and adds them to the task queue, and the task queue dispatches tasks to nodes for execution in FIFO (first in, first out) order. Within a cluster, if any node fails, its tasks are rescheduled onto nodes that are running normally, ensuring that the tasks continue to run.
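The FIFO dispatch and failover behavior can be sketched in Python as below. The text does not say where re-dispatched tasks re-enter the queue; this sketch assumes they go back to the front, in their original order, so they run before newer work. `TaskQueue`, `submit`, `dispatch`, and `node_failed` are hypothetical names.

```python
from collections import deque

class TaskQueue:
    """FIFO dispatch with re-dispatch on node failure (sketch)."""

    def __init__(self):
        self.queue = deque()  # waiting tasks, oldest first
        self.running = {}     # task name -> node name

    def submit(self, task):
        """Add a task generated from the ETL execution schedule."""
        self.queue.append(task)

    def dispatch(self, node):
        """Send the oldest queued task to `node` (first in, first out)."""
        if not self.queue:
            return None
        task = self.queue.popleft()
        self.running[task] = node
        return task

    def node_failed(self, node):
        """Requeue the failed node's tasks at the front, keeping their order."""
        lost = [t for t, n in self.running.items() if n == node]
        for t in reversed(lost):
            del self.running[t]
            self.queue.appendleft(t)
        return lost

q = TaskQueue()
for name in ["t1", "t2", "t3"]:
    q.submit(name)
print(q.dispatch("node-a"))     # t1
print(q.dispatch("node-b"))     # t2
print(q.node_failed("node-a"))  # ['t1']
print(list(q.queue))            # ['t1', 't3']
print(q.dispatch("node-b"))     # t1
```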
Based on the above task scheduling center and nodes, and with reference to Fig. 1, the present invention provides a task load balancing method for ETL on a cluster, comprising the following steps:
Step S1: calculate the predicted resource consumption of the task to be allocated;
Step S11: during one execution lifecycle of the task to be allocated, record the task's resource consumption parameters, including minimum CPU, maximum CPU, overall average CPU, minimum memory, maximum memory, overall average memory, minimum running time, maximum running time, and overall average running time;
Step S12: submit the resource consumption parameters to the task scheduling center to obtain the task's normal resource consumption parameters during execution, i.e., the predicted resource consumption of the task.
Step S2: monitor the resource usage of the nodes through the task scheduling center;
Step S3: calculate the effective idle resources of the cluster nodes: subtract the node's resource usage from the node's total resources, and then subtract the predicted resource consumption of the tasks pending execution; the remaining resources are the node's effective idle resources.
Step S4: determine the type of the task to be allocated: if it is a single task, go to step S5; if it is a task group, go to step S6;
Step S5: the task scheduling center allocates the single task;
Step S51: find the node with the most effective idle resources among all nodes in the cluster;
Step S52: if that node's effective idle resources are greater than the predicted resource consumption of the task to be allocated, assign the task to that node and remove it from the task queue; if that node's effective idle resources are less than the task's predicted resource consumption, the node is considered unable to run the task, so the task is returned to the task queue, and the method proceeds to step S7.
Step S6: the task scheduling center allocates the task group;
Step S61: sort the tasks in the task group by predicted resource size, so that tasks with larger predicted resource consumption are allocated first;
Step S62: assign all tasks of the task group to nodes in the cluster; if the nodes' effective resources are insufficient to run all the tasks, return the remaining tasks to the task queue and jump to step S7.
Step S7: the task scheduling center repeatedly executes steps S4 to S6, assigning the tasks in the task queue to nodes with sufficient effective idle resources, thereby completing task allocation for the cluster.
In an embodiment of the present invention, the node resources include idle CPU, idle memory, and allocated tasks. Because CPU and memory usage change constantly, the resource data uses the average over a recent period (for example, the last minute) as its real-time value. Each node periodically sends this data to the task scheduling center.
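Averaging a node metric over a recent period can be done with a time-based sliding window. The Python sketch below assumes each node samples a metric (for example, CPU percent) with a timestamp in seconds and reports the window average to the task scheduling center; the 60-second window follows the example in the text, and `MetricWindow` is a hypothetical name.

```python
from collections import deque

class MetricWindow:
    """Average of the samples taken within the last `window_s` seconds."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp, value), oldest first

    def add(self, ts, value):
        self.samples.append((ts, value))
        # Drop samples that are now older than the window.
        while self.samples and self.samples[0][0] <= ts - self.window_s:
            self.samples.popleft()

    def average(self):
        if not self.samples:
            return None
        return sum(v for _, v in self.samples) / len(self.samples)

w = MetricWindow(window_s=60.0)
for ts, cpu in [(0, 90.0), (30, 30.0), (59, 60.0)]:
    w.add(ts, cpu)
print(w.average())  # 60.0
w.add(61, 30.0)     # the sample taken at t=0 falls out of the window
print(w.average())  # 40.0
```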
The foregoing are merely preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall be covered by the present invention.
Claims (6)
1. A task load balancing method for ETL on a cluster, characterized by comprising the following steps:
Step S1: calculating the predicted resource consumption of a task to be allocated;
Step S2: monitoring the resource usage of the nodes through a task scheduling center;
Step S3: calculating the effective idle resources of the cluster nodes;
Step S4: determining the type of the task to be allocated, and proceeding to step S5 if it is a single task, or to step S6 if it is a task group;
Step S5: allocating the single task by the task scheduling center;
Step S6: allocating the task group by the task scheduling center;
Step S7: the task scheduling center repeatedly executing steps S4 to S6 to assign the tasks in the task queue to nodes with sufficient effective idle resources, thereby completing task allocation for the cluster.
2. The task load balancing method for ETL on a cluster according to claim 1, characterized in that step S1 specifically comprises:
Step S11: during one execution lifecycle of the task to be allocated, recording the task's resource consumption parameters, including minimum CPU, maximum CPU, overall average CPU, minimum memory, maximum memory, overall average memory, minimum running time, maximum running time, and overall average running time;
Step S12: submitting the resource consumption parameters to the task scheduling center to obtain the task's normal resource consumption parameters during execution, i.e., the predicted resource consumption of the task.
3. The task load balancing method for ETL on a cluster according to claim 1, characterized in that the node resources include idle CPU, idle memory, and allocated tasks.
4. The task load balancing method for ETL on a cluster according to claim 1, characterized in that step S3 specifically comprises: subtracting the node's resource usage from the node's total resources, and then subtracting the predicted resource consumption of the tasks pending execution; the remaining resources are the node's effective idle resources.
5. The task load balancing method for ETL on a cluster according to claim 1, characterized in that step S5 specifically comprises:
Step S51: finding the node with the most effective idle resources among all nodes in the cluster;
Step S52: if that node's effective idle resources are greater than the predicted resource consumption of the task to be allocated, assigning the task to that node and removing it from the task queue; if that node's effective idle resources are less than the task's predicted resource consumption, considering the node unable to run the task, returning the task to the task queue, and proceeding to step S7.
6. The task load balancing method for ETL on a cluster according to claim 1, characterized in that step S6 specifically comprises:
Step S61: sorting the tasks in the task group by predicted resource size, so that tasks with larger predicted resource consumption are allocated first;
Step S62: assigning all tasks of the task group to nodes in the cluster, and, if the nodes' effective resources are insufficient to run all the tasks, returning the remaining tasks to the task queue and jumping to step S7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811226888.5A (CN109408236A) | 2018-10-22 | 2018-10-22 | Task load balancing method for ETL on a cluster |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109408236A | 2019-03-01 |
Family
ID=65468685
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811226888.5A (CN109408236A, pending) | 2018-10-22 | 2018-10-22 | Task load balancing method for ETL on a cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109408236A |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110430278A (en) * | 2019-08-14 | 2019-11-08 | 平安普惠企业管理有限公司 | Load balancing configuration method and device |
CN110708505A (en) * | 2019-09-18 | 2020-01-17 | 上海依图网络科技有限公司 | Video alarm method, device, electronic equipment and computer readable storage medium |
CN111144701A (en) * | 2019-12-04 | 2020-05-12 | 中国电子科技集团公司第三十研究所 | ETL job scheduling resource classification evaluation method under distributed environment |
CN111866043A (en) * | 2019-04-29 | 2020-10-30 | 中国移动通信集团河北有限公司 | Task processing method and device, computing equipment and computer storage medium |
CN112052093A (en) * | 2020-09-08 | 2020-12-08 | 哈尔滨工业大学 | Experimental big data resource allocation management system based on message queue technology |
WO2021057514A1 (en) * | 2019-09-24 | 2021-04-01 | 中兴通讯股份有限公司 | Task scheduling method and apparatus, computer device, and computer readable medium |
CN112596902A (en) * | 2020-12-25 | 2021-04-02 | 中科星通(廊坊)信息技术有限公司 | Task scheduling method and device based on CPU-GPU cooperative computing |
CN112732809A (en) * | 2020-12-31 | 2021-04-30 | 杭州海康威视系统技术有限公司 | ETL system and data processing method based on ETL system |
CN113687950A (en) * | 2021-08-31 | 2021-11-23 | 平安医疗健康管理股份有限公司 | Priority-based task allocation method, device, equipment and storage medium |
CN114356515A (en) * | 2021-12-15 | 2022-04-15 | 联奕科技股份有限公司 | Scheduling method of data conversion task |
WO2022160886A1 (en) * | 2021-01-29 | 2022-08-04 | Zhejiang Dahua Technology Co., Ltd. | Task allocation method, apparatus, storage medium, and electronic device |
CN115145591A (en) * | 2022-08-31 | 2022-10-04 | 之江实验室 | Multi-center-based medical ETL task scheduling method, system and device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101819540A (en) * | 2009-02-27 | 2010-09-01 | 国际商业机器公司 | Method and system for scheduling task in cluster |
CN102096602A (en) * | 2009-12-15 | 2011-06-15 | 中国移动通信集团公司 | Task scheduling method, and system and equipment thereof |
CN102622273A (en) * | 2012-02-23 | 2012-08-01 | 中国人民解放军国防科学技术大学 | Self-learning load prediction based cluster on-demand starting method |
CN103617086A (en) * | 2013-11-20 | 2014-03-05 | 东软集团股份有限公司 | Parallel computation method and system |
CN104050042A (en) * | 2014-05-30 | 2014-09-17 | 北京先进数通信息技术股份公司 | Resource allocation method and resource allocation device for ETL (Extraction-Transformation-Loading) jobs |
CN104516784A (en) * | 2014-07-11 | 2015-04-15 | 中国科学院计算技术研究所 | Method and system for forecasting task resource waiting time |
CN104636197A (en) * | 2015-01-29 | 2015-05-20 | 东北大学 | Evaluation method for data center virtual machine migration scheduling strategies |
CN107220122A (en) * | 2017-05-25 | 2017-09-29 | 深信服科技股份有限公司 | A kind of task recognition method and device based on cloud platform |
US20180060402A1 (en) * | 2016-08-29 | 2018-03-01 | International Business Machines Corporation | Managing software asset environment using cognitive distributed cloud infrastructure |
2018
- 2018-10-22: CN201811226888.5A filed in CN (CN109408236A), active, pending
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111866043A (en) * | 2019-04-29 | 2020-10-30 | 中国移动通信集团河北有限公司 | Task processing method and device, computing equipment and computer storage medium |
CN111866043B (en) * | 2019-04-29 | 2023-04-28 | 中国移动通信集团河北有限公司 | Task processing method, device, computing equipment and computer storage medium |
CN110430278A (en) * | 2019-08-14 | 2019-11-08 | 平安普惠企业管理有限公司 | Load balancing configuration method and device |
CN110708505A (en) * | 2019-09-18 | 2020-01-17 | 上海依图网络科技有限公司 | Video alarm method, device, electronic equipment and computer readable storage medium |
WO2021057514A1 (en) * | 2019-09-24 | 2021-04-01 | 中兴通讯股份有限公司 | Task scheduling method and apparatus, computer device, and computer readable medium |
CN112631764A (en) * | 2019-09-24 | 2021-04-09 | 中兴通讯股份有限公司 | Task scheduling method and device, computer equipment and computer readable medium |
CN111144701B (en) * | 2019-12-04 | 2022-03-22 | 中国电子科技集团公司第三十研究所 | ETL job scheduling resource classification evaluation method under distributed environment |
CN111144701A (en) * | 2019-12-04 | 2020-05-12 | 中国电子科技集团公司第三十研究所 | ETL job scheduling resource classification evaluation method under distributed environment |
CN112052093A (en) * | 2020-09-08 | 2020-12-08 | 哈尔滨工业大学 | Experimental big data resource allocation management system based on message queue technology |
CN112596902A (en) * | 2020-12-25 | 2021-04-02 | 中科星通(廊坊)信息技术有限公司 | Task scheduling method and device based on CPU-GPU cooperative computing |
CN112732809A (en) * | 2020-12-31 | 2021-04-30 | 杭州海康威视系统技术有限公司 | ETL system and data processing method based on ETL system |
CN112732809B (en) * | 2020-12-31 | 2023-08-04 | 杭州海康威视系统技术有限公司 | ETL system and data processing method based on ETL system |
WO2022160886A1 (en) * | 2021-01-29 | 2022-08-04 | Zhejiang Dahua Technology Co., Ltd. | Task allocation method, apparatus, storage medium, and electronic device |
CN113687950A (en) * | 2021-08-31 | 2021-11-23 | 平安医疗健康管理股份有限公司 | Priority-based task allocation method, device, equipment and storage medium |
CN114356515A (en) * | 2021-12-15 | 2022-04-15 | 联奕科技股份有限公司 | Scheduling method of data conversion task |
CN115145591A (en) * | 2022-08-31 | 2022-10-04 | 之江实验室 | Multi-center-based medical ETL task scheduling method, system and device |
US12119108B2 (en) | 2022-08-31 | 2024-10-15 | Zhejiang Lab | Medical ETL task dispatching method, system and apparatus based on multiple centers |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190301 |