CN112506957A - Method and device for determining workflow dependency relationship - Google Patents
- Publication number
- CN112506957A (application number CN202011511346.XA)
- Authority
- CN
- China
- Prior art keywords
- workflow
- relation
- layer
- relation table
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0633—Workflow analysis
Abstract
The application provides a method and a device for determining workflow dependency relationships. The method may comprise the following steps: establishing a first relation table, which records the association between each workflow and its input table and output table; querying the first relation table for a target workflow to be processed; and, when the output table of at least one queried workflow is the same as the input table of the target workflow, determining the queried workflow to be an upstream workflow on which the target workflow depends. With this technical scheme, the data warehouse can determine workflow dependency relationships automatically, which reduces the workload of manual configuration and avoids dependency errors caused by manual mistakes.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for determining workflow dependency.
Background
With the development of science and technology, data warehouses have come to be widely used in many fields. Building a data warehouse involves moving data between the layers of the warehouse. Configuring workflows in the data warehouse allows the data to be scheduled more effectively, so that the data warehouse can perform its function.
In the related art, workflows must be configured in the data warehouse in order to schedule data processing tasks. The output of some workflows may serve as the input of other workflows, in which case the latter workflow cannot start running until the former has finished executing. A user therefore has to identify a large number of workflows by hand and set their mutual dependency relationships manually, which is inefficient and error-prone.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for determining workflow dependency relationships, so that a data warehouse can determine the dependency relationships of workflows automatically, reducing the workload of configuring them manually.
Specifically, the application is implemented through the following technical scheme:
According to a first aspect of the present application, a method for determining workflow dependency relationships is provided, which is applied to a data warehouse and includes:
establishing a first relation table, wherein the first relation table records the association between each workflow and its input table and output table;
querying the first relation table for a target workflow to be processed;
and, when the output table of at least one queried workflow is the same as the input table of the target workflow, determining the queried workflow to be an upstream workflow on which the target workflow depends.
According to a second aspect of the present application, an apparatus for determining workflow dependency relationships is provided, which is applied to a data warehouse and includes:
an establishing unit, configured to establish a first relation table, wherein the first relation table records the association between each workflow and its input table and output table;
a query unit, configured to query the first relation table for a target workflow to be processed;
a determination unit, configured to determine, when the output table of at least one queried workflow is the same as the input table of the target workflow, the queried workflow to be an upstream workflow on which the target workflow depends.
According to a third aspect of the present application, there is provided an electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method as described in the embodiments of the first aspect above by executing the executable instructions.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method as described in the embodiments of the first aspect above.
According to the above technical scheme, the association between each workflow and its input table and output table is established, and the workflows are queried and analyzed on that basis, so that the upstream and downstream dependency relationships of the workflows in the data warehouse can be determined automatically. This reduces the workload of manual configuration and avoids dependency errors caused by manual mistakes.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart illustrating a method for workflow dependency determination according to an exemplary embodiment of the present application;
FIG. 2A is a schematic diagram of a network architecture of a data warehouse system to which embodiments of the present application are applied;
FIG. 2B is a schematic diagram of a data warehouse architecture to which embodiments of the present application are applied;
FIG. 3 is a flow chart illustrating a method for workflow dependency determination according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a workflow dependency determination electronic device according to an exemplary embodiment of the present application;
FIG. 5 is a block diagram illustrating a workflow dependency determination apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
Next, examples of the present application will be described in detail.
FIG. 1 is a flowchart illustrating a method for determining workflow dependency relationships according to an exemplary embodiment of the present application. As shown in FIG. 1, the method is applied to a data warehouse and may include the following steps:
Step 102: establishing a first relation table, wherein the first relation table records the association between each workflow and its input table and output table.
A data warehouse is a subject-oriented, integrated, relatively stable collection of data that reflects historical changes. It is used to support management decisions, and it can centralize, integrate and process the data of the major business systems to form a globally unified data view.
The data warehouse has a layered structure, and each layer has its own function. Data is converted from dispersed to centralized, from fine-grained to highly summarized, and from a business model to an analytical model, so that better back-end data support can be provided to analysis systems. By way of example, the data warehouse may include a service layer, an ODS (Operational Data Store) layer, an STD (Standard Data) layer, and a DWD (Data Warehouse Detail) layer.
Nodes are configured between each pair of adjacent layers. A node processes the data in the upper-layer database and transmits the processed data to the lower-layer database. For example, in the present application, the data warehouse may be configured with an ETL (Extract, Transform, Load) node, a cleaning node, and a model processing node. The ETL node is located between the service layer and the ODS layer and is used for collecting the service layer data and storing it into the ODS layer; the cleaning node is located between the ODS layer and the STD layer and is used for cleaning the ODS layer data and storing the cleaned data into the STD layer; and the model processing node is located between the STD layer and the DWD layer and is used for integrating the STD layer data and storing it into the DWD layer.
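As an illustration only, the layer order and node placement described above could be modeled as follows. This is a minimal sketch: the `Node` class, the `LAYERS` list and the `connects_adjacent_layers` helper are hypothetical names introduced here, not part of the application.

```python
from dataclasses import dataclass

# Layer order as described in the text, from upstream to downstream.
LAYERS = ["service", "ODS", "STD", "DWD"]


@dataclass(frozen=True)
class Node:
    """A processing node that moves data from one layer to another (hypothetical model)."""
    name: str
    source_layer: str
    target_layer: str


def connects_adjacent_layers(node: Node) -> bool:
    """Return True if the node moves data from a layer to the layer directly below it."""
    return LAYERS.index(node.target_layer) - LAYERS.index(node.source_layer) == 1


# The three node types named in the text.
NODES = [
    Node("ETL node", "service", "ODS"),
    Node("cleaning node", "ODS", "STD"),
    Node("model processing node", "STD", "DWD"),
]
```

Each of the three nodes connects a layer to the next one in `LAYERS`, which is the placement the paragraph above describes.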
A workflow is a series of activities with a fixed order, in which documents, information or tasks are passed between and executed by different participants according to set process rules. In this application, a workflow is a calling process for data processing work between adjacent layers in the data warehouse. For example, the process of transferring data from the service layer through the ODS layer to the STD layer, collecting and cleaning it along the way, forms one workflow.
In an embodiment, the first relation table may be configured manually, with the input table and the output table of each workflow recorded by hand.
In another embodiment, the first relation table may be generated automatically by the data warehouse. The data warehouse acquires a pre-configured second relation table and a pre-configured third relation table. The second relation table records the association between each workflow and the nodes it contains, and the third relation table records the association between each node and its input table and output table. The first relation table is then generated according to the second relation table and the third relation table.
Generating the first relation table according to the second relation table and the third relation table comprises: determining the input tables of all nodes contained in any workflow as the input tables of that workflow, and determining the output tables of all nodes contained in that workflow as the output tables of that workflow; and recording the association between each workflow and its input tables and output tables to generate the first relation table. For example, if workflow a calls only node a and node b, calling node a first and then node b, the input tables of workflow a are the input tables of node a and node b, and the output tables of workflow a are the output tables of node a and node b. When the output of node a is the input of node b, the input of node b can be omitted from the input of workflow a. Using the association between the nodes located between adjacent layers and the workflows makes it easy to obtain the input and output tables of each workflow, so that the dependency relationships between workflows can be inferred from their inputs and outputs.
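The generation step can be sketched as below. This is a minimal illustration under stated assumptions: the relation tables are represented as plain dictionaries, and the function name, the `omit_internal` flag and the table names `src_t`, `mid_t` and `dst_t` are hypothetical.

```python
def build_first_relation_table(second, third, omit_internal=True):
    """Derive each workflow's input and output tables from its nodes.

    second: workflow name -> list of node names, in calling order
    third:  node name -> (input table names, output table names)
    omit_internal: if True, a node input that an earlier node of the same
        workflow already produced is left out of the workflow's inputs.
    """
    first = {}
    for workflow, nodes in second.items():
        inputs, produced = set(), set()
        for node in nodes:
            node_inputs, node_outputs = third[node]
            for table in node_inputs:
                # Skip tables that are internal hand-offs between nodes.
                if not (omit_internal and table in produced):
                    inputs.add(table)
            produced |= set(node_outputs)
        first[workflow] = {"inputs": inputs, "outputs": produced}
    return first


# Workflow a calls node a, then node b; node a's output feeds node b.
second = {"workflow_a": ["node_a", "node_b"]}
third = {
    "node_a": (["src_t"], ["mid_t"]),
    "node_b": (["mid_t"], ["dst_t"]),
}
first = build_first_relation_table(second, third)
```

With `omit_internal=True`, `mid_t` is dropped from the inputs of `workflow_a` because node a produces it before node b consumes it; with `omit_internal=False` it is kept, which corresponds to the unsimplified rule in the text.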
Step 104: querying the first relation table for the target workflow to be processed.
The first relation table is queried to find out whether an upstream workflow exists, that is, a workflow whose output table is the same as the input table of the target workflow. Among workflows with a dependency relationship, the output of the upstream workflow serves as the input of the target workflow, and querying the input and output tables of the workflows shows the input and output of each workflow clearly and quickly.
Step 106: when the output table of at least one queried workflow is the same as the input table of the target workflow, determining the queried workflow to be an upstream workflow on which the target workflow depends.
The output of the upstream workflow serves as the input of the target workflow; for example, the output table of the upstream workflow is the input table of the target workflow. After the correspondence between each workflow and its upstream workflows has been determined, a workflow dependency relationship table may be generated and recorded. By querying the first relation table, the data warehouse can determine workflow dependency relationships automatically, which reduces the workload of manual configuration and avoids dependency errors caused by manual mistakes.
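Steps 104 and 106 can be sketched as a lookup over the first relation table. The sketch assumes the table is a dictionary mapping each workflow to sets of input and output table names; the function names and the workflow and table names (`wf_x`, `t1`, and so on) are hypothetical. Excluding the target workflow from its own upstream list is also an assumption made here.

```python
def find_upstream_workflows(first, target):
    """Return the workflows whose output tables include an input table of `target`."""
    target_inputs = first[target]["inputs"]
    return sorted(
        name
        for name, tables in first.items()
        # Assumption: a workflow is not reported as its own upstream.
        if name != target and tables["outputs"] & target_inputs
    )


def build_dependency_table(first):
    """Map every workflow to the list of upstream workflows it depends on."""
    return {name: find_upstream_workflows(first, name) for name in first}


# Hypothetical first relation table: wf_z reads what wf_x and wf_y write.
first = {
    "wf_x": {"inputs": {"s1"}, "outputs": {"t1"}},
    "wf_y": {"inputs": {"s2"}, "outputs": {"t2"}},
    "wf_z": {"inputs": {"t1", "t2"}, "outputs": {"m1"}},
}
dependency_table = build_dependency_table(first)
```

The resulting `dependency_table` plays the role of the workflow dependency relationship table mentioned above: one row per workflow, listing its upstream workflows.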
In one embodiment, the data warehouse may monitor the second relation table and the third relation table periodically, and update the first relation table when the second relation table or the third relation table changes. The data warehouse may acquire the second relation table and the third relation table according to a preset monitoring period and judge whether they are the same as the second relation table and third relation table currently in use. If so, the current workflow dependency relationships are left unchanged. If not, a new first relation table is generated from the newly acquired second relation table and third relation table, and the workflow dependency relationships are re-determined according to the new first relation table. For how the first relation table is generated from the second relation table and the third relation table, and how the workflow dependency relationships are determined from the first relation table, refer to the description of steps 102 to 106 in the embodiment shown in FIG. 1; details are not repeated here. Periodically monitoring whether the second relation table or the third relation table has changed ensures that the corresponding adjustment of the workflow dependency relationships is not missed.
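One monitoring cycle might look like the following sketch. The class name `RelationTableMonitor`, its loader callbacks and the `on_change` hook are hypothetical; how `poll` is driven on a timer (for example a loop with a sleep equal to the monitoring period) is left to the caller.

```python
import copy


class RelationTableMonitor:
    """Compare freshly loaded relation tables with the ones currently in use."""

    def __init__(self, load_second, load_third, on_change):
        self._load_second = load_second  # callable returning the second relation table
        self._load_third = load_third    # callable returning the third relation table
        self._on_change = on_change      # callable(second, third) that rebuilds dependencies
        self._current = None             # snapshot of the tables currently in use

    def poll(self):
        """Run one monitoring cycle; return True if a change was detected."""
        # Deep copies keep the snapshot independent of later in-place edits.
        latest = (
            copy.deepcopy(self._load_second()),
            copy.deepcopy(self._load_third()),
        )
        if latest == self._current:
            return False  # unchanged: keep the current dependency relationships
        self._current = latest
        self._on_change(*latest)  # regenerate the first table and dependencies
        return True
```

In this sketch the very first call to `poll` is treated as a change, so it also performs the initial generation.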
In another embodiment, after determining the workflow dependency relationships, the data warehouse may trace the dependency relationship of each workflow to form a dependency link, in which, for any two adjacent workflows, the later workflow depends on the earlier one. For example, if the upstream workflow of workflow c is workflow b and the upstream workflow of workflow b is workflow a, tracing back from workflow c yields the dependency link a → b → c. Correspondingly, when a workflow is repeated in a dependency link, it can be determined that the link contains an erroneous workflow dependency relationship, and an error report message is output to a preset object; the preset object may be a user. For example, if tracing back from workflow a yields the dependency link a → b → c → a, workflow a is repeated, so it can be determined that the dependency link a → b → c → a contains an erroneous dependency relationship. An error report message concerning workflow a, workflow b and workflow c is output to the user, so that the user can manually check the input and output relationships of the processing nodes contained in each workflow on the link and promptly adjust any unreasonably designed task. Detecting whether a workflow is repeated in the dependency link of each workflow gives timely warning of circular dependencies caused by unreasonable task design.
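The tracing and repeat check can be sketched as follows. For brevity the sketch assumes each workflow has at most one upstream workflow, which is a simplification (a workflow may depend on several); the function name and the dictionary representation are hypothetical.

```python
def trace_dependency_link(upstream_of, start):
    """Trace back from `start` and return (link, has_repeat).

    upstream_of: workflow name -> its single upstream workflow (simplifying assumption)
    link: workflows ordered so that each one depends on the one before it
    has_repeat: True if a workflow appears twice, i.e. a circular dependency
    """
    visited = [start]
    current = start
    while upstream_of.get(current) is not None:
        current = upstream_of[current]
        if current in visited:
            # Repeated workflow: record it once more and stop tracing.
            visited.append(current)
            return list(reversed(visited)), True
        visited.append(current)
    return list(reversed(visited)), False


# c depends on b, and b depends on a.
link, has_repeat = trace_dependency_link({"c": "b", "b": "a"}, "c")
print(" → ".join(link), has_repeat)

# a depends on c, c depends on b, b depends on a: a circular dependency.
cyclic = {"a": "c", "c": "b", "b": "a"}
bad_link, bad_repeat = trace_dependency_link(cyclic, "a")
if bad_repeat:
    print("error report for workflows:", sorted(set(bad_link)))
```

The first call yields the link a → b → c with no repeat; the second yields a → b → c → a with a repeat, which is the case in which the text calls for an error report.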
In another embodiment, for a target workflow, it is evaluated whether the scheduling of the upstream workflows on which the target workflow depends has completed. When the scheduling of an upstream workflow on which the target workflow depends has not completed, the target workflow is placed in a blocking waiting state; when the scheduling of the upstream workflows on which the target workflow depends has completed, the target workflow is switched from the blocking waiting state to a started state. By evaluating the scheduling state of the upstream workflows in real time, a workflow with upstream dependencies can determine its start time accurately and begin execution as soon as the workflows it depends on have finished. This prevents the downstream workflow from starting before the upstream workflow has finished, and it also prevents the downstream workflow from remaining idle after the upstream workflow has finished, which reduces wasted resources.
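The state decision described above can be sketched as a small function. The `State` enum, the function name and the workflow names are hypothetical; how the set of completed workflows is obtained from the scheduler is outside the sketch.

```python
from enum import Enum


class State(Enum):
    BLOCKED_WAITING = "blocked_waiting"
    STARTED = "started"


def evaluate_target_state(target, dependency_table, completed):
    """Decide the state of `target` from the scheduling status of its upstream workflows.

    dependency_table: workflow name -> list of upstream workflow names
    completed: set of workflows whose scheduling has finished
    """
    upstream = dependency_table.get(target, [])
    if all(workflow in completed for workflow in upstream):
        return State.STARTED
    return State.BLOCKED_WAITING


# Hypothetical dependency table: wf_z depends on wf_x and wf_y.
dependency_table = {"wf_x": [], "wf_y": [], "wf_z": ["wf_x", "wf_y"]}
```

Calling the function again whenever an upstream workflow finishes moves the target from the blocking waiting state to the started state once every upstream workflow is in `completed`.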
According to the above technical scheme, the association between each workflow and its input table and output table is established, and the upstream and downstream dependency relationships of the workflows in the data warehouse can be determined automatically through query and analysis. This reduces the workload of manual configuration and avoids dependency errors caused by manual mistakes.
FIG. 2A is a schematic diagram of the network architecture of a data warehouse system to which an embodiment of the present application is applied. As shown in FIG. 2A, the data warehouse system may include a management device 21 and a data warehouse 22, where the management device 21 manages the data warehouse 22 and configures the dependency relationships of the workflows in the data warehouse. The technical scheme of the application is applied to the management device 21. It should be understood that in some cases the management device 21 may also be contained directly within the data warehouse 22; for example, it may be deployed on a node within the data warehouse 22. The data warehouse 22 may include multiple layers. FIG. 2B is a schematic diagram of a data warehouse architecture to which an embodiment of the present application is applied. As shown in FIG. 2B, the data warehouse 22 may include a service layer, an ODS layer, an STD layer, and a DWD layer, and each layer may include multiple data tables, such as service layer table A, service layer table B, ODS layer table A, ODS layer table B, STD layer table A, STD layer table B, and DWD layer model A shown in FIG. 2B. The present application does not limit the number or names of the layers included in the data warehouse 22 or of the data tables included in each layer.
In the technical solution of the present application, the way workflow dependency relationships are determined in the data warehouse 22 can be improved by analyzing the relationships among workflows, nodes, and input and output tables, as described in detail below with reference to FIG. 3. FIG. 3 is a detailed flowchart illustrating a workflow dependency determination method according to an exemplary embodiment of the present application. As shown in FIG. 3, the process by which the data warehouse 22 automatically determines workflow dependency relationships includes the following steps:
Step 302: acquiring a pre-configured second relation table and a pre-configured third relation table.
The second relation table records the association between each workflow and the nodes it contains, and the third relation table records the association between each node and its input table and output table.
For example, as shown in FIG. 2B, the data warehouse 22 is configured with workflow WF_A, workflow WF_B, workflow WF_M_A, ETL node A, cleaning node A, model processing node A, ETL node B, cleaning node B, and model processing node B.
Table 1 shows the second relation table in the present embodiment:
Workflow name | Contained nodes |
WF_A | ETL node A, cleaning node A |
WF_B | ETL node B, cleaning node B |
WF_M_A | Model processing node A |
TABLE 1
Table 2 shows the third relation table in the present embodiment:
TABLE 2
Step 304: establishing a first relation table, wherein the first relation table records the association between each workflow and its input table and output table.
According to the second relation table and the third relation table, the input tables of all nodes contained in any workflow are determined as the input tables of that workflow, and the output tables of all nodes contained in that workflow are determined as the output tables of that workflow. The association between each workflow and its input tables and output tables is recorded to generate the first relation table.
Table 3 shows the first relation table generated according to the second relation table and the third relation table in this embodiment:
TABLE 3
Step 306: querying the first relation table for the target workflow to be processed.
The first relation table is queried to find out whether the output table of any workflow is the same as the input table of the target workflow.
For example, in this embodiment, the target workflow is WF_M_A. Looking up Table 3 gives the input tables std_a_t and std_b_t of the target workflow WF_M_A. Table 3 is then searched further for workflows whose output table is std_a_t or std_b_t.
Step 308: when the output table of at least one queried workflow is the same as the input table of the target workflow, determining the queried workflow to be an upstream workflow on which the target workflow depends.
For example, in this embodiment, Table 3 shows that the workflows whose output tables are std_a_t or std_b_t are WF_A and WF_B, respectively. It can therefore be determined that the upstream workflows of the target workflow WF_M_A are WF_A and WF_B. Table 4 shows the workflow dependency relationship table determined in this embodiment.
TABLE 4
In this embodiment, the data warehouse needs to evaluate in real time whether the scheduling of workflow WF_A and workflow WF_B has completed. When the scheduling of workflow WF_A and workflow WF_B has not completed, workflow WF_M_A is placed in a blocking waiting state; when the scheduling of workflow WF_A and workflow WF_B has completed, workflow WF_M_A is switched to a started state.
Step 310: tracing the dependency relationship of each workflow to form a dependency link, in which, for any two adjacent workflows, the later workflow depends on the earlier one.
For example, Table 5 shows a workflow dependency relationship table:
workflow name | Upstream workflow |
WF_A | WF_C |
WF_B | WF_A |
WF_C | WF_B |
TABLE 5
Tracing the dependency relationship of each workflow, wherein the corresponding dependency links are as follows:
WF_A->WF_C->WF_B->WF_A
WF_B->WF_A->WF_C->WF_B
WF_C->WF_B->WF_A->WF_C
Step 312: when a workflow is repeated in a dependency link, determining that the dependency link contains an erroneous workflow dependency relationship and outputting an error report message to a preset object; when no workflow is repeated in the dependency link, determining that the dependency link contains no erroneous workflow dependency relationship.
In this embodiment, workflow WF_A is repeated in the first dependency link, workflow WF_B is repeated in the second dependency link, and workflow WF_C is repeated in the third dependency link. It can therefore be determined that all three dependency links contain an erroneous workflow dependency relationship, and an error report message concerning workflow WF_A, workflow WF_B, and workflow WF_C is sent to the user.
After receiving the error report message, the user needs to analyze each node in workflow WF_A, workflow WF_B, and workflow WF_C manually, and update the configuration of the second relation table and/or the third relation table.
Corresponding to the method embodiments, the present specification also provides an embodiment of an apparatus.
FIG. 4 is a schematic diagram of an electronic device for workflow dependency determination according to an exemplary embodiment of the present application. Referring to FIG. 4, at the hardware level, the electronic device includes a processor 402, an internal bus 404, a network interface 406, a memory 408, and a non-volatile memory 410, and may also include hardware required for other services. The processor 402 reads the corresponding computer program from the non-volatile memory 410 into the memory 408 and then runs it, thereby forming, at the logical level, the apparatus for determining workflow dependency relationships. Of course, besides a software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or logic devices.
FIG. 5 is a block diagram illustrating an apparatus for determining workflow dependency relationships according to an exemplary embodiment of the present application. Referring to FIG. 5, the apparatus includes an establishing unit 502, a querying unit 504, and a determining unit 506, where:
The establishing unit 502 is configured to establish a first relation table, where the first relation table records the association between each workflow and its input table and output table.
The querying unit 504 is configured to query the first relation table for the target workflow to be processed.
The determining unit 506 is configured to determine the queried workflow as an upstream workflow on which the target workflow depends when the output table of the queried at least one workflow is the same as the input table of the target workflow.
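The query and determination performed by units 504 and 506 can be sketched as follows. This is only a minimal illustration, assuming the first relation table is held in memory as a mapping from workflow name to its input and output tables. The names `WorkflowIO` and `find_upstream`, the sample table names, and the choice to skip the target workflow itself are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkflowIO:
    """One row of the (assumed) in-memory first relation table."""
    inputs: frozenset = field(default_factory=frozenset)
    outputs: frozenset = field(default_factory=frozenset)


def find_upstream(first_relation, target):
    """Return workflows whose output tables match an input table of `target`."""
    target_inputs = first_relation[target].inputs
    return sorted(
        name
        for name, io in first_relation.items()
        # Assumption: a workflow is never reported as its own upstream.
        if name != target and io.outputs & target_inputs
    )


# Hypothetical sample data: three workflows chained through shared tables.
first_relation = {
    "WF_A": WorkflowIO(frozenset({"src.orders"}), frozenset({"ods.orders"})),
    "WF_B": WorkflowIO(frozenset({"ods.orders"}), frozenset({"std.orders"})),
    "WF_C": WorkflowIO(frozenset({"std.orders"}), frozenset({"dwd.orders"})),
}

print(find_upstream(first_relation, "WF_B"))  # ['WF_A']
```

Set intersection is used here so that a workflow with several input tables matches any upstream workflow that writes at least one of them.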
Optionally, establishing the first relation table includes: acquiring a pre-configured second relation table and a pre-configured third relation table, where the second relation table records the association between each workflow and the nodes it contains, and the third relation table records the association between each node and its input table and output table; and generating the first relation table according to the second relation table and the third relation table.
Optionally, generating the first relation table according to the second relation table and the third relation table includes: for any workflow, determining the input tables of all nodes contained in the workflow as the input tables of the workflow, and determining the output tables of all nodes contained in the workflow as the output tables of the workflow; and recording the association between each workflow and its input tables and output tables to generate the first relation table.
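The aggregation described above can be sketched as follows, assuming the second relation table is a list of (workflow, node) pairs and the third relation table maps each node to its input and output tables. The function name `build_first_relation` and all sample identifiers are hypothetical.

```python
from collections import defaultdict


def build_first_relation(second_relation, third_relation):
    """Aggregate node-level tables into workflow-level input/output tables.

    second_relation: iterable of (workflow, node) pairs.
    third_relation: mapping node -> (input tables, output tables).
    """
    first_relation = defaultdict(lambda: {"inputs": set(), "outputs": set()})
    for workflow, node in second_relation:
        node_inputs, node_outputs = third_relation[node]
        # Every node input/output becomes an input/output of its workflow.
        first_relation[workflow]["inputs"].update(node_inputs)
        first_relation[workflow]["outputs"].update(node_outputs)
    return dict(first_relation)


# Hypothetical sample data.
second_relation = [("WF_A", "etl_1"), ("WF_A", "clean_1"), ("WF_B", "model_1")]
third_relation = {
    "etl_1": ({"src.orders"}, {"ods.orders"}),
    "clean_1": ({"ods.orders"}, {"std.orders"}),
    "model_1": ({"std.orders"}, {"dwd.orders"}),
}

first_relation = build_first_relation(second_relation, third_relation)
print(sorted(first_relation["WF_A"]["inputs"]))   # ['ods.orders', 'src.orders']
print(sorted(first_relation["WF_A"]["outputs"]))  # ['ods.orders', 'std.orders']
```

Note that under this rule a table passed between two nodes of the same workflow (here `ods.orders`) appears as both an input and an output of that workflow.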
Optionally, the data warehouse specifically includes a service layer, an ODS layer, an STD layer, and a DWD layer, and the nodes specifically include: an ETL node, located between the service layer and the ODS layer, configured to collect the service layer data and store it in the ODS layer; a cleaning node, located between the ODS layer and the STD layer, configured to clean the ODS layer data and store it in the STD layer; and a model processing node, located between the STD layer and the DWD layer, configured to integrate the STD layer data and store it in the DWD layer.
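The layering can be illustrated with a small sketch in which each node kind is recorded together with the layer it reads from and the layer it writes to. The enum `Layer`, the mapping `NODE_KINDS`, and the helper `moves_to_next_layer` are hypothetical names chosen for illustration only.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Warehouse layers in the order data flows through them."""
    SERVICE = 0
    ODS = 1
    STD = 2
    DWD = 3


# Node kind -> (source layer, target layer), following the description above.
NODE_KINDS = {
    "etl": (Layer.SERVICE, Layer.ODS),
    "cleaning": (Layer.ODS, Layer.STD),
    "model_processing": (Layer.STD, Layer.DWD),
}


def moves_to_next_layer(kind):
    """Check that a node kind writes to the layer directly after its source."""
    source, target = NODE_KINDS[kind]
    return target == source + 1


print(all(moves_to_next_layer(kind) for kind in NODE_KINDS))  # True
```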
Optionally, the apparatus further comprises:
a monitoring unit 508, configured to periodically monitor the second relation table and the third relation table; and
an updating unit 510, configured to update the first relation table when the second relation table or the third relation table changes.
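One way to realize the monitoring and updating is to fingerprint the two source tables on each timed check and rebuild the first relation table only when the fingerprint changes. The sketch below assumes this digest-based approach, which the application does not specify; `fingerprint`, `rebuild`, and `FirstRelationCache` are hypothetical names, and the timer that would call `poll` is omitted.

```python
import hashlib
import json


def fingerprint(second_relation, third_relation):
    """Stable digest of both source tables, used to detect changes."""
    payload = json.dumps(
        {
            "second": sorted(second_relation),
            "third": {n: [sorted(i), sorted(o)] for n, (i, o) in third_relation.items()},
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def rebuild(second_relation, third_relation):
    """Regenerate the first relation table from the two source tables."""
    table = {}
    for workflow, node in second_relation:
        inputs, outputs = third_relation[node]
        row = table.setdefault(workflow, {"inputs": set(), "outputs": set()})
        row["inputs"] |= set(inputs)
        row["outputs"] |= set(outputs)
    return table


class FirstRelationCache:
    def __init__(self):
        self._digest = None
        self.table = None

    def poll(self, second_relation, third_relation):
        """One timed check; returns True if the first relation table was rebuilt."""
        digest = fingerprint(second_relation, third_relation)
        if digest == self._digest:
            return False
        self.table = rebuild(second_relation, third_relation)
        self._digest = digest
        return True


# Hypothetical sample data.
second = [("WF_A", "etl_1")]
third = {"etl_1": (["src.orders"], ["ods.orders"])}
cache = FirstRelationCache()
print(cache.poll(second, third))  # True: the first check always builds the table
print(cache.poll(second, third))  # False: nothing changed
third["etl_1"] = (["src.orders"], ["ods.orders_v2"])
print(cache.poll(second, third))  # True: the third relation table changed
```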
Optionally, the apparatus further comprises:
a tracing unit 512, configured to trace the dependency relationship of each workflow to form a dependency link, where, for any two adjacent workflows in the dependency link, the later workflow depends on the earlier workflow; and
a judging unit 514, configured to determine that an erroneous workflow dependency exists in the dependency link when a repeated workflow exists in the dependency link, and to output an error report message to a preset object.
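Tracing dependency links and checking them for repeated workflows can be sketched as follows. The sketch assumes the upstream relationships have already been determined and are given as a mapping from each workflow to the workflows it depends on; `trace_links` and `erroneous_links` are hypothetical names, and reporting to the preset object is left out.

```python
def trace_links(upstream, start):
    """Yield every dependency link ending at `start`, most-upstream workflow first.

    A link is cut as soon as a workflow repeats, so cyclic
    configurations terminate instead of recursing forever.
    """
    def walk(current, path):
        if current in path:
            yield [current] + path  # repeated workflow closes the link
            return
        parents = upstream.get(current, [])
        if not parents:
            yield [current] + path  # reached a workflow with no upstream
            return
        for parent in parents:
            yield from walk(parent, [current] + path)

    yield from walk(start, [])


def erroneous_links(upstream, workflows):
    """Return the dependency links that contain a repeated workflow."""
    bad = []
    for workflow in workflows:
        for link in trace_links(upstream, workflow):
            if len(set(link)) != len(link):
                bad.append(link)
    return bad


# Hypothetical cyclic configuration: WF_A <- WF_C <- WF_B <- WF_A.
cyclic = {"WF_A": ["WF_C"], "WF_B": ["WF_A"], "WF_C": ["WF_B"]}
print(erroneous_links(cyclic, ["WF_A"]))
# [['WF_A', 'WF_B', 'WF_C', 'WF_A']]

# Hypothetical acyclic configuration: no link is reported.
acyclic = {"WF_B": ["WF_A"], "WF_C": ["WF_B"]}
print(erroneous_links(acyclic, ["WF_C"]))  # []
```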
Optionally, the apparatus further comprises:
an evaluation unit 516, configured to evaluate whether scheduling of the upstream workflow on which the target workflow depends has completed; and
a switching unit 518, configured to place the target workflow in a blocking waiting state when scheduling of the upstream workflow on which the target workflow depends has not completed, and to switch the target workflow from the blocking waiting state to a starting state when scheduling of that upstream workflow has completed.
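The evaluation and state switching can be sketched as a single pure function, assuming the scheduler exposes the set of workflows whose scheduling has completed. The enum `State` and the function `next_state` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    BLOCKED_WAITING = "blocked_waiting"
    STARTING = "starting"


def next_state(target, upstream, completed):
    """Return the state the target workflow should be in.

    upstream: mapping workflow -> list of upstream workflows it depends on.
    completed: set of workflows whose scheduling has completed.
    """
    pending = [wf for wf in upstream.get(target, []) if wf not in completed]
    return State.BLOCKED_WAITING if pending else State.STARTING


# Hypothetical sample data: WF_C depends on WF_A and WF_B.
upstream = {"WF_C": ["WF_A", "WF_B"]}
print(next_state("WF_C", upstream, {"WF_A"}))          # State.BLOCKED_WAITING
print(next_state("WF_C", upstream, {"WF_A", "WF_B"}))  # State.STARTING
```

In this sketch a workflow with no upstream workflows is never blocked, which is an assumption rather than something the description states.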
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium, for example a memory, comprising instructions executable by a processor of a workflow dependency determination apparatus to implement the method described in any of the above embodiments. For example, the method may comprise:
establishing a first relation table, wherein the first relation table records the association between each workflow and its input table and output table; querying the first relation table for a target workflow to be processed; and when the output table of at least one queried workflow is the same as the input table of the target workflow, determining the queried workflow as an upstream workflow on which the target workflow depends.
The non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc., which is not limited in this application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.
Claims (10)
1. A method for determining workflow dependencies, applied to a data warehouse, the method comprising:
establishing a first relation table, wherein the first relation table records the association between each workflow and its input table and output table;
querying the first relation table for a target workflow to be processed;
and when the output table of at least one queried workflow is the same as the input table of the target workflow, determining the queried workflow as an upstream workflow on which the target workflow depends.
2. The method of claim 1, wherein establishing the first relational table comprises:
acquiring a pre-configured second relation table and a pre-configured third relation table, wherein the second relation table records the association between each workflow and the nodes it contains, and the third relation table records the association between each node and its input table and output table;
and generating the first relation table according to the second relation table and the third relation table.
3. The method of claim 2, wherein generating the first relational table according to the second relational table and the third relational table comprises:
determining input tables of all nodes contained in any workflow as an input table of the workflow, and determining output tables of all nodes contained in the workflow as an output table of the workflow;
and recording the association relation between each workflow and the input table and the output table thereof to generate the first relation table.
4. The method of claim 2, wherein the data warehouse comprises: a service layer, an ODS layer, an STD layer and a DWD layer; the node comprises:
an ETL node, located between the service layer and the ODS layer, configured to collect the service layer data and store it in the ODS layer;
a cleaning node, located between the ODS layer and the STD layer, configured to clean the ODS layer data and store it in the STD layer;
and a model processing node, located between the STD layer and the DWD layer, configured to integrate the STD layer data and store it in the DWD layer.
5. The method of claim 2, further comprising:
periodically monitoring the second relation table and the third relation table;
and updating the first relation table when the second relation table or the third relation table changes.
6. The method of claim 1, further comprising:
tracing the dependency relationship of each workflow to form a dependency link, wherein, for any two adjacent workflows in the dependency link, the later workflow depends on the earlier workflow;
and when a repeated workflow exists in the dependency link, determining that an erroneous workflow dependency exists in the dependency link, and outputting an error report message to a preset object.
7. The method of claim 1, further comprising:
evaluating whether scheduling of the upstream workflow on which the target workflow depends has completed;
when scheduling of the upstream workflow on which the target workflow depends has not completed, placing the target workflow in a blocking waiting state;
and when scheduling of the upstream workflow on which the target workflow depends has completed, switching the target workflow from the blocking waiting state to a starting state.
8. An apparatus for determining workflow dependencies, applied to a data warehouse, the apparatus comprising:
an establishing unit, configured to establish a first relation table, wherein the first relation table records the association between each workflow and its input table and output table;
a querying unit, configured to query the first relation table for a target workflow to be processed;
and a determining unit, configured to determine a queried workflow as an upstream workflow on which the target workflow depends when the output table of at least one queried workflow is the same as the input table of the target workflow.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-7 by executing the executable instructions.
10. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, perform the steps of the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011511346.XA CN112506957A (en) | 2020-12-18 | 2020-12-18 | Method and device for determining workflow dependency relationship |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011511346.XA CN112506957A (en) | 2020-12-18 | 2020-12-18 | Method and device for determining workflow dependency relationship |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112506957A (en) | 2021-03-16 |
Family
ID=74922793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011511346.XA Pending CN112506957A (en) | 2020-12-18 | 2020-12-18 | Method and device for determining workflow dependency relationship |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112506957A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113157789A (en) * | 2021-04-16 | 2021-07-23 | 北京思特奇信息技术股份有限公司 | Method for reversely reasoning ETL scheduling task dependency relationship based on SQL script |
CN113672674A (en) * | 2021-07-15 | 2021-11-19 | 浙江大华技术股份有限公司 | Method, electronic device and storage medium for automatically arranging service flow |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105589874A (en) * | 2014-10-22 | 2016-05-18 | 阿里巴巴集团控股有限公司 | ETL task dependence relationship detecting method and device and ETL tool |
CN105808619A (en) * | 2014-12-31 | 2016-07-27 | 华为技术有限公司 | Task redoing method based on influence analysis, influence analysis calculation device and one-key reset device |
CN107783879A (en) * | 2016-08-29 | 2018-03-09 | 阿里巴巴集团控股有限公司 | A kind of method and apparatus for being used to analyze workflow execution path |
CN107870949A (en) * | 2016-09-28 | 2018-04-03 | 腾讯科技(深圳)有限公司 | Data analysis job dependence relation generation method and system |
CN109978482A (en) * | 2017-12-27 | 2019-07-05 | 华为技术有限公司 | Workflow processing method, device, equipment and storage medium |
CN110019207A (en) * | 2017-11-02 | 2019-07-16 | 阿里巴巴集团控股有限公司 | Data processing method and device and script display methods and device |
CN110609740A (en) * | 2019-09-19 | 2019-12-24 | 深圳前海微众银行股份有限公司 | Method and device for determining dependency relationship between tasks |
CN110689245A (en) * | 2019-09-17 | 2020-01-14 | 上海易点时空网络有限公司 | Method and system for analyzing call relation of big data workflow |
CN111026568A (en) * | 2019-12-04 | 2020-04-17 | 深圳前海环融联易信息科技服务有限公司 | Data and task relation construction method and device, computer equipment and storage medium |
US10673712B1 (en) * | 2014-03-27 | 2020-06-02 | Amazon Technologies, Inc. | Parallel asynchronous stack operations |
CN111666326A (en) * | 2020-05-29 | 2020-09-15 | 中国工商银行股份有限公司 | ETL scheduling method and device |
CN111797157A (en) * | 2020-07-21 | 2020-10-20 | 政采云有限公司 | Data processing method and system, electronic equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
丿灬逐风: "数仓初步构建", 《HTTPS://BLOG.CSDN.NET/QQ_31405633/ARTICLE/DETAILS/96290568?》, 17 July 2019 (2019-07-17) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210316 |