CN114860820A - Optimization method and device for technical business of data warehouse and electronic equipment - Google Patents
Optimization method and device for technical business of data warehouse and electronic equipment
- Publication number
- CN114860820A (application number CN202110077891.0A)
- Authority
- CN
- China
- Prior art keywords
- operator
- etl
- service
- target
- etl service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application provides a method and a device for optimizing technical business of a data warehouse and electronic equipment, wherein the method comprises the following steps: receiving a merging request issued by a user, wherein the merging request is used for indicating merging of at least two ETL services, and the at least two ETL services comprise a first ETL service and a second ETL service; determining whether the data sources of the first ETL service and the second ETL service are the same, if so, combining the first ETL service and the second ETL service based on a preconfigured operator combination rule to obtain a target ETL service, wherein the target ETL service is a service for reading data once and writing data multiple times, and the multiple write data in the target ETL service comprises the write data of the first ETL service and the write data of the second ETL service. The method combines operators of the ETL services of the same data sources based on the preconfigured operator combination rule, and solves the redundancy problem of the calculation logic between the ETL services.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for optimizing a data warehouse technology service, and an electronic device.
Background
An Extract-Transform-Load (ETL) service describes the process of extracting (Extract), transforming (Transform), and loading (Load) data from a source end to a destination end. Its purpose is to integrate an enterprise's scattered, disordered data with non-uniform standards and to provide an analysis basis for the enterprise's decisions. The source end may be a service system database, a distributed file system, another data warehouse, and the like, and the destination end may be a target system database, a target file system, another target data warehouse, and the like. The results written to the destination end in an ETL service may include service metrics, statistical data, consistency data, and the like.
Currently, with the rapid development of communication technology, user traffic, 5G services, and the like have grown explosively. However, in the big data processing of a telecom operator (carrier), the customized service logic of the ETL service system is often not optimal, and the computation logic of different ETL services often contains a large amount of redundancy. This sharply increases the computation resources, input/output (IO) resources, storage resources, and the like occupied by the ETL services, and therefore wastes a large amount of resources.
Therefore, how to solve the redundancy problem of the computation logic between ETL services is a technical problem that needs to be solved urgently at present.
Disclosure of Invention
The embodiment of the application provides a method and a device for optimizing technical services of a data warehouse and electronic equipment, wherein operators of ETL services of the same data sources are combined based on a preconfigured operator combination rule, so that the redundancy problem of computational logic among the ETL services is solved.
In a first aspect, an embodiment of the present application provides a method for optimizing a data warehouse technology service, where the method includes: receiving a merging request issued by a user, wherein the merging request is used for indicating merging of at least two ETL services, and the at least two ETL services comprise a first ETL service and a second ETL service; determining whether the data sources of the first ETL service and the second ETL service are the same, if so, merging the first ETL service and the second ETL service based on a preconfigured operator merging rule to obtain a target ETL service, wherein the target ETL service is a service for reading data once and writing data many times, and the data written many times in the target ETL service comprises the data written in the first ETL service and the data written in the second ETL service.
Therefore, when the data sources of the ETL services are the same, the ETL services are merged into one ETL service that reads data once and writes data multiple times. This reduces the redundant computation logic, redundant reads and writes, and the like of the ETL services before merging, and solves the redundancy problem of the computation logic among the ETL services.
In a possible implementation manner, merging the first ETL service and the second ETL service based on a preconfigured operator merging rule includes: determining an operator depth of each operator in the first ETL service, wherein the operator depth of a first operator represents the interval between the first operator and a first read operator in the first ETL service, the first read operator is used for reading data, and the first operator is any one of the operators of the first ETL service, including the first read operator; and merging, in ascending order of operator depth and according to the operator merging rule, a first operator in the first ETL service and a second operator in the second ETL service to obtain a target operator, wherein the operator depths of the first operator and the second operator are the same.
In one possible implementation manner, the method further includes: when the target operator includes a common operator, merging a third operator in the first ETL service and a fourth operator in the second ETL service based on the operator merging rule, wherein the third operator is adjacent to the first operator, the fourth operator is adjacent to the second operator, and the common operator is an operator in the target ETL service that corresponds to both the first ETL service and the second ETL service.
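The depth-ordered merge described above can be illustrated with a minimal sketch. It assumes that each ETL service is a simple chain of operators and that two operators of the same depth can be merged only when they are identical; the actual operator merging rule is preconfigured and may be richer. The function name `merge_by_depth` and the sample operators are hypothetical.

```python
def merge_by_depth(first, second):
    """Merge two operator chains in ascending order of operator depth.

    Each chain is a list whose index is the operator depth (index 0 is the
    read operator). Assumed merge rule: operators at the same depth merge
    into one common operator only when they are identical.
    """
    common = []
    depth = 0
    while depth < min(len(first), len(second)) and first[depth] == second[depth]:
        common.append(first[depth])
        depth += 1
    # Operators that could not be merged stay on per-service branches.
    return {"common": common, "branches": [first[depth:], second[depth:]]}


# Hypothetical services that share a data source and a first filter.
first_service = [("read", "cdr"), ("filter", "protocol = 'http'"),
                 ("write", "http_stats")]
second_service = [("read", "cdr"), ("filter", "protocol = 'http'"),
                  ("aggregate", "by cell"), ("write", "cell_stats")]
merged = merge_by_depth(first_service, second_service)
```

In this sketch the shared read and filter operators become common operators, and each service keeps its own write operator on a separate branch.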
In one possible implementation manner, the method further includes: when the target operator comprises a common operator, a first branch operator and a second branch operator, exchanging the order of the first target operator and the second target operator based on a preconfigured operator exchange rule, wherein the common operator is an operator corresponding to a first ETL service and a second ETL service in a target ETL service, the first branch operator is an operator corresponding to the first ETL service in the target ETL service, the second branch operator is an operator corresponding to the second ETL service in the target ETL service, the first target operator comprises at least one of the first branch operator and the second branch operator, the second target operator is a third operator in the first ETL service or a fourth operator in the second ETL service, the third operator is adjacent to the first operator, the fourth operator is adjacent to the second operator, and the first target operator and the second target operator correspond to the same ETL service; and after the orders of the first target operator and the second target operator are exchanged, combining the third operator and the fourth operator based on an operator combination rule.
In one possible implementation manner, the method further includes: ending the merging of the first ETL service and the second ETL service when the first operator and the second operator cannot be merged, or the first target operator and the second target operator cannot exchange order, or a target operation count is greater than a preset count threshold, wherein the target operation count includes at least one of the number of merges and the number of order exchanges of operators of the same type; and combining, according to operator depth, the obtained target operators with the operators in the first ETL service that have not been merged to obtain a first branch ETL service, and combining the obtained target operators with the operators in the second ETL service that have not been merged to obtain a second branch ETL service, wherein the target ETL service includes the first branch ETL service and the second branch ETL service.
In one possible implementation manner, the method further includes: when a target operator is obtained, determining the execution cost of a third ETL service, wherein the third ETL service is composed of the obtained target operator and an operator which is not combined in the first ETL service or the second ETL service; and determining the target ETL service according to the execution cost of the third ETL service.
In a possible implementation manner, determining the target ETL service according to the execution cost of the third ETL service includes: when there are multiple third ETL services, taking a fourth ETL service as the target ETL service if the execution cost of the fourth ETL service is less than or equal to the sum of the execution costs of the first ETL service and the second ETL service, wherein the fourth ETL service is the third ETL service with the minimum execution cost.
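The cost comparison above can be sketched as follows. The cost values, the plan names, and the function `choose_target` are hypothetical, and the text does not say what happens when even the cheapest candidate costs more than the two original services together; returning `None` (keeping the services unmerged) is an assumption of this sketch.

```python
def choose_target(candidate_costs, first_cost, second_cost):
    """Pick the cheapest candidate (third) ETL service as the target.

    `candidate_costs` maps a candidate name to its execution cost. The
    cheapest candidate is accepted only if its cost does not exceed the
    combined cost of the two original services.
    """
    if not candidate_costs:
        return None
    cheapest = min(candidate_costs, key=candidate_costs.get)
    if candidate_costs[cheapest] <= first_cost + second_cost:
        return cheapest
    # Assumption: no candidate is worth adopting, so nothing is selected.
    return None


candidates = {"plan_a": 14.0, "plan_b": 9.5, "plan_c": 11.0}
target = choose_target(candidates, first_cost=6.0, second_cost=5.0)
rejected = choose_target({"plan_d": 20.0}, first_cost=6.0, second_cost=5.0)
```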
In a possible implementation manner, taking the fourth ETL service as the target ETL service includes: and setting a cache operator after a common operator contained in a third target operator in the fourth ETL service to obtain the target ETL service, wherein the third target operator is the target operator with the maximum operator depth in the fourth ETL service, and the cache operator is used for caching data output by the common operator and providing data for a branch operator behind the common operator.
In a possible implementation manner, after obtaining the target ETL service, the method further includes: correcting the target ETL service based on a float-up and sink-down rule, wherein the rule is as follows: a contraction operator floats up toward the head of the ETL service, and an expansion operator sinks toward the tail of the ETL service; a contraction operator is an operator whose processing reduces the data volume, and an expansion operator is an operator whose processing increases the data volume.
In a possible implementation manner, the target ETL service includes a common operator, a cache operator, and at least two strings of branch operators, where the cache operator is located between the common operator and the at least two strings of branch operators, and the cache operator is used to cache data output by the common operator and provide data to the first string of branch operators; the common operator includes a read operator, the at least two strings of branch operators include a first string of branch operators, the first string of branch operators includes a write operator of the first ETL service, the read operator is used for reading data, and the write operator is used for writing data.
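As a rough illustration of this rule, the sketch below reorders the operators between the read and write operators so that contraction operators come first and expansion operators come last. It assumes that every operator in the list may be freely exchanged with its neighbours, which in practice would depend on the operator exchange rule; the `PRIORITY` table and operator names are hypothetical.

```python
# Lower values are placed closer to the head of the ETL service.
PRIORITY = {"contraction": 0, "neutral": 1, "expansion": 2}


def apply_float_sink(operators):
    """Float contraction operators up and sink expansion operators down.

    `operators` is a list of (name, kind) pairs. Python's sort is stable,
    so operators of the same kind keep their original relative order.
    Assumption: all operators in the list commute with each other.
    """
    return sorted(operators, key=lambda op: PRIORITY[op[1]])


pipeline = [("explode_sessions", "expansion"),
            ("rename_columns", "neutral"),
            ("filter_http", "contraction")]
reordered = apply_float_sink(pipeline)
```

Running the filter before the explode step means that later operators process fewer rows, which is the motivation for the rule.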
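The structure of such a target ETL service (common operators, a cache operator, then several branches) can be sketched as below. Operators are modelled as plain functions over lists of rows, and the cache operator as an in-memory list; these are simplifying assumptions, and all names are hypothetical.

```python
def run_target_service(read, common_ops, branches):
    """Read once, run the common operators, cache, then feed each branch."""
    rows = read()
    for op in common_ops:
        rows = op(rows)
    cached = list(rows)  # stands in for the cache operator
    outputs = []
    for branch in branches:
        data = cached
        for op in branch:
            data = op(data)
        outputs.append(list(data))  # stands in for the branch's write operator
    return outputs


reads = []  # records every access to the data source


def read():
    reads.append(1)
    return [1, 2, 3, 4, 5, 6]


def keep_even(rows):
    return [r for r in rows if r % 2 == 0]


def total(rows):
    return [sum(rows)]


def doubled(rows):
    return [r * 2 for r in rows]


outputs = run_target_service(read, [keep_even], [[total], [doubled]])
```

Both branches consume the cached output of the common operator, so the data source is read a single time while two results are written.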
In a second aspect, an embodiment of the present application provides an apparatus for optimizing a data warehouse technology service, where the apparatus includes: a receiving module, configured to receive a merging request issued by a user, where the merging request is used to indicate that at least two ETL services are merged, and the at least two ETL services include a first ETL service and a second ETL service; and the processing module is used for determining whether the data sources of the first ETL service and the second ETL service are the same, and if so, combining the first ETL service and the second ETL service based on a preconfigured operator combination rule to obtain a target ETL service, wherein the target ETL service is a service for reading data once and writing data multiple times, and the multiple writing data in the target ETL service comprises the writing data of the first ETL service and the writing data of the second ETL service.
In one possible implementation, the processing module is further configured to: determine an operator depth of each operator in the first ETL service, wherein the operator depth of a first operator represents the interval between the first operator and a first read operator in the first ETL service, the first read operator is used for reading data, and the first operator is any one of the operators of the first ETL service, including the first read operator; and merge, in ascending order of operator depth and according to the operator merging rule, a first operator in the first ETL service and a second operator in the second ETL service to obtain a target operator, wherein the operator depths of the first operator and the second operator are the same.
In one possible implementation, the processing module is further configured to: when the target operator includes a common operator, merge a third operator in the first ETL service and a fourth operator in the second ETL service based on the operator merging rule, wherein the third operator is adjacent to the first operator, the fourth operator is adjacent to the second operator, and the common operator is an operator in the target ETL service that corresponds to both the first ETL service and the second ETL service.
In one possible implementation, the processing module is further configured to: when the target operator comprises a common operator, a first branch operator and a second branch operator, exchanging the order of the first target operator and the second target operator based on a preconfigured operator exchange rule, wherein the common operator is an operator corresponding to a first ETL service and a second ETL service in a target ETL service, the first branch operator is an operator corresponding to the first ETL service in the target ETL service, the second branch operator is an operator corresponding to the second ETL service in the target ETL service, the first target operator comprises at least one of the first branch operator and the second branch operator, the second target operator is a third operator in the first ETL service or a fourth operator in the second ETL service, the third operator is adjacent to the first operator, the fourth operator is adjacent to the second operator, and the first target operator and the second target operator correspond to the same ETL service; and after the orders of the first target operator and the second target operator are exchanged, combining the third operator and the fourth operator based on an operator combination rule.
In one possible implementation, the processing module is further configured to: end the merging of the first ETL service and the second ETL service when the first operator and the second operator cannot be merged, or the first target operator and the second target operator cannot exchange order, or a target operation count is greater than a preset count threshold, wherein the target operation count includes at least one of the number of merges and the number of order exchanges of operators of the same type; and combine, according to operator depth, the obtained target operators with the operators in the first ETL service that have not been merged to obtain a first branch ETL service, and combine the obtained target operators with the operators in the second ETL service that have not been merged to obtain a second branch ETL service, wherein the target ETL service includes the first branch ETL service and the second branch ETL service.
In one possible implementation, the processing module is further configured to: when a target operator is obtained, determining the execution cost of a third ETL service, wherein the third ETL service is composed of the obtained target operator and an operator which is not combined in the first ETL service or the second ETL service; and determining the target ETL service according to the execution cost of the third ETL service.
In a possible implementation manner, the processing module is further configured to: when there are multiple third ETL services, take a fourth ETL service as the target ETL service if the execution cost of the fourth ETL service is less than or equal to the sum of the execution costs of the first ETL service and the second ETL service, wherein the fourth ETL service is the third ETL service with the minimum execution cost.
In a possible implementation manner, the processing module is further configured to: and setting a cache operator after a common operator contained in a third target operator in the fourth ETL service to obtain the target ETL service, wherein the third target operator is the target operator with the maximum operator depth in the fourth ETL service, and the cache operator is used for caching data output by the common operator and providing data for a branch operator behind the common operator.
In one possible implementation, the processing module is further configured to: after the target ETL service is obtained, correct the target ETL service based on a float-up and sink-down rule, wherein the rule is as follows: a contraction operator floats up toward the head of the ETL service, and an expansion operator sinks toward the tail of the ETL service; a contraction operator is an operator whose processing reduces the data volume, and an expansion operator is an operator whose processing increases the data volume.
In a possible implementation manner, the target ETL service includes a common operator, a cache operator, and at least two strings of branch operators, where the cache operator is located between the common operator and the at least two strings of branch operators, and the cache operator is used to cache data output by the common operator and provide data to the first string of branch operators; the common operator comprises a read operator, the at least two strings of branch operators comprise a first string of branch operators, the first string of branch operators comprises a write operator of the first ETL service, the read operator is used for reading data, and the write operator is used for writing data.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one memory and at least one processor; wherein the at least one memory is adapted to store programs and the at least one processor is adapted to execute the programs stored in the memory, and when the programs stored in the memory are executed, the at least one processor is adapted to perform the method provided in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform the method provided in the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, including at least one processor and an interface; wherein the interface is configured to provide program instructions or data to at least one processor, and the at least one processor is configured to execute the program instructions to implement the method provided in the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method provided in the first aspect.
Drawings
The drawings that accompany the detailed description can be briefly described as follows.
Fig. 1 is a schematic diagram of a system architecture of an ETL service optimization method provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application;
Fig. 3a is a schematic diagram illustrating a change before and after operator merging according to an embodiment of the present application;
Fig. 3b is a schematic diagram illustrating another change before and after operator merging according to an embodiment of the present application;
Fig. 3c is a schematic diagram illustrating yet another change before and after operator merging according to an embodiment of the present application;
Fig. 4 is a schematic diagram illustrating a change before and after two serial operators exchange order according to an embodiment of the present application;
Fig. 5 is a schematic view of a display interface of an electronic device according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a DAG of the operators in an ETL service according to an embodiment of the present application;
Fig. 7a is a schematic model diagram of ETL services to be merged according to an embodiment of the present application;
Fig. 7b is a schematic model diagram of a merged ETL service according to an embodiment of the present application;
Fig. 7c is a schematic model diagram of a merged ETL service according to an embodiment of the present application;
Fig. 7d is a schematic model diagram of a merged ETL service according to an embodiment of the present application;
Fig. 7e is a schematic model diagram of a merged ETL service according to an embodiment of the present application;
Fig. 8 is a schematic model diagram of a merged ETL service according to an embodiment of the present application;
Fig. 9 is a schematic model diagram of an ETL service that reads data once and writes data multiple times according to an embodiment of the present application;
Fig. 10 is a schematic flowchart of an ETL service optimization method according to an embodiment of the present application;
Fig. 11 is a schematic diagram of the steps of merging a first ETL service and a second ETL service based on a preconfigured operator merging rule according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of an ETL service optimization apparatus according to an embodiment of the present application;
Fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 14 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be described below with reference to the accompanying drawings.
In the description of the embodiments of the present application, the words "exemplary," "for example," or "for instance" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary," "for example," or "for instance" is not to be construed as preferred or advantageous over other embodiments or designs. Rather, use of these words is intended to present relevant concepts in a concrete fashion.
In the description of the embodiments of the present application, the term "and/or" merely describes an association between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, B exists alone, or both A and B exist. In addition, the term "plurality" means two or more unless otherwise specified. For example, a plurality of systems refers to two or more systems, and a plurality of terminals refers to two or more terminals.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The following introduces related terms or related concepts involved in the present solution.
(1) Operator
An operator is the smallest unit of computation in an ETL service.
(2) Read operator
A read operator is the operator that reads the data source in an ETL service, and is the first operator of the ETL service.
(3) Write operator
A write operator is the operator that loads data in an ETL service, that is, the operator that writes data to the destination end; it is the last operator of the ETL service.
(4) Expansion operator
An expansion operator is an operator whose output data volume is larger than its input data volume.
(5) Contraction operator
A contraction operator is an operator whose output data volume is smaller than its input data volume.
(6) Operator characteristic index set
The operator characteristic index set is an index set which can uniquely determine a certain operator.
(7) Operator dimension
Operator dimensions are attributes that describe the fact records in fact data. Some attributes provide descriptive information, and some specify how to aggregate the data of a fact data table so as to provide useful information to an analyst.
(8) Calculation formula of operator
The calculation formula of the operator is a calculation model solidified in the ETL service, and is a formula for performing mathematical calculation on the dimension of the operator.
(9) Parent operator and child operator
In an ETL service, if the output of one operator is the input of another operator, the former is called the parent operator of the latter, and the latter is called the child operator of the former.
(10) Operator depth
In an ETL service, the operator depth of the read operator may be defined as 0, and the operator depth of a child operator is the operator depth of its parent operator plus 1, which yields the operator depth of every operator. For example, if the operator depth of the Nth operator is N, the operator depth of the (N+1)th operator is N+1, where the Nth operator is the parent operator of the (N+1)th operator.
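The depth definition can be sketched as a breadth-first walk from the read operator. The representation below (a dictionary that maps each operator name to the names of its child operators) and the function name `operator_depths` are assumptions made for illustration, and the sketch assumes that every operator has a single parent.

```python
from collections import deque


def operator_depths(children, read_operator):
    """Assign depth 0 to the read operator and parent depth + 1 to each child.

    `children` maps an operator name to the names of its child operators.
    Assumption: every operator has exactly one parent (a chain or a tree).
    """
    depths = {read_operator: 0}
    queue = deque([read_operator])
    while queue:
        parent = queue.popleft()
        for child in children.get(parent, []):
            depths[child] = depths[parent] + 1
            queue.append(child)
    return depths


# Hypothetical service: read -> filter -> aggregate -> write
example = {"read": ["filter"], "filter": ["aggregate"], "aggregate": ["write"]}
depths = operator_depths(example, "read")
```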
(11) ETL services of the same data source
For a plurality of ETL services, if the read operators of these ETL services read the same data source, these ETL services are called ETL services of the same data source.
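The definitions of parent and child operators, operator depth, and ETL services of the same data source can be illustrated with a minimal Python sketch. The dictionary layout, the operator names, and the helper functions are hypothetical and are not part of the scheme; the sketch further assumes that the read operator has depth 0 and that each operator has exactly one parent.

```python
from collections import defaultdict


def operator_depths(read_operator, edges):
    """Return {operator: depth}; the read operator has depth 0 (assumption)
    and each child has the depth of its parent plus 1."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    depths = {read_operator: 0}
    stack = [read_operator]
    while stack:
        op = stack.pop()
        for child in children[op]:
            depths[child] = depths[op] + 1
            stack.append(child)
    return depths


def group_by_source(services):
    """Group service names by the data source their read operator reads."""
    groups = defaultdict(list)
    for name, service in services.items():
        groups[service["source"]].append(name)
    return dict(groups)


# Hypothetical services: "edges" lists (parent, child) operator pairs.
services = {
    "A": {"source": "table_x", "read": "read_a",
          "edges": [("read_a", "filter_a"), ("filter_a", "write_a")]},
    "B": {"source": "table_x", "read": "read_b",
          "edges": [("read_b", "write_b")]},
    "C": {"source": "table_y", "read": "read_c",
          "edges": [("read_c", "write_c")]},
}

depths_a = operator_depths(services["A"]["read"], services["A"]["edges"])
groups = group_by_source(services)
```

In this sketch, services A and B would be ETL services of the same data source, while service C would not.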
Next, an application scenario of the optimization method of the ETL service provided in the scheme is described.
Fig. 1 is a schematic diagram of a system architecture for the ETL service optimization method according to an embodiment of the present application. As shown in fig. 1, the data acquisition module 11 may collect data; for example, the collected data may be a user name, a protocol type, a user type, a cell, and the like in a telecommunication service. The data storage module 12 may store the data collected by the data acquisition module 11. The ETL service system 13, which may include a plurality of ETL services, may extract data from the data storage module 12, transform and load the extracted data, and finally store the processing result in a distributed file system, a database, or the like. The data access service module 14 may present the processing result of the ETL service system 13 to the user and perform human-computer interaction with the user. In an example, a user may issue a request for merging a plurality of ETL services through a human-computer interaction interface provided by the data access service module 14. The request may instruct the ETL service system 13 to merge a plurality of ETL services and may include the identifiers of the ETL services that the user has selected for merging. The ETL service system 13 may then merge the selected ETL services based on an operator merging rule configured in advance by the user, so that the plurality of ETL services are combined into one ETL service that reads data once and writes data multiple times. It can be understood that each write in the merged ETL service corresponds to the write of one ETL service before merging; in other words, the merged ETL service includes a plurality of write operators, each of which corresponds to the write operator of one ETL service before merging.
It can be understood that the ETL service system 13 shown in fig. 1 may be configured in an electronic device alone, or may be configured in the same electronic device together with one or more of the data acquisition module 11, the data storage module 12, and the data access service module 14, which is not limited herein.
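The effect of combining several ETL services into one service that reads data once and writes data multiple times can be pictured with the following minimal Python sketch. The class `CountingSource`, the sample rows, and the two per-key totals are hypothetical stand-ins for a data source and for two ETL services; they are not taken from the scheme.

```python
class CountingSource:
    """Hypothetical data source that counts how often it is read."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(self.rows)


def bytes_per(rows, key):
    """Stand-in transformation: total bytes per value of `key`."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row["bytes"]
    return totals


def run_separately(source):
    # Two independent services: each one has its own read operator.
    result_a = bytes_per(source.read(), "user")
    result_b = bytes_per(source.read(), "protocol")
    return result_a, result_b


def run_merged(source):
    # Merged service: one read operator, two write results.
    rows = source.read()
    return bytes_per(rows, "user"), bytes_per(rows, "protocol")


ROWS = [
    {"user": "u1", "protocol": "http", "bytes": 10},
    {"user": "u2", "protocol": "http", "bytes": 5},
    {"user": "u1", "protocol": "ftp", "bytes": 7},
]

separate_source = CountingSource(ROWS)
merged_source = CountingSource(ROWS)
separate_result = run_separately(separate_source)
merged_result = run_merged(merged_source)
```

Both variants produce the same two results, but the merged variant reads the source only once.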
A hardware structure diagram of an electronic device provided in an embodiment of the present application is described below.
Fig. 2 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application. As shown in fig. 2, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include one or more of an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and a neural-network processing unit (NPU). The different processing units may be separate devices or may be integrated into one or more processors.
A memory may also be provided in the processor 110 for storing instructions and data. In some examples, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can fetch them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and improves the efficiency of the system. In some examples, the processor 110 may be configured to merge ETL services based on a request for merging ETL services issued by the user, and the like.
In some examples, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some examples of wired charging, the charging management module 140 may receive the charging input of a wired charger through the USB interface 130. In some examples of wireless charging, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 may also supply power to the electronic device 100 through the power management module 141.
The power management module 141 is used for connecting the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In other examples, the power management module 141 may also be disposed in the processor 110. In other examples, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other examples, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves from at least two antennas including the antenna 1, filter, amplify, and transmit the received electromagnetic waves to a modem for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some examples, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some examples, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some examples, the modem may be a stand-alone device. In other examples, the modem may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110. In other examples, mobile communication module 150 may be a module in a modem.
The wireless communication module 160 may provide a solution for wireless communication applied to the electronic device 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some examples, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time division-synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), fifth-generation new radio (NR), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, and the like. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
The electronic device 100 implements display functions via the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some examples, the electronic device 100 may include one or more display screens 194.
The electronic device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photograph is taken, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye. The ISP can also algorithmically optimize the noise, brightness, and skin color of the image, and can optimize parameters such as the exposure and color temperature of the shooting scene. In some examples, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video, for example, facial feature information, pose feature information, and the like of the user. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some examples, electronic device 100 may include one or more cameras 193.
The digital signal processor is used for processing digital signals, and can process digital image signals and other digital signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, for example, Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 performs the various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like. The data storage area may store data created during use of the electronic device 100 (such as audio data and a phone book), and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The electronic device 100 may implement audio functions, such as music playing and recording, via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some examples, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal. A user can listen to music or to a hands-free call through the speaker 170A of the electronic device 100.
The receiver 170B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or plays voice information, the user can listen to the voice by placing the receiver 170B close to the ear.
The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can input a sound signal into the microphone 170C by speaking with the mouth close to the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other examples, the electronic device 100 may be provided with two microphones 170C to implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further include three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, perform directional recording, and so on.
The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
The pressure sensor 180A is used to sense a pressure signal and convert the pressure signal into an electrical signal. In some examples, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensor 180A, such as a resistive pressure sensor, an inductive pressure sensor, and a capacitive pressure sensor. A capacitive pressure sensor may include at least two parallel plates made of electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 100 detects the intensity of the touch operation by means of the pressure sensor 180A. The electronic device 100 may also calculate the touched position from the detection signal of the pressure sensor 180A. In some examples, touch operations that act on the same touch location but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction for viewing the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed.
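The threshold-based dispatch in the short message example can be sketched as follows. The threshold value, the normalized intensity scale, and the instruction names are hypothetical and serve only to illustrate the comparison against the first pressure threshold.

```python
# Hypothetical first pressure threshold on a normalized 0..1 intensity scale.
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_instruction(touch_intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map a touch on the short message icon to an instruction name."""
    if touch_intensity < threshold:
        return "view_short_message"
    # Intensity greater than or equal to the threshold.
    return "create_short_message"
```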
The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100. In some examples, the angular velocity of electronic device 100 about three axes (i.e., x, y, and z axes) may be determined by gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. Illustratively, when the electronic device 100 is used to collect user characteristic information in an environment, the gyro sensor 180B detects a shake angle of the electronic device 100, calculates a distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract shake of the electronic device 100 through reverse movement, thereby achieving anti-shake.
The air pressure sensor 180C is used to measure air pressure. In some examples, electronic device 100 calculates altitude, aiding in positioning and navigation, from barometric pressure values measured by barometric pressure sensor 180C.
The acceleration sensor 180E may detect the magnitude of the acceleration of the electronic device 100 in various directions (typically along three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor 180E can also be used to recognize the posture of the electronic device 100, and is applied in landscape/portrait switching, pedometers, and other applications.
The distance sensor 180F is used to measure distance. The electronic device 100 may measure distance by infrared or laser. In some examples, when the electronic device 100 is used to collect characteristic information of a user in an environment, the electronic device 100 may use the distance sensor 180F to measure distance for fast focusing.
The ambient light sensor 180L is used to sense the ambient light level. Electronic device 100 may adaptively adjust the brightness of display screen 194 based on the perceived ambient light level.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and so on.
The temperature sensor 180J is used to detect temperature. In some examples, the electronic device 100 implements a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to prevent the low temperature from causing the electronic device 100 to shut down abnormally. In still other embodiments, when the temperature is below a further threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
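A temperature processing strategy of this kind can be sketched as a simple threshold policy. The sketch below combines the three embodiments in one function purely for illustration; the threshold values (in degrees Celsius), their ordering, and the action names are hypothetical.

```python
def thermal_actions(temp_c, high=45.0, low=0.0, very_low=-10.0):
    """Return the list of actions for a reported temperature.

    All thresholds are hypothetical illustration values.
    """
    actions = []
    if temp_c > high:
        actions.append("reduce_processor_performance")
    if temp_c < low:
        actions.append("heat_battery")
    if temp_c < very_low:
        actions.append("boost_battery_output_voltage")
    return actions
```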
The touch sensor 180K is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, which is also called a "touch-controlled screen". The touch sensor 180K is used to detect a touch operation applied on or near it. The touch sensor 180K can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device 100 at a position different from that of the display screen 194.
The button 190 includes a power button, a volume button, an input keyboard, and the like. The button 190 may be a mechanical button or a touch button. The electronic device 100 may receive a button input and generate a button signal input related to user settings and function control of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., video playback or audio playback) may correspond to different vibration feedback effects. The motor 191 may also produce different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenarios (such as time reminders, message reception, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also be customized.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The following describes in detail a technical solution provided by an embodiment of the present application based on a system architecture shown in fig. 1 and a hardware structure of an electronic device shown in fig. 2, with reference to the accompanying drawings.
(1) Configuring operator merge rules and operator exchange rules
In this scheme, the operator merging rule may be understood as a rule for merging operators across a plurality of ETL services, and the operator exchange rule may be understood as a rule for exchanging the order of operators within one of the plurality of ETL services. For example, suppose the plurality of ETL services include service A and service B, where service A includes operators a1 and b1 and service B includes operators a2 and b2. The operator merging rule may then be a rule for merging a1 and a2, and the operator exchange rule may be a rule for exchanging the order of a1 and b1. In addition, if merging a1 and a2 generates two operators c1 and c2, where c1 is the operator corresponding to service A and c2 is the operator corresponding to service B, the operator exchange rule may be a rule for exchanging the order of c1 and b1.
In an example, both the operator merging rule and the operator exchange rule may be configured in advance, either manually or automatically by an electronic device or the like, which is not limited herein. For example, during configuration, a feature index set may first be configured for each operator in the operator pool of the ETL service system, where the feature index set may include an operator name, an operator dimension, an operator calculation formula, and the like. Semantic analysis is then performed on the feature index set of each operator, and finally the operator merging rule and the operator exchange rule are configured based on the result of the semantic analysis.
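The configuration process can be pictured with the following minimal Python sketch. The `FeatureIndexSet` fields, the example operators, and in particular the same-type check that stands in for the semantic analysis are all assumptions made for illustration; the scheme does not specify how the semantic analysis is carried out.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class FeatureIndexSet:
    """Hypothetical feature index set of one operator."""
    name: str
    op_type: str        # assumed extra field, e.g. "aggregate" or "mapping"
    dimensions: tuple   # operator dimensions
    formula: str        # operator calculation formula, kept as text


# Hypothetical operator pool of the ETL service system.
OPERATOR_POOL = [
    FeatureIndexSet("agg_user", "aggregate", ("dim3", "dim4"), "sum(f(dim1))"),
    FeatureIndexSet("agg_cell", "aggregate", ("dim3", "dim5"), "sum(g(dim2))"),
    FeatureIndexSet("map_proto", "mapping", ("dim1",), "f(dim1) -> dim2"),
]


def merge_candidates(pool):
    """Stand-in for semantic analysis: pair up operators of the same type."""
    return [(a.name, b.name)
            for a, b in combinations(pool, 2)
            if a.op_type == b.op_type]


candidates = merge_candidates(OPERATOR_POOL)
```

A real configuration step would compare dimensions and calculation formulas as well; the type check above is only the simplest possible placeholder.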
The operator merging rule and the operator exchanging rule are introduced below.
a. Operator merge rules
In the scheme, the operator merging rule can be a merging rule constructed aiming at a plurality of operators with the same type, and the aim is to generate a new equivalent operator by modifying the calculation logic of two operators to be merged, so that the problem of calculation logic redundancy between ETL services is reduced. When the operators are combined, a feature index set corresponding to the common computation logic can be extracted from the operators to be combined to form a common operator; on the basis of the public operator, comparing the characteristic index set of the operator before merging, and modifying the characteristic index set of the operator before merging on the premise of keeping the output of the operator before merging unchanged to form a branch operator.
It can be understood that, since the read operators in different ETL services are all used to read data, the read operators of different ETL services can be merged directly.
It will be appreciated that the common operator and the plurality of branch operators may be concatenated to form a fork operator having inputs and outputs equivalent to the respective inputs and outputs of the operators prior to merging.
In this scheme, a plurality of single-input operators may be merged, a plurality of multi-input operators may be merged, or a plurality of multi-output operators may be merged. To make the change in the operators before and after merging easier to understand, each of the three cases is described below.
Fig. 3a shows the change when two single-input operators are merged: a common operator and two branch operators are generated after merging, where branch operator 1 corresponds to operator 1 before merging and branch operator 2 corresponds to operator 2 before merging. Fig. 3b shows the change when two multi-input operators are merged, and fig. 3c shows the change when two multi-output operators are merged; in both cases, merging likewise generates a common operator and two branch operators, with branch operator 1 corresponding to operator 1 before merging and branch operator 2 corresponding to operator 2 before merging. Each operator in figs. 3a, 3b, and 3c has a corresponding feature index set. The fork operator mentioned above can be understood as, for example, the operator in fig. 3a that is composed of the common operator, branch operator 1, and branch operator 2.
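The structure of a fork operator for the single-input case can be sketched in Python as follows. The class `ForkOperator` and the arithmetic operators are hypothetical; they only illustrate that the common computation runs once and that each branch reproduces the output of one operator before merging.

```python
class ForkOperator:
    """Hypothetical fork operator: one common operator feeding several branches."""

    def __init__(self, common, branches):
        self.common = common
        self.branches = branches

    def __call__(self, rows):
        shared = self.common(rows)  # common computation logic runs once
        return [branch(shared) for branch in self.branches]


# Two operators before merging; both contain the same doubling step.
def operator_1(rows):
    return [r * 2 + 1 for r in rows]


def operator_2(rows):
    return [r * 2 - 1 for r in rows]


# After merging: the shared step becomes the common operator.
def double(rows):
    return [r * 2 for r in rows]


def plus_one(rows):
    return [r + 1 for r in rows]


def minus_one(rows):
    return [r - 1 for r in rows]


fork = ForkOperator(common=double, branches=[plus_one, minus_one])
outputs = fork([1, 2, 3])
```

Here `outputs[0]` plays the role of the output of branch operator 1 and `outputs[1]` that of branch operator 2.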
For ease of understanding, a rule table of the operator merging rules in this scheme is introduced below. As shown in Table 1, when the two operators to be merged are both of the aggregate type, the feature index sets of the operators before merging are groupby(dim_3 + dim_4), sum(f(dim_1)) and groupby(dim_3 + dim_5), sum(g(dim_2)), respectively. The merging condition of the two operators is null, that is, the two operators can be merged directly. A common operator and two branch operators are formed after merging: the feature index set of the common operator is groupby(dim_3 + dim_4 + dim_5), sum(f(dim_1)), sum(g(dim_2)), and the feature index sets of the branch operators are groupby(dim_3 + dim_4), sum(f(dim_1)) and groupby(dim_3 + dim_5), sum(g(dim_2)), respectively.
Table 1
The symbols in table one are specifically defined as follows:
Columns: the fields of the input table on which the operator operates.
Sum: for an aggregate operator, the summation of the corresponding field, which is an operation over the aggregation dimensions.
Select: the fields selected from the input table.
Condition: a filter expression on the corresponding field.
Jointype: the type of a join operator, commonly left join, inner join or cross join.
dim_1, dim_2, dim_3, dim_4: dimensions in the telecommunications service, for example: user name, protocol type, user type, cell, etc.
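The aggregate merging rule described for Table I can be checked with a small sketch. It assumes rows are plain Python dictionaries, that f and g are arbitrary per-row functions (the two used here are hypothetical), and that sum is the only aggregate; the helper name `group_sum` is illustrative and does not come from the source. The common operator groups once by dim_3 + dim_4 + dim_5, and each branch operator re-aggregates that smaller intermediate result.

```python
from collections import defaultdict

def group_sum(rows, keys, measures):
    """Group rows by `keys` and sum each (output name -> value function) in `measures`."""
    acc = defaultdict(lambda: defaultdict(float))
    for row in rows:
        k = tuple(row[c] for c in keys)
        for name, fn in measures.items():
            acc[k][name] += fn(row)
    return [dict(zip(keys, k), **vals) for k, vals in sorted(acc.items())]

f = lambda r: r["dim1"] * 2   # hypothetical f
g = lambda r: r["dim2"] + 1   # hypothetical g

rows = [
    {"dim1": 1, "dim2": 10, "dim3": "x", "dim4": "p", "dim5": "u"},
    {"dim1": 2, "dim2": 20, "dim3": "x", "dim4": "p", "dim5": "v"},
    {"dim1": 3, "dim2": 30, "dim3": "x", "dim4": "q", "dim5": "u"},
    {"dim1": 4, "dim2": 40, "dim3": "y", "dim4": "q", "dim5": "u"},
]

# Before merging: each service aggregates the raw rows itself.
op1 = group_sum(rows, ["dim3", "dim4"], {"sum_f": f})
op2 = group_sum(rows, ["dim3", "dim5"], {"sum_g": g})

# After merging: one common operator, then two branch operators on its output.
common = group_sum(rows, ["dim3", "dim4", "dim5"], {"sum_f": f, "sum_g": g})
branch1 = group_sum(common, ["dim3", "dim4"], {"sum_f": lambda r: r["sum_f"]})
branch2 = group_sum(common, ["dim3", "dim5"], {"sum_g": lambda r: r["sum_g"]})

assert branch1 == op1 and branch2 == op2
```

The equality holds because a sum over a coarser grouping can always be rebuilt from sums over a finer grouping; aggregates without that property would need a different merging rule.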
b. Operator exchange rules
In this scheme, the operator exchange rule may be an exchange rule constructed for two serial operators. The two serial operators are a front operator and a rear operator of a serial structure, and the output of the front operator is the input of the rear operator in the scheme. In the scheme, the input and the output of the two operators can be kept unchanged by exchanging the sequence (or the order) of the pre-operator and the post-operator and modifying the characteristic index sets of the pre-operator and the post-operator. For example, as shown in fig. 4, after the front operator and the back operator satisfy the operator exchange condition, the order of the front operator and the back operator may be exchanged, and the feature index sets of the front operator and the back operator may be modified at the same time, so that the input and the output of the two operators are unchanged after the front operator and the back operator exchange the order.
For ease of understanding, a rule table of the operator exchange rules in this solution is described below. The rule table consists of Table II and Table III, and the row of Table II in which an operator pair before exchange is located corresponds to the row of Table III in which the operator pair after exchange is located. As a first example, Table II gives, before the exchange, the feature index set of the pre-operator as mapping(f(dim_1) → dim_2) and that of the post-operator as mapping(g(dim_2) → dim_3). Table III shows that the exchange condition of the two operators is null, so they can be exchanged directly; after the exchange, the feature index set of the post-operator is mapping(fusion(f, g)) and that of the pre-operator is empty, that is, the two operators can be replaced by one operator. As a second example, Table II gives, before the exchange, the feature index set of the pre-operator as mapping(f(dim_11) → dim_21, g(dim_12) → dim_22) and that of the post-operator as aggregate(sum(dim_21), group(dim_22)). Table III shows that the exchange condition of the two operators is "the function g is one-to-one". When the pre-operator and the post-operator satisfy this condition, the feature index sets after the exchange are aggregate(sum(f(dim_11)) → dim_31, group(dim_12) → dim_32) for the post-operator and mapping(dim_31, g(dim_32)) for the pre-operator.
Table II
The specific symbols in the table are explained as follows:
dim_L and dim_L1 represent dimensions of the first input (which may be referred to as the left input) of a two-input operator; dim_R and dim_R1 represent dimensions of the second input (which may be referred to as the right input) of a two-input operator.
Other definitions are given in Table I.
Table III
The specific symbols in the table are explained as follows:
fusion(f, g) is the composite function of the functions f and g, equivalent to f(g()).
L and R represent the left input table and the right input table of a two-input operator, respectively.
Other definitions are given in Table I and Table II.
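The second exchange rule described above (a mapping followed by an aggregate, exchangeable when g is one-to-one) can be illustrated with a minimal sketch. The row layout `(dim11, dim12)`, the function names and the sample functions are assumptions made for illustration only.

```python
from collections import defaultdict

def map_then_aggregate(rows, f, g):
    """Before the exchange: mapping(f(dim11)->dim21, g(dim12)->dim22),
    then aggregate(sum(dim21), group(dim22))."""
    mapped = [(f(d11), g(d12)) for d11, d12 in rows]
    out = defaultdict(int)
    for d21, d22 in mapped:
        out[d22] += d21
    return dict(out)

def aggregate_then_map(rows, f, g):
    """After the exchange: aggregate(sum(f(dim11))->dim31, group(dim12)->dim32),
    then mapping(dim31, g(dim32))."""
    agg = defaultdict(int)
    for d11, d12 in rows:
        agg[d12] += f(d11)
    # In this dict-based sketch, colliding keys overwrite each other.
    return {g(d32): d31 for d32, d31 in agg.items()}

rows = [(1, "a"), (2, "a"), (3, "b"), (4, "c")]
f = lambda x: x * 10

one_to_one = lambda k: k.upper()   # distinct keys stay distinct
many_to_one = lambda k: "same"     # collapses every key into one

# With a one-to-one g the two orders agree; otherwise they do not.
assert map_then_aggregate(rows, f, one_to_one) == aggregate_then_map(rows, f, one_to_one)
assert map_then_aggregate(rows, f, many_to_one) != aggregate_then_map(rows, f, many_to_one)
```

When g is not one-to-one, grouping before applying g keeps groups apart that the original order would have summed together, which is why the exchange condition is needed.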
It can be understood that, in the present solution, after the operator merging rule and the operator exchange rule are configured, the merging of multiple ETL services can be performed.
(2) User issues a merge request for ETL services
In this scheme, the ETL service system may be configured in the electronic device, and at this time, a user may select, through a human-computer interaction interface on the electronic device, an ETL service that needs to be merged in the ETL service system, where the user may, but is not limited to, select through touch control or voice control. Exemplarily, as shown in fig. 5, on a display interface of a display screen of the electronic device a, an operation interface of the ETL service system may be displayed, and on the operation interface, various ETL services, such as service a, service b, and service c, may be displayed; at this time, the user may select to merge services a and b in the ETL service; after the user selects the services a and b in the ETL service, the user clicks the "confirm" button on the display screen to complete the selection operation, and at this time, the user issues a merge request for the services a and b in the ETL service. In one example, the merge request issued by the user may be used to instruct to merge at least two ETL services, for example, to merge services a and b in fig. 5. In addition, the merge request may carry an identifier of the ETL service that needs to be merged, for example, when the user selects to merge the services a and b in fig. 5, the merge request may carry the identifiers of the services a and b.
It is understood that, in the present embodiment, the ETL service system may also be configured in other devices, and is not limited herein.
(3) Determining whether user-selected ETL services can be merged
After the user issues the merge request, the at least two ETL services selected by the user can be analyzed to determine whether they can be merged. In one example, it may be determined whether the data sources of the ETL services selected by the user are the same. If they are the same, the input of each ETL service is the same, so the ETL services can be merged and the next step can be carried out. If they are not the same, the inputs of the ETL services differ, so the ETL services cannot be merged and the flow can be ended. For example, in this solution the data source may be a table in a database, such as a table of network interruptions in the telecommunications service; whether the data sources of the ETL services are the same can therefore be determined from the names of the data sources or the table names.
For example, the user selects the ETL services to be merged as services a, b, and c, and when the data sources of the services a, b, and c are the same, the services a, b, and c may be merged; when the data sources of the services a, b and c are different, the services a, b and c may not be merged, or only two services with the same data source, such as the services a and b, may be merged.
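A minimal sketch of this check, assuming each ETL service is identified by a short id and its data source by a table name (both hypothetical here): services are grouped by data source, and only groups with at least two members are candidates for merging.

```python
from collections import defaultdict

def mergeable_groups(services):
    """Group ETL service ids by data source; only groups of size >= 2 can be merged."""
    by_source = defaultdict(list)
    for service_id, source_table in services.items():
        by_source[source_table].append(service_id)
    return [ids for ids in by_source.values() if len(ids) >= 2]

# Hypothetical table names: services a and b share a source, service c does not.
services = {
    "a": "network_interruption",
    "b": "network_interruption",
    "c": "user_traffic",
}
print(mergeable_groups(services))  # [['a', 'b']]
```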
(4) Marking operator depth of each operator in each ETL service
After it is determined that the at least two ETL services selected by the user can be merged, the operator depth of each operator in each ETL service can be marked. In one example, a directed acyclic graph (DAG) of the operators in each ETL service can be constructed, for example based on the flow of data, and the operator depth of each operator can be marked while the DAG is constructed. For example, in the operator DAG shown in fig. 6, operator 1 is the first operator (i.e., the read operator) of the ETL service and operator 5 is the last operator (i.e., the write operator) of the ETL service. If operator 1 is marked with operator depth 0, then the operator depth of operator 2 is 1, that of operator 3 is 2, that of operator 4 is 3, and that of operator 5 is 4.
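The depth marking can be sketched as a breadth-first walk from the read operator. The edge-list representation and the name `mark_depths` are assumptions for illustration; for a DAG in which an operator is reachable along several paths, this sketch assigns the shortest distance, which the source does not specify.

```python
from collections import deque

def mark_depths(edges, reader):
    """Breadth-first walk from the read operator; depth = number of edges from it."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    depth = {reader: 0}
    queue = deque([reader])
    while queue:
        op = queue.popleft()
        for child in children.get(op, []):
            if child not in depth:
                depth[child] = depth[op] + 1
                queue.append(child)
    return depth

# The serial chain of five operators described for fig. 6.
edges = [("op1", "op2"), ("op2", "op3"), ("op3", "op4"), ("op4", "op5")]
print(mark_depths(edges, "op1"))
# {'op1': 0, 'op2': 1, 'op3': 2, 'op4': 3, 'op5': 4}
```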
(5) Merging multiple ETL services
In the scheme, when a plurality of ETL services are combined, operators with the same operator depth in the ETL services are combined. Therefore, after the operator depth of each operator in each ETL service is marked, the merging work of a plurality of ETL services can be carried out.
When merging a plurality of ETL services, whether operators with the same operator depth in the ETL services to be merged can be merged is determined based on the preconfigured operator merging rules. As can be seen from the operator merging rules described above, the read operators of the ETL services to be merged can be merged directly. Then, in ascending order of operator depth, it is judged in turn whether the operators with the same operator depth in the ETL services to be merged satisfy the merging condition in the operator merging rules: when they do, the operators are merged; when they do not, they cannot be merged. In this solution, when the operators with the same operator depth in the ETL services to be merged do not satisfy the merging condition in the operator merging rules, the flow can be ended, that is, the merging work ends.
In one example, when merging operators, if only one common operator is obtained, it can be directly determined whether the operators at the next operator depth can be merged, and when merging is possible, merging is performed. For example, with continued reference to table one above, when the operator type of two operators to be merged is mapping, after merging the two operators, only one common operator is obtained, and no branch operator is obtained, so that the merging of operators of the next operator depth (e.g., operators of type aggregate) can be started at this time.
In one example, when merging operators yields a common operator and a plurality of branch operators, it may be determined, based on the preconfigured operator exchange rules, whether each obtained branch operator can exchange order with the operator at the next operator depth in the ETL service corresponding to that branch operator; when they can, the order of the two operators is exchanged. For example, with continued reference to Table I, when the two operators to be merged are of type aggregate, merging them yields a common operator and two branch operators. If the operator at the next operator depth after one of the branch operators is of type filter, then, referring to Table II, the pre-operator is the branch operator and the post-operator is the operator at the next operator depth after the branch operator; referring to Table III, the exchange condition of the two operators is null, so the order of the two operators can be exchanged.
Further, after the order of the pre-operators and the post-operators is swapped, the post-operators can be merged based on the operator merging rules. The former operator may be understood as a branch operator generated after the merge operator, and the latter operator may be understood as an operator at the depth of the next operator in the ETL service corresponding to the branch operator.
It can be understood that, in this solution, if the two operators do not satisfy the exchange condition in the operator exchange rules, the flow may be ended, that is, the merging work ends. At this time, the obtained common operators and branch operators, together with the operators that have not been merged in the ETL services corresponding to the branch operators, form the merged ETL service.
To facilitate understanding of the merging process, the following example is given.
As shown in fig. 7a, fig. 7a is a model of two ETL services to be merged, i.e. services a and b, and the data sources of the two ETL services are the same, so that the services can be merged. If operators a1 and b1 are merged to obtain common operator 1 and no branch operator is obtained, then after merging operators a1 and b1, the model of ETL service shown in fig. 7b can be obtained. In fig. 7b, operators a2 and b2 may be merged, and after merging operators a2 and b2, a model of ETL service as shown in fig. 7c may be obtained, that is, after merging operators a2 and b2, common operator 2, branch operator 21, and branch operator 22 may be obtained; at this time, it can be determined whether branch operator 21 and operator a3 can exchange orders, and whether branch operator 22 and operator b3 can exchange orders, if so, the orders are exchanged, and then the model of ETL service shown in fig. 7d is obtained. In fig. 7d, operators a3 and b3 may be merged, and after merging operators a3 and b3, a model of ETL service as shown in fig. 7e may be obtained, that is, after merging operators a3 and b3, common operator 3, branch operator 31, and branch operator 32 may be obtained; at this time, it may be determined whether the branch operator 31 and the branch operator 21 can exchange orders, and whether the branch operator 32 and the branch operator 22 can exchange orders, if not, the process is ended, that is, the merging operation is ended, and at this time, the model of the ETL service shown in fig. 7e may constitute a model of the merged ETL service.
It should be noted that, in this solution, to prevent operators of the same type from being merged and/or exchanged indefinitely, the flow is ended, that is, the merging work ends, when the number of merges of operators of the same type reaches a preset count threshold and/or the number of exchanges of operators of the same type reaches a preset count threshold.
For ease of understanding, the merging process is described below with a more illustrative example.
a. When a plurality of ETL services are merged, the set of sub-operators of the read operators of the ETL services to be merged can be recorded as UnderMatched, and a maximum mergeable pairwise partition can be applied to it: UnderMatched = M_1 ∪ M_2 ∪ … ∪ M_k. Two mergeable sub-operators of the read operators form one set M; a sub-operator of a read operator that has no mergeable counterpart forms a set M on its own. The maximum mergeable pairwise partition can be understood as the partition that minimizes the number of sets M. Illustratively, the ETL services to be merged include services a, b, c, d and e, whose read operators have the sub-operator sets F_a, F_b, F_c, F_d and F_e, respectively. The maximum mergeable pairwise partition may then be: F_a and F_b form one set M, F_c and F_d form another set M, and F_e alone forms a set M.
A set M_i may contain two operators, O_i1 and its mergeable counterpart; for the two ETL services containing these operators, the merging steps below can be carried out. If M_i contains only one operator O_i, the ETL service containing O_i can be merged only at its read operator.
b. After the maximum mergeable pairwise partition of the sub-operator sets of the read operators of the ETL services to be merged, the two operators contained in a set M_i, O_i1 and its counterpart, can be merged.
First, merge the read operators, reader(R_i1, R_i2), and generate a new DAG:
The depth of the Reader is 0, which can be denoted as i; the depths of the sub-operators increase in sequence, so the depth of the mapping operator is denoted as i1.
c. From the merging condition for mapping operators in the preconfigured operator merging rules, it can be known that mapping operators can be merged unconditionally, so the operator mapping_i1 and its counterpart can be merged. The merged DAG is:
d. Determine whether mapping_cob needs to be swapped. Because the merged mapping_cob contains no branch operator, it is determined that no swap is needed.
d. If the operators of type aggregate can be merged according to the operator merging rules, the operator aggregate_i2 and its counterpart are merged, namely:
the new ETL service after merging is:
e. Determine whether the branch operator aggregate_ib and the operator filter_i3 (the post-operator) can be swapped. According to Table II and Table III, for the operator pair aggregate_ib → filter_i3, execute sink(aggregate_ib, filter_i3); for the corresponding operator pair of the other ETL service, likewise execute sink. That is, the order of the operators aggregate_ib and filter_i3 is swapped, and the order of the corresponding pair of operators of the other ETL service is swapped.
The new ETL service generated after the exchange of the order is:
f. Steps c, d and e are performed recursively until the output operator Writer (i.e., the write operator), which is the last operator of the ETL service, is reached, or until no further merging or swapping is possible.
g. Correction: for the DAG of the merged ETL service output by step e, the order of the operators is adjusted based on the sink-float rule. The sink-float rule is as follows: expansion operators sink toward the tail of the path (i.e., toward the write operator), and contraction operators float toward the head of the path (i.e., toward the read operator). Thereafter, the DAG of the corrected ETL service can be output.
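The maximum mergeable pairwise partition from step a can be sketched as an exhaustive search over pairings. This is only an illustration for a handful of services: the function name, the `can_merge` predicate and the particular mergeable pairs are assumptions, and the source does not state which algorithm is used.

```python
def max_pairwise_partition(items, can_merge):
    """Partition `items` into pairs and singletons using as few sets as possible.

    Exhaustive search, so only suitable for a small number of services.
    """
    if not items:
        return []
    first, rest = items[0], items[1:]
    # Option 1: `first` stays alone.
    best = [[first]] + max_pairwise_partition(rest, can_merge)
    # Option 2: pair `first` with any mergeable partner.
    for i, other in enumerate(rest):
        if can_merge(first, other):
            remaining = rest[:i] + rest[i + 1:]
            candidate = [[first, other]] + max_pairwise_partition(remaining, can_merge)
            if len(candidate) < len(best):
                best = candidate
    return best

# Hypothetical mergeability relation between the sub-operator sets.
mergeable = {frozenset(p) for p in [("Fa", "Fb"), ("Fc", "Fd"), ("Fb", "Fc")]}
can_merge = lambda x, y: frozenset((x, y)) in mergeable

print(max_pairwise_partition(["Fa", "Fb", "Fc", "Fd", "Fe"], can_merge))
# [['Fa', 'Fb'], ['Fc', 'Fd'], ['Fe']]
```

A greedy pairing could pick F_b with F_c first and leave both F_a and F_d unpaired, which is why the sketch compares all pairings rather than taking the first mergeable partner.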
(6) Determining the execution cost of the ETL service formed after each operator merge
In the scheme, after the operators are combined each time, the execution cost of the ETL service formed after the operators are combined can be determined. The reference dimension of the execution cost may be any service bottleneck dimension, such as execution time, IO peak, memory, and the like.
Further, from the ETL services formed after the successive operator merges, the one with the minimum execution cost is selected as the ETL service obtained by merging the plurality of ETL services. The ETL service obtained by merging includes common operators and branch operators, and the data output by the common operator with the maximum operator depth is the input data of the branch operators corresponding to each ETL service before merging. Therefore, to make it easier for the branch operators to obtain their input data, a cache operator can be arranged between the common operator with the maximum operator depth and the branch operators; the cache operator caches the output data of the common operator and provides the data to the branch operators. In this solution, the ETL service with the cache operator can be used as the merged ETL service. For example, fig. 8 shows a model diagram of a merged ETL service obtained by merging ETL service a and ETL service b. Its operators consist of common operators 1 and 2, cache operator 3, and branch operators 41, 42, 43, 51, 52 and 53, where common operators 1 and 2 can be understood as the operators corresponding to both ETL service a and ETL service b, the string of operators composed of branch operators 41, 42 and 43 can be understood as the operators corresponding to ETL service a, and the string of operators composed of branch operators 51, 52 and 53 can be understood as the operators corresponding to ETL service b.
In one example, the execution cost of the ETL service formed after each operator merge and the position of the cache operator can be determined as follows:
First, for the ETL service obtained after each operator merge, record the set of common operators as CS_common = {C_1, C_2, …, C_c}, the set of branch operators as BS_branch = {B_1, B_2, …, B_b}, and the set of operators newly added relative to the original ETL services as AS = {A_1, A_2, …, A_a}. The execution cost of the ETL service is
where λ, ξ and θ_O are the preconfigured execution costs per unit of data volume for the corresponding operators. The set of branch operators BS_branch can be understood as the set of operators that are not merged, and the set AS of operators newly added relative to the original ETL services can be understood as the set of branch operators generated by operator merging.
Second, the set of common operators of the ETL service formed after multiple operator merges can be recorded as OS_common = {O_1c, O_2c, …, O_kc}, where the operator depths of the corresponding common operators are 1, 2, …, k, that is, k common operators are obtained in total.
Further, when operator merging is stopped once the operator depth of the common operator reaches j (1 ≤ j ≤ k), the merged ETL generated is recorded as ETL_j. The optimal cache point O_cache is then the common operator that minimizes the execution cost of the merged ETL:
Finally, a cache operator is set after the common operator O_cache, forming the merged ETL service that contains the cache operator and has the minimum execution cost, recorded as ETL_cache.
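The choice of cache point can be sketched as a plain minimum search. The sketch assumes the execution cost of each candidate ETL_j has already been computed by the cost formula; the costs below are hypothetical numbers, and the function name is illustrative.

```python
def choose_cache_point(common_ops, cost_when_stopped_at):
    """Pick the common operator after which the cache operator is placed.

    common_ops: common operators ordered by operator depth 1..k.
    cost_when_stopped_at: cost_when_stopped_at[j - 1] is the execution cost of ETL_j.
    """
    best = min(range(len(common_ops)), key=lambda idx: cost_when_stopped_at[idx])
    return common_ops[best], cost_when_stopped_at[best]

common_ops = ["O_1c", "O_2c", "O_3c"]
costs = [120.0, 95.0, 110.0]  # hypothetical costs of ETL_1, ETL_2, ETL_3
print(choose_cache_point(common_ops, costs))  # ('O_2c', 95.0)
```

In this example merging deeper than O_2c raises the cost again, so the cache operator would be placed after O_2c.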
(7) Deciding whether to merge the multiple ETL services
In this solution, the execution cost of the multiple ETL services before merging can be compared with the execution cost of the merged ETL service with the cache operator, and a decision can then be made on whether to merge the multiple ETL services. The specific cases are as follows:
a. If the sum of the execution costs of the multiple ETL services before merging is greater than the execution cost of the merged ETL service with the cache operator, it indicates that merging the multiple ETL services can significantly reduce the computational logic redundancy, read-write redundancy and the like of the ETL services before merging. The multiple ETL services can therefore be merged, that is, a decision to merge them is executed, and the multiple ETL services finally become one ETL service (such as the ETL service shown in fig. 8) that reads data once and writes data many times.
b. If the sum of the execution costs of the ETL services before merging is less than the execution cost of the merged ETL service with the cache operator, it indicates that merging the ETL services cannot significantly reduce the computational logic redundancy, read-write redundancy and the like of the ETL services before merging, and therefore the full merge is not performed. However, to reduce read-write redundancy, only the read operators of the ETL services may be merged, without merging their other operators; that is, a decision to merge only the read operators is executed, and the ETL services finally become one ETL service that reads data once and writes data many times.
c. If the sum of the execution costs of the ETL services before merging is equal to the execution cost of the merged ETL service with the cache operator, this indicates that merging the ETL services cannot significantly reduce the computational logic redundancy, read-write redundancy and the like of the ETL services before merging. Although the execution cost is the same before and after merging, turning a plurality of ETL services into one ETL service gives the merged ETL service clear advantages in task scheduling, issuing, management and the like, so a decision to merge the plurality of ETL services can be executed, and the plurality of ETL services finally become one ETL service that reads data once and writes data many times.
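The three cases above reduce to a single comparison, sketched below. The function name and the returned labels are illustrative; the costs passed in are assumed to be computed elsewhere.

```python
def merge_decision(costs_before, cost_merged_with_cache):
    """Return which kind of merge to perform for a group of ETL services."""
    total_before = sum(costs_before)
    if total_before >= cost_merged_with_cache:
        # Cases a and c: the full merge is no more expensive than running separately.
        return "merge_all_operators"
    # Case b: the full merge would cost more, so only the read operators are merged.
    return "merge_readers_only"

print(merge_decision([60.0, 50.0], 95.0))   # merge_all_operators
print(merge_decision([60.0, 50.0], 110.0))  # merge_all_operators
print(merge_decision([60.0, 50.0], 130.0))  # merge_readers_only
```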
In an example, fig. 9 shows a schematic model of an ETL service that reads data once and writes data many times. As can be seen from fig. 9, the merged ETL service has a read operator, n common operators, a cache operator, m strings of branch operators and m write operators, where each ETL service before merging corresponds to one string of branch operators and the write operator connected to that string. It can be understood that, in this solution, the read operator or the cache operator in the merged ETL service may also be referred to as a common operator, and a write operator may also be referred to as a branch operator. In other words, in the merged ETL service, an operator that corresponds to a plurality of the ETL services before merging may be referred to as a common operator, and an operator that does not correspond to a plurality of the ETL services before merging may be referred to as a branch operator, where each ETL service before merging may correspond to one string of branch operators. A string of branch operators as mentioned in this solution can be understood as a plurality of branch operators that are connected in series and related to one another.
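The read-once, write-many structure can be sketched as follows. Operators are modelled as plain functions on a list of rows, and the cache operator as a materialized list; these modelling choices and all names are assumptions for illustration.

```python
def run_merged_etl(read, common_ops, branches):
    """read: callable returning rows; common_ops: list of functions on row lists;
    branches: {service: (list of branch functions, write callable)}."""
    data = read()                 # the single read
    for op in common_ops:         # common operators run once for all services
        data = op(data)
    cached = list(data)           # cache operator: materialize the shared result
    for ops, write in branches.values():
        out = cached
        for op in ops:            # one string of branch operators per original service
            out = op(out)
        write(out)                # one write per original service

reads, written = [], {}

def read():
    reads.append(1)               # count how often the source is read
    return [1, 2, 3, 4]

run_merged_etl(
    read,
    common_ops=[lambda rows: [r * 10 for r in rows]],
    branches={
        "a": ([lambda rows: [r for r in rows if r > 10]], lambda rows: written.update(a=rows)),
        "b": ([lambda rows: [sum(rows)]], lambda rows: written.update(b=rows)),
    },
)
print(len(reads), written)  # 1 {'a': [20, 30, 40], 'b': [100]}
```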
It can be understood that, after combining a plurality of ETL services to change the plurality of ETL services into one ETL service for reading data once and writing data many times, the ETL service model in the ETL service system will change, and the number of ETL services will be reduced, that is, a plurality of ETL services before combining are eliminated, and the ETL service formed after combining a plurality of ETL services is saved. And in the subsequent data processing process, carrying out data processing on the ETL service formed by combining the plurality of ETL services.
Next, a method for optimizing an ETL service provided in an embodiment of the present application is introduced, based on the merging process of multiple ETL services described above. It will be appreciated that this method is another expression of that merging process, and the two may be combined with each other. Because the method is proposed on the basis of the merging process described above, reference may be made to that description for part or all of the contents of the method.
Referring to fig. 10, fig. 10 is a schematic flowchart illustrating an ETL service optimization method according to an embodiment of the present disclosure. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. As shown in fig. 10, the method for optimizing ETL service includes:
Step S101, receiving a merge request issued by a user, where the merge request is used to indicate that at least two ETL services are to be merged, and the at least two ETL services include a first ETL service and a second ETL service.
Step S102, determining whether the data sources of the first ETL service and the second ETL service are the same.
In this scheme, if the data sources of the ETL services to be merged are the same, step S103 is executed, otherwise, step S104 is executed.
Step S103, merging the first ETL service and the second ETL service based on a preconfigured operator merging rule to obtain a target ETL service, wherein the target ETL service is a service for reading data once and writing data many times, and the data written many times in the target ETL service comprises the data written by the first ETL service and the data written by the second ETL service.
And step S104, finishing the combination.
It is to be understood that, for some or all of the descriptions in the method provided by the present disclosure, reference may be made to the above descriptions, and thus, detailed descriptions thereof are omitted.
In one example, as shown in fig. 11, merging the first ETL service and the second ETL service based on the preconfigured operator merging rule may include the following steps:
Step S201, determining the operator depth of each operator in the first ETL service, where the operator depth represents the interval between an operator and the initial operator of the first ETL service, the initial operator is used to read data, and the operators of the first ETL service include the first operator.
Step S202, in ascending order of operator depth and based on the operator merging rule, merging a first operator in the first ETL service with a second operator in the second ETL service to obtain a target operator, where the first operator and the second operator have the same operator depth.
It is to be understood that, for some or all of the descriptions in the method provided by the present disclosure, reference may be made to the above descriptions, and thus, detailed descriptions thereof are omitted.
In one example, when the target operator includes a common operator, a third operator in the first ETL service and a fourth operator in the second ETL service may be merged based on an operator merging rule, where the third operator is adjacent to the first operator and the fourth operator is adjacent to the second operator, and the common operator is an operator corresponding to the first ETL service and the second ETL service in the target ETL service.
When the target operator includes a common operator, a first branch operator and a second branch operator, the order of the first target operator and the second target operator may be exchanged based on a preconfigured operator exchange rule, where the common operator is an operator corresponding to both the first ETL service and the second ETL service in the target ETL service, the first branch operator is an operator corresponding to the first ETL service in the target ETL service, the second branch operator is an operator corresponding to the second ETL service in the target ETL service, the first target operator includes at least one of the first branch operator and the second branch operator, the second target operator is a third operator in the first ETL service or a fourth operator in the second ETL service, the third operator is adjacent to the first operator, the fourth operator is adjacent to the second operator, and the first target operator and the second target operator correspond to the same ETL service.
Further, after the order of the first target operator and the second target operator is exchanged, the third operator and the fourth operator are merged based on the operator merging rule.
It will be appreciated that after a plurality of operator exchanges, a branch operator may be connected to the newly generated branch operator, for example, as shown in fig. 7e, a branch operator 21 is connected to the branch operator 31. At this time, the third operator and the fourth operator may connect a branch operator to the newly generated branch operator.
In an example, merging of the first ETL service and the second ETL service is ended when the first operator and the second operator cannot be merged, or when the first target operator and the second target operator cannot exchange order, or when a target operation count is greater than a preset count threshold, where the target operation count includes at least one of the number of merges and the number of order exchanges of operators of the same type.
Further, according to the operator depth, combining the obtained target operator with an operator which is not merged in the first ETL service to obtain a first branch ETL service, and combining the obtained target operator with an operator which is not merged in the second ETL service to obtain a second branch ETL service, wherein the target ETL service includes the first branch ETL service and the second branch ETL service.
In one example, when the target operator is obtained, an execution cost of a third ETL service may be determined, where the third ETL service is composed of the obtained target operator and an operator that is not merged in the first ETL service or the second ETL service. And then, determining the target ETL service according to the execution cost of the third ETL service.
As a possible implementation, when there are multiple third ETL services, if the execution cost of a fourth ETL service is less than or equal to the sum of the execution costs of the first ETL service and the second ETL service, the fourth ETL service is taken as the target ETL service, where the fourth ETL service is the third ETL service with the minimum execution cost.
Further, taking the fourth ETL service as the target ETL service may include: setting a cache operator after the common operator contained in a third target operator in the fourth ETL service to obtain the target ETL service, wherein the third target operator is the target operator with the maximum operator depth in the fourth ETL service, and the cache operator is used for caching data output by the common operator and providing the data to the branch operators after the common operator.
In one example, after the target ETL service is obtained, the target ETL service may be corrected based on a sink-and-float rule, where the sink-and-float rule is as follows: a contraction operator floats up toward the head of the ETL service, and an expansion operator sinks toward the tail of the ETL service. A contraction operator is an operator whose processing reduces the data volume, and an expansion operator is an operator whose processing increases the data volume.
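A rough sketch of this rule is given below. It assumes, purely for illustration, that every operator between the read and the write can be freely reordered without changing the result, which real operators generally cannot; the `EFFECT` table classifying operator kinds is also hypothetical.

```python
from typing import Dict, List

# Hypothetical classification of operator kinds by their effect on data volume:
# -1 = contraction (fewer rows out), 0 = neutral, 1 = expansion (more rows out).
EFFECT: Dict[str, int] = {"filter": -1, "map": 0, "explode": 1}


def sink_and_float(ops: List[str]) -> List[str]:
    """Move contraction operators toward the head and expansion operators
    toward the tail, keeping the first (read) and last (write) operator fixed."""
    if len(ops) < 3:
        return list(ops)
    head, body, tail = ops[0], ops[1:-1], ops[-1]
    # sorted() is stable, so operators with the same effect keep their order.
    reordered = sorted(body, key=lambda kind: EFFECT.get(kind, 0))
    return [head, *reordered, tail]
```

Placing contraction operators early means later operators process less data, which is the intuition behind the rule.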
In one example, the target ETL service includes a common operator, a cache operator and at least two strings of branch operators, the cache operator is located between the common operator and the at least two strings of branch operators, and the cache operator is configured to cache data output by the common operator and provide the data to the first string of branch operators. The common operator comprises a read operator, the at least two strings of branch operators comprise a first string of branch operators, the first string of branch operators comprises a write operator of the first ETL service, the read operator is used for reading data, and the write operator is used for writing data.
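The read-once, write-many layout can be sketched as follows. The function `run_target_service` and the sample data are hypothetical; the cache operator is modeled as an in-memory list, and each branch's return value stands in for a write.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]


def run_target_service(
    read: Callable[[], Iterable[Row]],
    common: Callable[[Iterable[Row]], Iterable[Row]],
    branches: Dict[str, Callable[[List[Row]], List[Row]]],
) -> Dict[str, List[Row]]:
    # Cache operator: materialize the output of the common operators once.
    cached: List[Row] = list(common(read()))
    # Every string of branch operators consumes the cached data.
    return {name: branch(cached) for name, branch in branches.items()}


reads = 0


def read_source() -> List[Row]:
    """Stand-in read operator that counts how often the source is read."""
    global reads
    reads += 1
    return [{"id": 1, "amount": 5}, {"id": 2, "amount": -3}, {"id": 3, "amount": 8}]


outputs = run_target_service(
    read_source,
    lambda rows: [r for r in rows if r["amount"] > 0],  # common filter operator
    {
        "first": lambda rows: [{"id": r["id"]} for r in rows],
        "second": lambda rows: [{"total": sum(r["amount"] for r in rows)}],
    },
)
```

In this sketch the source is read a single time even though two branches produce output, which is the saving the merged service is meant to provide.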
It is to be understood that, for some or all of the descriptions in the method provided by the present disclosure, reference may be made to the above descriptions, and thus, detailed descriptions thereof are omitted.
Based on the method in the foregoing embodiment, the present application provides an ETL service optimization apparatus. Referring to fig. 12, fig. 12 is a schematic structural diagram of an ETL service optimization apparatus according to an embodiment of the present disclosure. As shown in fig. 12, the ETL service optimizing apparatus 1200 includes: a receiving module 1201 and a processing module 1202. The receiving module 1201 is configured to receive a merging request issued by a user, where the merging request is used to indicate that at least two ETL services are merged, and the at least two ETL services include a first ETL service and a second ETL service. The processing module 1202 is configured to determine whether data sources of the first ETL service and the second ETL service are the same, and if so, merge the first ETL service and the second ETL service based on a preconfigured operator merging rule to obtain a target ETL service, where the target ETL service is a service for reading data once and writing data multiple times, and the multiple write data in the target ETL service includes write data of the first ETL service and write data of the second ETL service.
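The data-source check performed by the processing module might look like the sketch below. All class and field names are hypothetical, the merge itself is reduced to concatenating write targets, and returning `None` for services with different data sources is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EtlService:
    name: str
    source: str              # hypothetical identifier of the data source
    writes: Tuple[str, ...]  # targets this service writes to


@dataclass(frozen=True)
class MergeRequest:
    services: Tuple[EtlService, ...]  # at least two services to merge


class ProcessingModule:
    def merge(self, request: MergeRequest) -> Optional[EtlService]:
        first, second = request.services[0], request.services[1]
        if first.source != second.source:
            # Assumption: services with different data sources stay unmerged.
            return None
        # The merged service reads once and keeps the writes of both services.
        return EtlService(
            name=f"{first.name}+{second.name}",
            source=first.source,
            writes=first.writes + second.writes,
        )
```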
In one example, the processing module 1202 is further configured to: determine an operator depth of each operator in the first ETL service, wherein the operator depth is used for representing an interval between a first operator and a first reading operator in the first ETL service, the first reading operator is used for reading data, each operator comprises the first operator, and the first operator comprises the first reading operator; and merge the first operator in the first ETL service with a second operator in the second ETL service, in ascending order of operator depth and according to the operator merging rule, to obtain the target operator, wherein the operator depths of the first operator and the second operator are the same.
In one example, the processing module 1202 is further configured to: and when the target operator comprises a common operator, combining a third operator in the first ETL business and a fourth operator in the second ETL business based on an operator combining rule, wherein the third operator is adjacent to the first operator, and the fourth operator is adjacent to the second operator, and the common operator is an operator corresponding to the first ETL business and the second ETL business in the target ETL business.
In one example, the processing module 1202 is further configured to: when the target operator comprises a common operator, a first branch operator and a second branch operator, exchanging the order of the first target operator and the second target operator based on a preconfigured operator exchange rule, wherein the common operator is an operator corresponding to a first ETL service and a second ETL service in a target ETL service, the first branch operator is an operator corresponding to the first ETL service in the target ETL service, the second branch operator is an operator corresponding to the second ETL service in the target ETL service, the first target operator comprises at least one of the first branch operator and the second branch operator, the second target operator is a third operator in the first ETL service or a fourth operator in the second ETL service, the third operator is adjacent to the first operator, the fourth operator is adjacent to the second operator, and the first target operator and the second target operator correspond to the same ETL service; and after the orders of the first target operator and the second target operator are exchanged, combining the third operator and the fourth operator based on an operator combination rule.
In one example, the processing module 1202 is further configured to: end the merging of the first ETL service and the second ETL service when the first operator and the second operator cannot be merged, or the first target operator and the second target operator cannot exchange order, or a target operation count is greater than a preset count threshold, where the target operation count includes at least one of the number of merges and the number of order exchanges of operators of the same type; and, according to the operator depth, combine the obtained target operator with the operators that are not merged in the first ETL service to obtain a first branch ETL service, and combine the obtained target operator with the operators that are not merged in the second ETL service to obtain a second branch ETL service, wherein the target ETL service comprises the first branch ETL service and the second branch ETL service.
In one example, the processing module 1202 is further configured to: when a target operator is obtained, determining the execution cost of a third ETL service, wherein the third ETL service is composed of the obtained target operator and an operator which is not combined in the first ETL service or the second ETL service;
and determining the target ETL service according to the execution cost of the third ETL service.
In one example, the processing module 1202 is further configured to: when there are multiple third ETL services, if the execution cost of a fourth ETL service is less than or equal to the sum of the execution costs of the first ETL service and the second ETL service, take the fourth ETL service as the target ETL service, where the fourth ETL service is the third ETL service with the minimum execution cost.
In one example, the processing module 1202 is further configured to: set a cache operator after the common operator contained in a third target operator in the fourth ETL service to obtain the target ETL service, wherein the third target operator is the target operator with the maximum operator depth in the fourth ETL service, and the cache operator is used for caching data output by the common operator and providing the data to the branch operators after the common operator.
In one example, the processing module 1202 is further configured to: after the target ETL service is obtained, correct the target ETL service based on a sink-and-float rule, wherein the sink-and-float rule is as follows: a contraction operator floats up toward the head of the ETL service, and an expansion operator sinks toward the tail of the ETL service; a contraction operator is an operator whose processing reduces the data volume, and an expansion operator is an operator whose processing increases the data volume.
In one example, the target ETL service includes a common operator, a cache operator and at least two strings of branch operators, the cache operator is located between the common operator and the at least two strings of branch operators, and the cache operator is configured to cache data output by the common operator and provide the data to the first string of branch operators; the common operator comprises a read operator, the at least two strings of branch operators comprise a first string of branch operators, the first string of branch operators comprises a write operator of the first ETL service, the read operator is used for reading data, and the write operator is used for writing data.
It should be understood that, the above-mentioned apparatus is used for executing the method in the above-mentioned embodiments, and the implementation principle and technical effect of the corresponding program module in the apparatus are similar to those described in the above-mentioned method, and the working process of the apparatus may refer to the corresponding process in the above-mentioned method, and is not described herein again.
Based on the method in the foregoing embodiment, an embodiment of the present application provides an electronic device. Referring to fig. 13, fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 13, an electronic device provided in this embodiment of the application may be used to implement the method described in the foregoing method embodiment.
The electronic device includes at least one processor 1301, and the at least one processor 1301 may support the electronic device to implement the method provided in the embodiments of the present application.
The processor 1301 may be a general purpose processor or a special purpose processor. For example, processor 1301 may include a Central Processing Unit (CPU) and/or a baseband processor. The baseband processor may be configured to process communication data (e.g., determine a target screen terminal), and the CPU may be configured to implement corresponding control and processing functions, execute software programs, and process data of the software programs.
Further, the electronic device may further include a transceiving unit 1305 to implement input (reception) and output (transmission) of signals. For example, the transceiving unit 1305 may include a transceiver or a radio frequency chip. The transceiving unit 1305 may also comprise a communication interface.
Optionally, the electronic device may further include an antenna 1306, which may be used to support the transceiving unit 1305 to implement transceiving functions of the electronic device.
Optionally, the electronic device may include one or more memories 1302, on which programs (which may also be referred to as instructions or code) 1304 are stored, and the programs 1304 may be executed by the processor 1301, so that the processor 1301 executes the methods described in the above method embodiments. Optionally, data may also be stored in the memory 1302. Optionally, the processor 1301 may also read data (e.g., pre-stored first characteristic information) stored in the memory 1302, where the data may be stored at the same memory address as the program 1304 or at a different memory address from the program 1304.
The processor 1301 and the memory 1302 may be provided separately or integrated together, for example, on a single board or a System On Chip (SOC).
For a detailed description of operations performed by the electronic device in the above various possible designs, reference may be made to the description in the embodiments of the method provided in the embodiments of the present application, and thus, a detailed description is omitted here.
Based on the method in the embodiment, the embodiment of the application also provides a chip. Referring to fig. 14, fig. 14 is a schematic structural diagram of a chip according to an embodiment of the present disclosure. As shown in fig. 14, chip 1400 includes one or more processors 1401 and interface circuits 1402. Optionally, chip 1400 may also contain bus 1403. Wherein:
processor 1401 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by instructions in the form of hardware integrated logic circuits or software in the processor 1401. The processor 1401 as described above may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The methods, steps disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The interface circuit 1402 may be used to transmit or receive data, instructions, or information, and the processor 1401 may perform processing using the data, instructions, or other information received by the interface circuit 1402, and may transmit processing completion information through the interface circuit 1402.
Optionally, the chip further comprises a memory, which may include read only memory and random access memory, and provides operating instructions and data to the processor. The portion of memory may also include non-volatile random access memory (NVRAM).
Optionally, the memory stores executable software modules or data structures, and the processor may perform corresponding operations by calling the operation instructions stored in the memory (the operation instructions may be stored in an operating system).
Optionally, interface circuit 1402 may be used to output the results of execution by processor 1401.
It should be noted that the functions of the processor 1401 and the interface circuit 1402 may be implemented by hardware design, software design, or a combination of hardware and software, which is not limited herein.
It will be appreciated that the steps of the above-described method embodiments may be performed by logic circuits in the form of hardware or instructions in the form of software in a processor.
It is understood that the processor in the embodiments of the present application may be a Central Processing Unit (CPU), other general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general purpose processor may be a microprocessor, but may be any conventional processor.
The method steps in the embodiments of the present application may be implemented by hardware, or may be implemented by software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in Random Access Memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application occur in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for descriptive convenience and are not intended to limit the scope of the embodiments of the present application.
Claims (24)
1. A method for optimizing an extract-transform-load (ETL) service of a data warehouse technology, characterized by comprising:
receiving a merging request issued by a user, wherein the merging request is used for indicating merging of at least two ETL services, and the at least two ETL services comprise a first ETL service and a second ETL service;
determining whether data sources of the first ETL service and the second ETL service are the same, if so, combining the first ETL service and the second ETL service based on a preconfigured operator combination rule to obtain a target ETL service, wherein the target ETL service is a service for reading data once and writing data many times, and the writing data many times in the target ETL service comprises the writing data of the first ETL service and the writing data of the second ETL service.
2. The method of claim 1, wherein merging the first ETL service and the second ETL service based on a preconfigured operator merging rule comprises:
determining an operator depth of each operator in the first ETL service, wherein the operator depth is used for representing an interval between a first operator and a first reading operator in the first ETL service, the first reading operator is used for reading data, each operator comprises the first operator, and the first operator comprises the first reading operator;
and merging a first operator in the first ETL service with a second operator in the second ETL service, in ascending order of operator depth and according to the operator merging rule, to obtain a target operator, wherein the operator depths of the first operator and the second operator are the same.
3. The method of claim 2, further comprising:
when the target operator comprises a common operator, combining a third operator in the first ETL business and a fourth operator in the second ETL business based on the operator combining rule, wherein the third operator is adjacent to the first operator, and the fourth operator is adjacent to the second operator, and the common operator is an operator corresponding to the first ETL business and the second ETL business in the target ETL business.
4. The method of claim 2, further comprising:
when the target operators comprise a common operator, a first branch operator and a second branch operator, exchanging the order of the first target operator and the second target operator based on a preconfigured operator exchange rule, wherein the common operator is an operator corresponding to the first ETL service and the second ETL service in the target ETL service, the first branch operator is an operator corresponding to the first ETL service in the target ETL service, the second branch operator is an operator corresponding to the second ETL service in the target ETL service, the first target operator comprises at least one of the first branch operator and the second branch operator, the second target operator is a third operator in the first ETL service or a fourth operator in the second ETL service, the third operator is adjacent to the first operator, and the fourth operator is adjacent to the second operator, the first target operator and the second target operator correspond to the same ETL service;
merging the third operator and the fourth operator based on the operator merging rule after exchanging the order of the first target operator and the second target operator.
5. The method of any of claims 2-4, further comprising:
ending the merging of the first ETL service and the second ETL service when the first operator and the second operator cannot be merged, or the first target operator and the second target operator cannot exchange order, or a target operation count is greater than a preset count threshold, wherein the target operation count comprises at least one of the number of merges and the number of order exchanges of operators of the same type;
combining the obtained target operator with an operator which is not combined in the first ETL service according to the depth of the operator to obtain a first branch ETL service, and combining the obtained target operator with an operator which is not combined in the second ETL service to obtain a second branch ETL service, wherein the target ETL service comprises the first branch ETL service and the second branch ETL service.
6. The method of any of claims 2-5, further comprising:
determining the execution cost of a third ETL service when the target operator is obtained, wherein the third ETL service is composed of the obtained target operator and an operator which is not combined in the first ETL service or the second ETL service;
and determining the target ETL service according to the execution cost of the third ETL service.
7. The method of claim 6, wherein the determining the target ETL service according to the execution cost of the third ETL service comprises:
when there are multiple third ETL services, if the execution cost of a fourth ETL service is less than or equal to the sum of the execution costs of the first ETL service and the second ETL service, taking the fourth ETL service as the target ETL service, wherein the fourth ETL service is the third ETL service with the smallest execution cost.
8. The method of claim 7, wherein the taking the fourth ETL service as the target ETL service comprises:
setting a cache operator after the common operator contained in a third target operator in the fourth ETL service to obtain the target ETL service, wherein the third target operator is the target operator with the maximum operator depth in the fourth ETL service, and the cache operator is used for caching data output by the common operator and providing the data to the branch operators after the common operator.
9. The method according to any of claims 1-8, wherein after obtaining the target ETL service, further comprising:
correcting the target ETL service based on a sink-and-float rule, wherein the sink-and-float rule is as follows: the contraction operator floats upwards towards the head of the ETL service, and the expansion operator sinks towards the tail of the ETL service;
the contraction operator is an operator with reduced data volume after being processed by the operator, and the expansion operator is an operator with increased data volume after being processed by the operator.
10. The method according to any one of claims 1-9, wherein the target ETL service comprises a common operator, a cache operator and at least two strings of branch operators, wherein the cache operator is located between the common operator and the at least two strings of branch operators, and the cache operator is configured to cache data output by the common operator and provide data to the first string of branch operators;
the common operator comprises a read operator, the at least two strings of branch operators comprise a first string of branch operators, the first string of branch operators comprises a write operator of the first ETL service, the read operator is used for reading data, and the write operator is used for writing data.
11. An apparatus for optimizing ETL (extract-transform-load) services in a data warehouse technology, the apparatus comprising:
a receiving module, configured to receive a merge request issued by a user, where the merge request is used to instruct to merge at least two ETL services, and the at least two ETL services include a first ETL service and a second ETL service;
and the processing module is configured to determine whether data sources of the first ETL service and the second ETL service are the same, and if so, merge the first ETL service and the second ETL service based on a preconfigured operator merging rule to obtain a target ETL service, where the target ETL service is a service for reading data once and writing data many times, and the data written many times in the target ETL service includes the data written in the first ETL service and the data written in the second ETL service.
12. The apparatus of claim 11, wherein the processing module is further configured to:
determining an operator depth of each operator in the first ETL business, wherein the operator depth is used for representing an interval between a first operator and a first reading operator in the first ETL business, the first reading operator is used for reading data, each operator comprises the first operator, and the first operator comprises the first reading operator;
and merging a first operator in the first ETL service with a second operator in the second ETL service, in ascending order of operator depth and according to the operator merging rule, to obtain a target operator, wherein the operator depths of the first operator and the second operator are the same.
13. The apparatus of claim 12, wherein the processing module is further configured to:
when the target operator comprises a common operator, combining a third operator in the first ETL business and a fourth operator in the second ETL business based on the operator combining rule, wherein the third operator is adjacent to the first operator, and the fourth operator is adjacent to the second operator, and the common operator is an operator corresponding to the first ETL business and the second ETL business in the target ETL business.
14. The apparatus of claim 12, wherein the processing module is further configured to:
when the target operators comprise a common operator, a first branch operator and a second branch operator, exchanging the order of the first target operator and the second target operator based on a preconfigured operator exchange rule, wherein the common operator is an operator corresponding to the first ETL service and the second ETL service in the target ETL service, the first branch operator is an operator corresponding to the first ETL service in the target ETL service, the second branch operator is an operator corresponding to the second ETL service in the target ETL service, the first target operator comprises at least one of the first branch operator and the second branch operator, the second target operator is a third operator in the first ETL service or a fourth operator in the second ETL service, the third operator is adjacent to the first operator, and the fourth operator is adjacent to the second operator, the first target operator and the second target operator correspond to the same ETL service;
merging the third operator and the fourth operator based on the operator merging rule after exchanging the order of the first target operator and the second target operator.
15. The apparatus according to any of claims 12-14, wherein the processing module is further configured to:
ending the merging of the first ETL service and the second ETL service when the first operator and the second operator cannot be merged, or the first target operator and the second target operator cannot exchange order, or a target operation count is greater than a preset count threshold, wherein the target operation count comprises at least one of the number of merges and the number of order exchanges of operators of the same type;
combining the obtained target operator with an operator which is not combined in the first ETL service according to the depth of the operator to obtain a first branch ETL service, and combining the obtained target operator with an operator which is not combined in the second ETL service to obtain a second branch ETL service, wherein the target ETL service comprises the first branch ETL service and the second branch ETL service.
16. The apparatus according to any of claims 12-15, wherein the processing module is further configured to:
determining the execution cost of a third ETL service when the target operator is obtained, wherein the third ETL service is composed of the obtained target operator and an operator which is not combined in the first ETL service or the second ETL service;
and determining the target ETL service according to the execution cost of the third ETL service.
17. The apparatus of claim 16, wherein the processing module is further configured to:
when there are multiple third ETL services, if the execution cost of a fourth ETL service is less than or equal to the sum of the execution costs of the first ETL service and the second ETL service, taking the fourth ETL service as the target ETL service, wherein the fourth ETL service is the third ETL service with the smallest execution cost.
18. The apparatus of claim 17, wherein the processing module is further configured to:
and setting a cache operator after a common operator contained in a third target operator in the fourth ETL service to obtain the target ETL service, wherein the third target operator is the target operator with the maximum operator depth in the fourth ETL service, and the cache operator is used for caching data output by the common operator and providing data for a branch operator behind the common operator.
19. The apparatus according to any of claims 11-18, wherein the processing module is further configured to: after the target ETL service is obtained, modifying the target ETL service based on a sink-and-float rule, wherein the sink-and-float rule is as follows: the contraction operator floats upwards towards the head of the ETL service, and the expansion operator sinks towards the tail of the ETL service;
the contraction operator is an operator with reduced data volume after being processed by the operator, and the expansion operator is an operator with increased data volume after being processed by the operator.
20. The apparatus according to any of claims 11-19, wherein the target ETL service comprises a common operator, a cache operator and at least two strings of branch operators, the cache operator is located between the common operator and the at least two strings of branch operators, the cache operator is configured to cache data output by the common operator and provide data to the first string of branch operators;
the common operator comprises a read operator, the at least two strings of branch operators comprise a first string of branch operators, the first string of branch operators comprises a write operator of the first ETL service, the read operator is used for reading data, and the write operator is used for writing data.
21. An electronic device, comprising:
at least one memory for storing a program;
at least one processor for executing the program stored in the memory, the processor being configured to perform the method of any of claims 1-10 when the program stored in the memory is executed.
22. A computer storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1-10.
23. A chip comprising at least one processor and an interface;
the interface is used for providing program instructions or data for the at least one processor;
the at least one processor is configured to execute the program instructions to implement the method of any of claims 1-10.
24. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-10.
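The selection rule of claim 17 and the sink-and-float rule of claim 19 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `Operator`, `pick_target`, and `sink_and_float`, the `kind` labels, and the `movable` flag are all hypothetical, and the cost values are illustrative numbers only. The sketch also assumes that every movable operator in a chain can be freely reordered with its neighbours, which a real optimizer would have to verify against data dependencies.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Operator:
    name: str
    # Hypothetical labels: "contract" = output smaller than input,
    # "expand" = output larger than input, "neutral" = otherwise.
    kind: str = "neutral"
    # Hypothetical flag: read, cache, and write operators stay in place.
    movable: bool = True


def pick_target(candidate_costs: Sequence[float],
                cost_first: float,
                cost_second: float) -> Optional[int]:
    """Return the index of the cheapest merged candidate if its cost does
    not exceed the combined cost of the two original services, else None."""
    if not candidate_costs:
        return None
    best = min(range(len(candidate_costs)), key=candidate_costs.__getitem__)
    if candidate_costs[best] <= cost_first + cost_second:
        return best
    return None


_RANK = {"contract": 0, "neutral": 1, "expand": 2}


def sink_and_float(chain: List[Operator]) -> List[Operator]:
    """Within each run of movable operators, move contraction operators
    toward the head and expansion operators toward the tail.
    Assumes movable operators commute (an assumption of this sketch)."""
    result: List[Operator] = []
    run: List[Operator] = []
    for op in chain:
        if op.movable:
            run.append(op)
        else:
            # sorted() is stable, so operators of equal rank keep their order.
            result.extend(sorted(run, key=lambda o: _RANK[o.kind]))
            run = []
            result.append(op)
    result.extend(sorted(run, key=lambda o: _RANK[o.kind]))
    return result


chain = [
    Operator("read", movable=False),
    Operator("explode", "expand"),
    Operator("rename"),
    Operator("filter", "contract"),
    Operator("write", movable=False),
]
names = [op.name for op in sink_and_float(chain)]
chosen = pick_target([7.0, 5.0], cost_first=3.0, cost_second=2.0)
```

In this example the filter moves ahead of the rename and the explode, so later operators handle less data, and the second candidate is chosen because its cost of 5.0 does not exceed 3.0 + 2.0.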
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110077891.0A CN114860820A (en) | 2021-01-20 | 2021-01-20 | Optimization method and device for technical business of data warehouse and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110077891.0A CN114860820A (en) | 2021-01-20 | 2021-01-20 | Optimization method and device for technical business of data warehouse and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114860820A (en) | 2022-08-05 |
Family
ID=82623245
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110077891.0A Pending CN114860820A (en) | 2021-01-20 | 2021-01-20 | Optimization method and device for technical business of data warehouse and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114860820A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117472553A (en) * | 2023-12-28 | 2024-01-30 | 中移(苏州)软件技术有限公司 | Workflow processing method, device, processing equipment and readable storage medium |
CN117472553B (en) * | 2023-12-28 | 2024-05-03 | 中移(苏州)软件技术有限公司 | Workflow processing method, device, processing equipment and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11785329B2 (en) | Camera switching method for terminal, and terminal | |
CN115866121B (en) | Application interface interaction method, electronic device and computer readable storage medium | |
CN109766036B (en) | Message processing method and electronic equipment | |
WO2020000448A1 (en) | Flexible screen display method and terminal | |
CN117063461A (en) | Image processing method and electronic equipment | |
CN112130714B (en) | Keyword search method capable of learning and electronic equipment | |
CN111669459A (en) | Keyboard display method, electronic device and computer readable storage medium | |
CN111371849A (en) | Data processing method and electronic equipment | |
CN112751954A (en) | Operation prompting method and electronic equipment | |
CN113254409A (en) | File sharing method, system and related equipment | |
CN112543447A (en) | Device discovery method based on address list, audio and video communication method and electronic device | |
WO2022062796A1 (en) | Network control method, apparatus, and electronic device | |
CN113973398A (en) | Wireless network connection method, electronic equipment and chip system | |
CN114466449A (en) | Position feature acquisition method and electronic equipment | |
CN114283195B (en) | Method for generating dynamic image, electronic device and readable storage medium | |
CN109756614A (en) | A kind of method and relevant apparatus showing contact person | |
CN114860820A (en) | Optimization method and device for technical business of data warehouse and electronic equipment | |
CN114064160A (en) | Application icon layout method and related device | |
CN114173286B (en) | Method and device for determining test path, electronic equipment and readable storage medium | |
CN114338642B (en) | File transmission method and electronic equipment | |
CN113950045B (en) | Subscription data downloading method and electronic equipment | |
CN112929854B (en) | Event subscription method and electronic equipment | |
CN115032640A (en) | Gesture recognition method and terminal equipment | |
CN115022982A (en) | Multi-screen cooperative non-inductive access method, electronic equipment and storage medium | |
JP7462659B2 (en) | Information display method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||