WO2023029854A1 - Data query method, apparatus, storage medium and electronic device - Google Patents
- Publication number
- WO2023029854A1 (PCT/CN2022/109468)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- statement
- engine
- query
- query language
- structured query
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
Definitions
- the present disclosure relates to the field of data technology, and in particular, to a data query method, device, storage medium and electronic equipment.
- OLAP (Online Analytical Processing) aims to meet the specific query and reporting requirements of decision support or multi-dimensional environments.
- data may first be obtained from a data source through a computing engine for analysis.
- the architectures of various computing engines are different.
- users are therefore required to master the usage of each engine, including its SQL (Structured Query Language) syntax, function definitions, and parameter tuning.
- for example, when users use the Presto engine, they need to write SQL according to the syntax specification of the Presto engine, whereas when using the Spark engine they need to write SQL according to the syntax specification of the Spark engine. This greatly increases the user's cost in the data query process and thus reduces OLAP efficiency.
- in a first aspect, the present disclosure provides a data query method, the method comprising:
- acquiring a structured query language statement written based on a unified structured query language standard;
- determining a query feature corresponding to the structured query language statement, where the query feature is used to characterize the query semantics of the structured query language statement;
- according to the query feature of the structured query language statement, determining a target computing engine among multiple computing engines, and converting the structured query language statement into a target data query statement executable by the target computing engine;
- executing the target data query statement through the target computing engine.
- the present disclosure provides a data query device, the device comprising:
- an acquisition module configured to acquire a structured query language statement written based on a unified structured query language standard;
- a first determining module configured to determine a query feature corresponding to the structured query language statement, where the query feature is used to characterize the query semantics of the structured query language statement;
- a second determining module configured to determine a target computing engine among a plurality of computing engines according to the query feature of the structured query language statement, and convert the structured query language statement into a target data query statement executable by the target computing engine;
- a query module configured to execute the target data query statement through the target computing engine.
- the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method described in the first aspect are implemented.
- an electronic device including:
- a storage device on which a computer program is stored;
- a processing device configured to execute the computer program in the storage device to implement the steps of the method in the first aspect.
- with the above technical solution, users do not need to write separate SQL statements for computing engines of different architectures. Instead, they can write structured query language statements based on the unified structured query language standard, and a target computing engine is then automatically selected according to the query features of the structured query language statement to perform the corresponding data query operation. This reduces the user's cost in the data query process, improves data query efficiency, and improves OLAP efficiency in online analytical processing (OLAP) scenarios.
- Fig. 1 is a flow chart showing a data query method according to an exemplary embodiment of the present disclosure
- Fig. 2 is a schematic diagram of a processing procedure of a data query method shown according to an exemplary embodiment of the present disclosure
- Fig. 3 is a block diagram of a data query device according to an exemplary embodiment of the present disclosure.
- Fig. 4 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
- the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
- the term “based on” is “based at least in part on”.
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
- the present disclosure provides a data query method to automatically adapt to various computing engines based on a unified SQL syntax, thereby reducing user costs in the data query process.
- Fig. 1 is a flowchart showing a data query method according to an exemplary embodiment of the present disclosure.
- the data query method includes:
- Step 101: acquire a structured query language statement written based on a unified structured query language standard.
- Step 102: determine the query feature corresponding to the structured query language statement, where the query feature is used to characterize the query semantics of the structured query language statement.
- Step 103: according to the query feature of the structured query language statement, determine a target computing engine among multiple computing engines, and convert the structured query language statement into a target data query statement executable by the target computing engine.
- Step 104: execute the target data query statement through the target computing engine.
- in this way, users do not need to write separate SQL statements for computing engines of different architectures. Instead, they can write structured query language statements based on the unified structured query language standard, and a target computing engine is then automatically selected according to the query features of the structured query language statement to perform the corresponding data query operation.
- as a result, the user's cost in the data query process can be reduced, data query efficiency can be improved, and OLAP efficiency can be improved in online analytical processing (OLAP) scenarios.
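- Steps 101 to 104 can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than the disclosed implementation: the names `extract_features`, `select_engine`, `convert` and `run_query`, the `<source>.<table>` naming convention, the use of the join count as the complexity signal, and the placeholder conversion that only tags the statement with the chosen engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryFeatures:
    join_count: int        # crude complexity signal (assumption)
    sources: frozenset     # data sources referenced by the statement


def extract_features(sql: str) -> QueryFeatures:
    """Step 102: derive query features with a naive keyword scan."""
    tokens = sql.replace(",", " ").split()
    lowered = [t.lower() for t in tokens]
    join_count = lowered.count("join")
    # Tables are assumed to be written as "<source>.<table>" after FROM/JOIN.
    tables = [tokens[i + 1] for i, t in enumerate(lowered[:-1])
              if t in ("from", "join")]
    sources = frozenset(t.split(".")[0] for t in tables if "." in t)
    return QueryFeatures(join_count, sources)


def select_engine(features: QueryFeatures, max_simple_joins: int = 1) -> str:
    """Step 103 (first half): pick an engine from the query features."""
    return "engine_b" if features.join_count > max_simple_joins else "engine_a"


def convert(sql: str, engine: str) -> str:
    """Step 103 (second half): placeholder for dialect conversion."""
    return f"/* {engine} */ {sql}"


def run_query(sql: str, executors: dict):
    """Steps 101-104: acquire, analyse, select and convert, execute."""
    features = extract_features(sql)
    engine = select_engine(features)
    target_statement = convert(sql, engine)
    return executors[engine](target_statement)
```

- Here `executors` maps an engine name to any callable that accepts the converted statement, so the caller only ever supplies one statement in the unified syntax.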
- for example, the ANSI SQL 2011 standard can be selected as the basis and supplemented with some Hive-style DDL (Data Definition Language) and Flink-style stream syntax to form the unified structured query language standard in the embodiments of the present disclosure.
- any SQL syntax supported by any computing engine may also be selected as the unified structured query language standard, which is not limited in this embodiment of the present disclosure.
- the embodiment of the present disclosure pre-sets the unified structured query language standard, so the user can write SQL statements based on the unified structured query language standard during the data query process, without learning the use skills of each computing engine and writing the SQL corresponding to the computing engine statement.
- the embodiment of the present disclosure can also set a unified calling interface, and then obtain the structured query language statement written based on the unified structured query language standard through the unified calling interface. That is to say, the embodiments of the present disclosure provide a unified SQL entry and SQL standard. In this way, the usage cost of the user in the process of data query can be reduced, and the efficiency of data query can be improved.
- a target computing engine that is more suitable for data query can be determined among multiple computing engines.
- the query feature may represent the query semantics of the structured query language statement, and may include the complexity feature of the structured query language statement and/or the data source feature of the data to be queried in the structured query language statement.
- the multiple computing engines may include compute-only big data engines such as Spark, Hive, Presto, and Flink, and may also include big data engines that integrate computing and storage, such as ClickHouse, Druid, and ElasticSearch, which is not limited in the embodiments of the present disclosure.
- determining the query feature corresponding to the structured query language statement may be: determining the complexity feature corresponding to the structured query language statement and/or the data source feature of the data to be queried in the structured query language statement.
- determining the target computing engine among the multiple computing engines may be: determining the target computing engine among the multiple computing engines according to the complexity features and/or data source features of the structured query language statement.
- a pre-configured engine adaptation rule can be obtained, where the engine adaptation rule characterizes the correspondence between the query features of structured query language statements and the multiple computing engines, so that the target computing engine can be determined among the multiple computing engines based on the query features of the structured query language statement and the engine adaptation rule.
- pre-configured first engine adaptation rules and second engine adaptation rules can be obtained.
- the first engine adaptation rule characterizes the correspondence between the complexity of structured query language statements and the multiple computing engines.
- the second engine adaptation rule characterizes the correspondence between data sources and the multiple computing engines, so that the target computing engine can be determined among the multiple computing engines according to the complexity features of the structured query language statement and the first engine adaptation rule, together with the data source features of the structured query language statement and the second engine adaptation rule.
- for example, the pre-configured first engine adaptation rule is: engine A is used to execute simple SQL statements, and engine B is used to execute complex SQL statements. Determining the target computing engine according to the complexity features of the SQL statement can then be: if the complexity of the SQL statement is greater than a preset complexity, the target computing engine is engine B; if the complexity of the SQL statement is less than or equal to the preset complexity, the target computing engine is engine A.
- as another example, the pre-configured second engine adaptation rule is: computing engine A is used for data query on data source a, and computing engine B is used for data query on data source b. If the data source of the data to be queried in the SQL statement is data source a, the target computing engine is computing engine A; if the data source of the data to be queried in the SQL statement is data source b, the target computing engine is computing engine B.
- as a further example, the pre-configured first engine adaptation rule is: computing engine A1 executes simple SQL statements and computing engine A2 executes complex SQL statements, and the pre-configured second engine adaptation rule is: computing engines A1 and A2 are used for data query on data source a.
- in this case, computing engines A1 and A2 can first be determined as candidate target computing engines based on the data source features and the second engine adaptation rule.
- a target computing engine may then be determined between computing engines A1 and A2 according to the complexity features of the SQL statement and the first engine adaptation rule.
- the target computing engine can be determined according to the query characteristics of the SQL statement in other ways.
- the embodiment of the present disclosure does not limit this.
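- The two-stage rule lookup described above can be sketched as follows. The rule tables, the numeric complexity score, and the threshold value are illustrative assumptions; the disclosure does not specify how the rules are stored or how complexity is measured.

```python
# Second engine adaptation rule (assumed layout): data source -> candidate engines.
SOURCE_RULES = {"a": ["A1", "A2"], "b": ["B"]}

# First engine adaptation rule (assumed layout): complexity level -> engines.
COMPLEXITY_RULES = {"simple": {"A1", "B"}, "complex": {"A2", "B"}}

# Hypothetical preset complexity; statements scoring above it count as complex.
PRESET_COMPLEXITY = 2


def pick_engine(complexity: int, source: str) -> str:
    """Narrow candidates by data source, then choose by complexity."""
    candidates = SOURCE_RULES[source]
    level = "complex" if complexity > PRESET_COMPLEXITY else "simple"
    for engine in candidates:
        if engine in COMPLEXITY_RULES[level]:
            return engine
    raise LookupError(f"no engine for source={source!r}, level={level!r}")
```

- A statement on data source a that scores at or below the preset complexity resolves to A1, while a more complex statement on the same source resolves to A2.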
- the structured query language statement can first be optimized according to its query features and a preset statement optimization strategy to obtain an optimized query statement, and the target computing engine can then be determined among the multiple computing engines according to the optimized query statement.
- the preset statement optimization strategy may include general optimization methods in related technologies, such as materialized view selection strategy, expression merging strategy, advanced constant deduction strategy, and built-in function optimization strategy (that is, a strategy for converting inefficient functions into efficient functions) etc., which are not limited in the embodiments of the present disclosure.
- for example, suppose the SQL statement queries data from tables A, B, and C and contains the expression "A join B join C". From the query features of the SQL statement it can be determined that the operation A join B join C needs to be performed.
- in this case, whether to perform A join B and then join C, or B join C and then join A, or A join C and then join B, can be decided according to the preset statement optimization strategy with the goal of improving the execution efficiency of the SQL statement. That is, the SQL statement can be optimized through the expression merging strategy to obtain the optimized query statement.
- the target computing engine may be determined among multiple computing engines according to the optimized query statement.
- the SQL statements can be uniformly optimized, thereby reducing the user's usage cost in the data query process, and further improving the data query efficiency.
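- One way to picture the choice among the three join orders is a small cost comparison. The sketch below is not the strategy of the disclosure: it assumes hypothetical per-table row counts, a single uniform join selectivity, and a cost equal to the sum of estimated intermediate result sizes, and it simply enumerates every left-deep order.

```python
from itertools import permutations


def best_join_order(row_counts: dict, selectivity: float = 0.001):
    """Return the left-deep join order with the smallest estimated cost.

    row_counts maps table name -> estimated row count (assumed statistics).
    The cost of an order is the sum of the estimated sizes of every
    intermediate join result along that order.
    """
    best_order, best_cost = None, float("inf")
    for order in permutations(row_counts):
        size = row_counts[order[0]]
        cost = 0.0
        for table in order[1:]:
            # Estimated rows produced by joining the running result with `table`.
            size = size * row_counts[table] * selectivity
            cost += size
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost
```

- With one very large table and two small ones, the cheapest order under this model joins the two small tables first and the large table last.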
- the acquired SQL statement can be converted into a target data query statement that the target computing engine can execute.
- UDF: User Defined Function.
- UDAF: User Defined Aggregation Function.
- one of the multiple computing engines can first be selected as the standard engine, and the structured query language statement is first converted into an intermediate query statement based on the data format of structured query language statements that the standard engine can process.
- converting the structured query language statement into the target data query statement that the target computing engine can execute may then be: determining whether the target computing engine is the standard engine, and if the target computing engine is not the standard engine, converting the intermediate query statement into the target data query statement that the target computing engine can execute.
- if the target computing engine is the standard engine, the intermediate query statement may be directly executed through the target computing engine.
- the standard engine may be a computing engine with universal SQL statement rules and relatively low conversion costs for converting to SQL statements executable by other engines.
- the conversion efficiency of the SQL statement can be improved, thereby improving the data query efficiency.
- the Calcite engine can be used as the standard engine.
- any other computing engine may also be selected as the standard engine, for example, the Spark engine or the Presto engine may be selected as the standard engine, which is not limited in this embodiment of the present disclosure.
- when the standard engine is the Calcite engine and the target computing engine is the Spark engine, converting the structured query language statement into an intermediate query statement can be: converting the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process.
- converting the intermediate query statement into a target data query statement executable by the target computing engine may be: converting the RelNode statement into a DataFrame statement executable by the target computing engine.
- the input SQL statement can be converted into a RelNode statement first, and then if the target computing engine is determined to be a Spark engine, the RelNode statement can be further converted into a DataFrame statement, and then execute the DataFrame statement through the Spark engine to realize data query. If it is determined that the target calculation engine is the Calcite engine, the RelNode statement can be directly executed through the Calcite engine.
- the API of the DataFrame statement is relatively stable, which keeps statements invoked through the API stable and thereby ensures normal execution of the data query.
- by contrast, if the RelNode statement were converted into an SQL statement for the Spark engine, two sets of SQL parsing would need to be maintained, and both the time overhead and the resource overhead of parsing would be large, which would affect the efficiency of data query. Therefore, converting the RelNode statement into a DataFrame statement in the embodiments of the present disclosure can further improve data query efficiency when adapting to multiple engines.
- when the standard engine is the Calcite engine and the target computing engine is the Presto engine, converting the structured query language statement into an intermediate query statement can be: converting the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process.
- converting the intermediate query statement into a target data query statement executable by the target computing engine may be: converting the RelNode statement into a structured query language statement executable by the target computing engine.
- the Calcite engine does not natively support the Presto engine, and the Presto engine does not have a stable API.
- one possible implementation is to first convert the SQL statement into a RelNode statement that the Calcite engine can process, and then convert the RelNode statement into a Node structure that the Presto engine can process.
- however, because the Presto engine lacks an efficient API, the integration cost of this approach is very high and it is difficult to apply quickly.
- therefore, the embodiments of the present disclosure can first convert the SQL statement into a RelNode statement that the Calcite engine can process, and then convert the RelNode statement into an SQL statement that the Presto engine can execute. Although two sets of SQL parsing need to be maintained, the SQL parsing of the Presto engine and that of the Calcite engine are relatively similar, so the time and resource overhead of parsing can be reduced and data query efficiency can be guaranteed.
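- The standard-engine dispatch described above can be sketched with a toy intermediate representation. The `Plan` class below is a hypothetical stand-in for a RelNode and is not Calcite's API; the Spark branch emits a list of (method, argument) pairs that mirror DataFrame-style operations, and the Presto branch emits SQL text. No real engine is invoked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Toy intermediate query statement (stand-in for a RelNode)."""
    table: str
    columns: tuple
    predicate: str = ""


def to_target(plan: Plan, engine: str, standard_engine: str = "calcite"):
    """Convert the intermediate statement for the chosen target engine."""
    if engine == standard_engine:
        # The standard engine executes the intermediate statement directly.
        return plan
    if engine == "spark":
        # DataFrame-style route: a sequence of API operations, no SQL re-parsing.
        ops = [("table", plan.table)]
        if plan.predicate:
            ops.append(("filter", plan.predicate))
        ops.append(("select", plan.columns))
        return ops
    if engine == "presto":
        # SQL route: render the plan back into SQL text for the target engine.
        sql = f"SELECT {', '.join(plan.columns)} FROM {plan.table}"
        if plan.predicate:
            sql += f" WHERE {plan.predicate}"
        return sql
    raise ValueError(f"unsupported engine: {engine}")
```

- The same `Plan` therefore yields three different outputs depending on the target, which is the point of parsing the unified SQL only once into an intermediate form.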
- the automatic adaptation of multiple engines can be realized based on the unified SQL standard, and the cost of using the user in the data query process can be reduced, thereby improving the efficiency of data query.
- if the data to be queried spans multiple data sources, one approach is to first synchronize the data of the other data sources to a target data source, and then perform the data query on the target data source after data synchronization.
- the target data source is any data source in the multiple data sources, and other data sources are the remaining data sources in the multiple data sources except the target data source.
- data synchronization tasks need to be performed when performing joint queries across data sources. If there is a lot of data to be synchronized, the efficiency of data query will be greatly affected.
- in the embodiments of the present disclosure, when the data to be queried in the structured query language statement comes from at least two different data sources, a target computing engine corresponding to each data source can be determined among the multiple computing engines according to the data source features of the structured query language statement, and the structured query language statement can be converted, based on the data source features, into target data query statements that the corresponding target computing engines can execute. After the corresponding target data query statements are executed through the target computing engines, any engine among the multiple computing engines is determined as a joint processing engine, each target computing engine sends the target data it queried from its data source to the joint processing engine, and the joint processing engine finally performs joint processing on the target data.
- the joint processing engine may be any one of the target computing engines, or any other engine among the multiple computing engines, and may be configured according to actual conditions. It should be understood that choosing one of the target computing engines as the joint processing engine can reduce data transmission and thereby improve data query efficiency.
- the user triggers a data query operation through a BI (Business Intelligence, business intelligence) tool, and generates an SQL statement corresponding to the data query operation based on a unified structured query language standard.
- the SQL statement can be sent to the federated query layer of the database through JDBC (Java Database Connectivity, Java database connection) or REST interface.
- the engine adaptation module of the federated query layer can determine, according to the data source features of the SQL statement, the target computing engine corresponding to each data source among the multiple computing engines preset in the engine layer, and convert the SQL statement, based on the data source features, into target data query statements that the corresponding target computing engines can execute.
- the target data query statement can be sent to the engine layer, and data query can be performed on the corresponding data source of the data source layer through the target computing engine in the engine layer.
- each target computing engine can send the target data it queried from its data source to any computing engine in the engine layer (that is, the joint processing engine), so that cross-data-source joint processing can be performed by that computing engine. In other words, one computing engine associates the target data queried by each target computing engine and returns the result to the user.
- cross-data source joint query can be supported, and additional data synchronization tasks can be omitted, thereby improving data query efficiency.
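- The cross-data-source flow can be sketched with two in-memory SQLite databases standing in for separate data sources. This is an illustrative assumption only: the table names, the sample rows, and the plain-Python join that plays the role of the joint processing engine are all hypothetical, and the engines named in the disclosure are not involved.

```python
import sqlite3


def make_source(ddl: str, insert: str, rows: list) -> sqlite3.Connection:
    """Create an in-memory database acting as one independent data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert, rows)
    return conn


def federated_totals(order_source, user_source):
    """Query each source separately, then join the partial results."""
    # Target data query statement executed against the first data source.
    totals = order_source.execute(
        "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id"
    ).fetchall()
    # Target data query statement executed against the second data source.
    names = dict(user_source.execute("SELECT id, name FROM users").fetchall())
    # Joint processing: associate both result sets without synchronizing tables.
    return sorted((names[uid], total) for uid, total in totals if uid in names)


orders = make_source(
    "CREATE TABLE orders (user_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 9.5), (2, 20.0), (1, 0.5)],
)
users = make_source(
    "CREATE TABLE users (id INTEGER, name TEXT)",
    "INSERT INTO users VALUES (?, ?)",
    [(1, "ann"), (2, "bob")],
)
result = federated_totals(orders, users)
```

- Only the aggregated rows and the id-to-name mapping leave their sources, so no table has to be copied from one source into the other before the query runs.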
- the federated query layer may also include an optimization module, a management module, a permission module and a metadata module.
- the optimization module can uniformly optimize the structured query language statement according to the query characteristics of the structured query language statement and a preset statement optimization strategy.
- the management module can manage the process of submitting target data query statements to the target computing engine during the data query process, or can also perform processes such as log collection and result saving.
- the authority module can first determine whether the user who issued the SQL statement has the data operation authority required by the SQL statement, or can verify the correctness of the SQL statement.
- the metadata module is used to store the data permission information of each user and the metadata corresponding to each data source, so as to better determine the target computing engine.
- the specific implementation methods of the management module, the authority module and the metadata module in the federated query layer are similar to related technologies, and will not be repeated here.
- the multiple computing engines preset at the engine layer include Spark, Hive, Presto, Flink, ClickHouse, and ElasticSearch.
- the data source layer includes HDFS, RDS, Kafka, ClickHouse, and ElasticSearch. It should be understood that ClickHouse and ElasticSearch are computing and storage integrated engines, so they can be included in the engine layer and data source layer.
- the data query process may be: obtaining an SQL statement determined based on a unified structured query language standard through a JDBC or REST interface. Then, verify the correctness of the SQL statement through the metadata module of the federated query layer, and determine whether the user has the authority to query the data to be queried in the SQL statement through the authority module. Afterwards, the SQL statement can be uniformly optimized through the optimization module of the federated query layer to obtain an optimized query statement.
- the engine adaptation module of the federated query layer can determine the target computing engine according to the query characteristics of the SQL statement, and convert the SQL statement into a target data query statement that the target computing engine can execute and send it to the target computing engine in the engine layer.
- the target computing engine can determine the execution strategy for the target data query statement according to its own physical-layer optimization capability, and execute the target data query statement according to that strategy to query data from the corresponding data source. If there are multiple data sources, each target computing engine can return the data it queried, and joint processing is finally performed through the federated query layer.
- the present disclosure also provides a data query device, which can become a part or all of the electronic equipment through software, hardware or a combination of both.
- the data query device 300 may include:
- an acquisition module 301 configured to acquire a structured query language statement written based on a unified structured query language standard;
- a first determining module 302 configured to determine a query feature corresponding to the structured query language statement, where the query feature is used to characterize the query semantics of the structured query language statement;
- a second determining module 303 configured to determine a target computing engine among multiple computing engines according to the query feature of the structured query language statement, and convert the structured query language statement into a target data query statement executable by the target computing engine;
- the query module 304 is configured to execute the target data query statement through the target calculation engine.
- the first determining module 302 is configured to:
- the second determination module 303 is used for:
- a target computing engine is determined among multiple computing engines.
- when the data to be queried in the structured query language statement comes from at least two different data sources, the second determining module 303 is used for:
- determining, according to the data source features of the structured query language statement, a target computing engine corresponding to each of the data sources among multiple computing engines, and converting the structured query language statement, based on the data source features, into a target data query statement that can be executed by the corresponding target computing engine;
- the device 300 also includes:
- a joint module configured to determine any one of the multiple computing engines as a joint processing engine after the target data query statements are executed through the target computing engines, send the target data that each target computing engine queried from its corresponding data source by executing the target data query statement to the joint processing engine, and perform joint processing on the target data through the joint processing engine.
- the second determining module 303 is configured to:
- a target computing engine is determined among multiple computing engines.
- the device 300 also includes:
- an intermediate conversion module configured to select one of the plurality of computing engines as a standard engine, and convert the structured query language statement into an intermediate query statement based on the data format of structured query language statements that the standard engine can process;
- the second determination module 303 is used for:
- when the target computing engine is not the standard engine, converting the intermediate query statement into a target data query statement executable by the target computing engine.
- the standard engine is a Calcite engine
- the target computing engine is a Spark engine
- the intermediate conversion module is used to convert the structured Query language statements are converted to RelNode statements
- the second determining module 303 is used for converting the RelNode statement into a DataFrame statement executable by the target computing engine.
- the standard engine is a Calcite engine
- the target calculation engine is a Presto engine
- the intermediate conversion module is used to convert the structured query language statement based on the data format of the standard engine to process Query language statements are converted to RelNode statements;
- the second determining module 303 is used to convert the RelNode statement into a structured query language statement executable by the target computing engine.
- the present disclosure also provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the program is executed by a processing device, the steps of any one of the above data query methods are implemented.
- an electronic device including:
- a processing device configured to execute the computer program in the storage device, so as to realize the steps of any data query method described above.
- Referring to FIG. 4, it shows a schematic structural diagram of an electronic device 400 suitable for implementing an embodiment of the present disclosure.
- The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), and fixed terminals such as digital TVs and desktop computers.
- the electronic device shown in FIG. 4 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
- As shown in FIG. 4, the electronic device 400 may include a processing device 401 (such as a central processing unit or a graphics processing unit), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403.
- The RAM 403 also stores various programs and data necessary for the operation of the electronic device 400.
- the processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
- An input/output (I/O) interface 405 is also connected to bus 404 .
- Generally, the following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 408 including, for example, a magnetic tape and a hard disk; and a communication device 409.
- the communication means 409 may allow the electronic device 400 to perform wireless or wired communication with other devices to exchange data. While FIG. 4 shows electronic device 400 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
- the computer program may be downloaded and installed from a network via communication means 409, or from storage means 408, or from ROM 402.
- When the computer program is executed by the processing device 401, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
- the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
- a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
- Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
- In some embodiments, communication may use any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network).
- Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network of.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
- The above-mentioned computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a structured query language statement determined based on a unified structured query language standard; determine a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement; determine, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines, and convert the structured query language statement into a target data query statement executable by the target computing engine; and execute the target data query statement through the target computing engine.
- Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
- each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more logical functions for implementing specified executable instructions.
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified functions or operations , or may be implemented by a combination of dedicated hardware and computer instructions.
- modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the module does not constitute a limitation on the module itself under certain circumstances.
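- The functions described above may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), and Complex Programmable Logic Devices (CPLDs).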
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
- machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
- Example 1 provides a data query method, the method comprising:
- obtaining a structured query language statement determined based on a unified structured query language standard;
- determining a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement;
- determining, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines, and converting the structured query language statement into a target data query statement executable by the target computing engine;
- executing the target data query statement through the target computing engine.
- Example 2 provides the method of Example 1, wherein determining the query feature corresponding to the structured query language statement includes:
- determining a complexity feature corresponding to the structured query language statement and/or a data source feature of the data to be queried in the structured query language statement;
- and determining a target computing engine among multiple computing engines according to the query feature includes:
- determining a target computing engine among multiple computing engines according to the complexity feature and/or the data source feature of the structured query language statement.
- Example 3 provides the method of Example 1, wherein the data sources of the data to be queried in the structured query language statement include at least two different data sources, and determining a target computing engine among multiple computing engines according to the query feature and converting the structured query language statement into a target data query statement executable by the target computing engine includes:
- determining, according to the data source feature corresponding to the structured query language statement, a target computing engine corresponding to each of the data sources among multiple computing engines, and converting, based on the data source feature, the structured query language statement into a target data query statement executable by the corresponding target computing engine;
- and, after the target data query statement is executed through the target computing engine, the method further includes:
- determining any one of the multiple computing engines as a joint processing engine;
- sending the target data that each target computing engine queried from its corresponding data source by executing the target data query statement to the joint processing engine, and jointly processing each item of target data through the joint processing engine.
- Example 4 provides the method of any one of Examples 1-3, wherein determining a target computing engine among multiple computing engines according to the query feature of the structured query language statement includes:
- optimizing the structured query language statement according to its query feature and a preset statement optimization strategy to obtain an optimized query statement;
- determining a target computing engine among multiple computing engines according to the optimized query statement.
- Example 5 provides the method of any one of Examples 1-3, the method further comprising:
- selecting one of the multiple computing engines as a standard engine, and converting the structured query language statement into an intermediate query statement based on the format of structured query language statements that the standard engine can process;
- wherein converting the structured query language statement into a target data query statement executable by the target computing engine includes:
- determining whether the target computing engine is the standard engine;
- if the target computing engine is not the standard engine, converting the intermediate query statement into a target data query statement executable by the target computing engine.
- Example 6 provides the method of Example 5, wherein the standard engine is a Calcite engine, the target computing engine is a Spark engine, and converting the structured query language statement into an intermediate query statement based on the format of structured query language statements that the standard engine can process includes:
- converting the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process;
- and converting the intermediate query statement into a target data query statement executable by the target computing engine includes:
- converting the RelNode statement into a DataFrame statement executable by the target computing engine.
- Example 7 provides the method of Example 5, wherein the standard engine is a Calcite engine, the target computing engine is a Presto engine, and converting the structured query language statement into an intermediate query statement based on the format of structured query language statements that the standard engine can process includes:
- converting the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process;
- and converting the intermediate query statement into a target data query statement executable by the target computing engine includes:
- converting the RelNode statement into a structured query language statement executable by the target computing engine.
- Example 8 provides a data query apparatus, the apparatus comprising:
- an acquisition module configured to obtain a structured query language statement determined based on a unified structured query language standard;
- a first determining module configured to determine a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement;
- a second determining module configured to determine, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines, and convert the structured query language statement into a target data query statement executable by the target computing engine;
- a query module configured to execute the target data query statement through the target computing engine.
- Example 9 provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the steps of the method of any one of Examples 1-7 are implemented when the program is executed by a processing device.
- Example 10 provides an electronic device, comprising:
- a storage device on which a computer program is stored;
- a processing device configured to execute the computer program in the storage device to implement the steps of the method of any one of Examples 1-7.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
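The present disclosure relates to a data query method and apparatus, a storage medium, and an electronic device, which reduce the cost to users of querying data and improve data query efficiency. The method includes: obtaining a structured query language statement determined based on a unified structured query language standard; determining a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement; determining, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines, and converting the structured query language statement into a target data query statement executable by the target computing engine; and executing the target data query statement through the target computing engine.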
Description
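Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202111032755.6, filed on September 3, 2021 and entitled "Data query method, apparatus, storage medium and electronic device", the entire contents of which are incorporated herein by reference.
The present disclosure relates to the field of data technology and, in particular, to a data query method and apparatus, a storage medium, and an electronic device.
OLAP (Online Analytical Processing) is online data access and analysis aimed at specific problems. Its goal is to meet the specific query and reporting needs of decision support or multidimensional environments.
In the related art, during OLAP, data can first be obtained from a data source through a computing engine and then analyzed. However, computing engines differ in architecture. When multiple heterogeneous computing engines are used, users must master the usage techniques of each engine, including SQL (Structured Query Language) syntax, function definitions, and parameter tuning. For example, a user of the Presto engine must write SQL according to Presto's syntax, while a user of the Spark engine must write SQL according to Spark's syntax. This greatly increases the user's cost in the data query process and thereby reduces OLAP efficiency.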
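Summary
This Summary is provided to introduce, in a simplified form, concepts that are described in detail in the Detailed Description below. This Summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a data query method, the method comprising:
obtaining a structured query language statement determined based on a unified structured query language standard;
determining a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement;
determining, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines, and converting the structured query language statement into a target data query statement executable by the target computing engine;
executing the target data query statement through the target computing engine.
In a second aspect, the present disclosure provides a data query apparatus, the apparatus comprising:
an acquisition module, configured to obtain a structured query language statement determined based on a unified structured query language standard;
a first determining module, configured to determine a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement;
a second determining module, configured to determine, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines, and to convert the structured query language statement into a target data query statement executable by the target computing engine;
a query module, configured to execute the target data query statement through the target computing engine.
In a third aspect, the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the steps of the method of the first aspect are implemented when the program is executed by a processing device.
In a fourth aspect, the present disclosure provides an electronic device, comprising:
a storage device on which a computer program is stored;
a processing device, configured to execute the computer program in the storage device to implement the steps of the method of the first aspect.
With the above technical solution, users do not need to write a separate SQL statement for each computing engine architecture. Instead, they can write a structured query language statement based on the unified structured query language standard, and a target computing engine is then automatically adapted, according to the query feature of that statement, to perform the corresponding data query operation. This reduces the user's cost in the data query process and improves data query efficiency; in online analytical processing (OLAP) scenarios, it improves the efficiency of online analytical processing.
Other features and advantages of the present disclosure are described in detail in the Detailed Description that follows.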
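The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flowchart of a data query method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the processing flow of a data query method according to an exemplary embodiment of the present disclosure;
FIG. 3 is a block diagram of a data query apparatus according to an exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
Note that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different apparatuses, modules, or units, and are not used to limit the order of, or the interdependence between, the functions performed by these apparatuses, modules, or units. Note also that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be read as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of those messages or that information.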
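As stated in the Background, in the related art, during OLAP, data can first be obtained from a data source through a big data computing engine and then analyzed. However, computing engines differ in architecture. When multiple heterogeneous computing engines are used, users must master the usage techniques of each engine, including SQL (Structured Query Language) syntax, function definitions, and parameter tuning. For example, a user of the Presto engine must write SQL according to Presto's syntax, while a user of the Spark engine must write SQL according to Spark's syntax. This greatly increases the user's cost in the data query process and thereby reduces OLAP efficiency.
In view of this, the present disclosure provides a data query method that automatically adapts to various computing engines based on a unified SQL syntax, reducing the user's cost in the data query process.
FIG. 1 is a flowchart of a data query method according to an exemplary embodiment of the present disclosure. Referring to FIG. 1, the data query method includes:
Step 101: obtain a structured query language statement written based on a unified structured query language standard.
Step 102: determine a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement.
Step 103: according to the query feature of the structured query language statement, determine a target computing engine among multiple computing engines, and convert the structured query language statement into a target data query statement executable by the target computing engine.
Step 104: execute the target data query statement through the target computing engine.
In this way, users do not need to write a separate SQL statement for each computing engine architecture. Instead, they can write a structured query language statement based on the unified structured query language standard, and a target computing engine is then automatically adapted, according to the query feature of that statement, to perform the corresponding data query operation. This reduces the user's cost in the data query process and improves data query efficiency; in online analytical processing (OLAP) scenarios, it improves the efficiency of online analytical processing.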
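To help those skilled in the art better understand the data query method provided by the present disclosure, the above steps are described in detail below with examples.
For example, the SQL ANSI-2011 standard can be chosen as the basis, supplemented by some Hive-style DDL (Data Definition Language) and Flink-style streaming syntax, to form the unified structured query language standard in the embodiments of the present disclosure. Of course, in other possible approaches, the SQL syntax supported by any one computing engine can be chosen as the unified structured query language standard; the embodiments of the present disclosure do not limit this.
Because the embodiments of the present disclosure preset a unified structured query language standard, users can write SQL statements against that standard during data queries, without learning the usage techniques of every computing engine and writing engine-specific SQL statements. In addition, the embodiments of the present disclosure can provide a unified call interface through which structured query language statements written based on the unified standard are obtained. In other words, the embodiments of the present disclosure provide a unified SQL entry point and a unified SQL standard. This reduces the user's cost in the data query process and improves data query efficiency.
After a structured query language statement written based on the unified standard is obtained, and in order to adapt a computing engine accurately and automatically and so improve data query efficiency, the query feature of the statement can be determined first. A target computing engine better suited to the query can then be determined among the multiple computing engines according to that query feature.
For example, the query feature can characterize the query semantics of the structured query language statement, and can include a complexity feature of the statement and/or a data source feature of the data to be queried in the statement. The multiple computing engines can include pure-compute big data computing engines (Spark, Hive, Presto, Flink) and can also include big data computing engines that integrate computation and storage (ClickHouse, Druid, ElasticSearch); the embodiments of the present disclosure do not limit this.
In a possible approach, determining the query feature corresponding to the structured query language statement can be: determining the complexity feature corresponding to the statement and/or the data source feature of the data to be queried in the statement. Correspondingly, determining a target computing engine among multiple computing engines according to the query feature can be: determining a target computing engine among the multiple computing engines according to the complexity feature and/or the data source feature of the statement.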
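As an illustration of what a complexity feature and a data source feature might look like, the following minimal sketch derives both from the statement text. The disclosure does not define how these features are computed; the scoring rule (joins plus weighted subqueries), the `source.schema.table` naming convention, and the function name `query_features` are all assumptions made for this example. A real implementation would more likely work on a parsed syntax tree than on regular expressions.

```python
import re


def query_features(sql):
    """Crude textual estimate of a statement's complexity and data sources."""
    text = sql.upper()
    # Assumed complexity score: one point per JOIN, two per subquery.
    joins = len(re.findall(r"\bJOIN\b", text))
    subqueries = len(re.findall(r"\(\s*SELECT\b", text))
    # Assumed naming convention: tables are written as source.schema.table,
    # so the first component of each table name identifies the data source.
    tables = re.findall(r"\b(?:FROM|JOIN)\s+([A-Z_]\w*\.[\w.]+)", text)
    sources = sorted({name.split(".")[0].lower() for name in tables})
    return {"complexity": joins + 2 * subqueries, "data_sources": sources}
```

A statement touching two sources then yields both names in `data_sources`, which is the case that later triggers cross-source joint processing.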
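For example, a preconfigured engine adaptation rule can be obtained. The engine adaptation rule characterizes the correspondence between query features of structured query language statements and the multiple computing engines, so that a target computing engine can be determined among the multiple computing engines according to the query feature of the statement and the engine adaptation rule.
For instance, a preconfigured first engine adaptation rule and second engine adaptation rule can be obtained. The first rule characterizes the correspondence between the complexity of structured query language statements and the multiple computing engines, and the second rule characterizes the correspondence between data sources and the multiple computing engines. A target computing engine can then be determined among the multiple computing engines according to the complexity feature of the statement together with the first rule, and the data source feature of the statement together with the second rule.
For example, the first engine adaptation rule is preconfigured as: engine A executes simple SQL statements and engine B executes complex SQL statements. Determining the target computing engine from the complexity feature of the SQL statement can then be: if the complexity of the SQL statement is greater than a preset complexity, the target computing engine is engine B; if the complexity is less than or equal to the preset complexity, the target computing engine is engine A.
As another example, the second engine adaptation rule is preconfigured as: computing engine A queries data source a, and computing engine B queries data source b. If the data source of the data to be queried in the SQL statement is data source a, the target computing engine is computing engine A; if it is data source b, the target computing engine is computing engine B.
As a further example, the first engine adaptation rule is preconfigured as: computing engine A1 executes simple SQL statements and engine A2 executes complex SQL statements; and the second engine adaptation rule is preconfigured as: computing engines A1 and A2 query data source a. In this case, if the data source of the data to be queried in the SQL statement is data source a, computing engines A1 and A2 can be determined as candidate target computing engines based on the data source feature and the second rule. A single target computing engine can then be chosen between A1 and A2 according to the complexity feature of the SQL statement and the first rule.
It should be understood that the above examples are only possible ways of determining the target computing engine from the query feature of the SQL statement and do not limit the present disclosure. In specific applications, the target computing engine can be determined from the query feature in other ways; the embodiments of the present disclosure do not limit this.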
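The rule examples above (engines A1 and A2 for data source a, engine B for data source b) can be sketched as two lookup tables applied in sequence. This is a minimal sketch under assumptions: the table layout, the threshold value, and the names `QueryFeatures`, `SOURCE_RULE`, and `select_engine` are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryFeatures:
    complexity: int   # assumed numeric complexity score of the SQL statement
    data_source: str  # name of the data source holding the queried data


# Second engine adaptation rule: data source -> candidate engines.
SOURCE_RULE = {"a": ["A1", "A2"], "b": ["B"]}
# First engine adaptation rule: which engines take simple or complex SQL.
SIMPLE_ENGINES = {"A1", "B"}
COMPLEX_ENGINES = {"A2", "B"}
COMPLEXITY_THRESHOLD = 3  # the "preset complexity"; the value is an assumption


def select_engine(features: QueryFeatures) -> str:
    """Narrow candidates by data source, then pick one by complexity."""
    candidates = SOURCE_RULE.get(features.data_source, [])
    if features.complexity > COMPLEXITY_THRESHOLD:
        allowed = COMPLEX_ENGINES
    else:
        allowed = SIMPLE_ENGINES
    for engine in candidates:
        if engine in allowed:
            return engine
    raise LookupError(f"no engine configured for {features!r}")
```

Keeping the two rules as separate tables mirrors the text: the data source rule yields a candidate set, and the complexity rule selects within it.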
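In a possible approach, to improve the execution efficiency of the SQL statement, the structured query language statement can first be optimized according to its query feature and a preset statement optimization strategy to obtain an optimized query statement, and a target computing engine can then be determined among the multiple computing engines according to the optimized query statement.
It should be understood that, in the related art, users of heterogeneous computing engines must not only write engine-specific SQL statements but also, because the engines differ in execution characteristics, configure an engine-specific SQL optimization strategy for each engine. This increases the user's cost and hurts data query efficiency. In the embodiments of the present disclosure, once an SQL statement has been written against the unified SQL standard, it can be optimized in a unified way, which reduces the user's cost and improves data query efficiency.
For example, the preset statement optimization strategy can include general optimization techniques from the related art, such as a materialized view selection strategy, an expression merging strategy, an advanced constant inference strategy, and a built-in function optimization strategy (that is, replacing inefficient functions with efficient ones); the embodiments of the present disclosure do not limit this.
For example, suppose an SQL statement queries data from tables A, B, and C and contains the expression "A join B join C". From the query feature of the statement it can be determined that the operation A join B join C must be performed. Whether to perform A join B and then join C, or B join C and then join A, or A join C and then join B, can be decided according to the preset statement optimization strategy with the goal of improving the execution efficiency of the SQL statement. That is, the SQL statement can be optimized through the expression merging strategy to obtain an optimized query statement. A target computing engine can then be determined among the multiple computing engines according to the optimized query statement.
In this way, SQL statements can be optimized in a unified manner, which reduces the user's cost in the data query process and improves data query efficiency.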
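The choice among the three join orders for "A join B join C" can be sketched as a small cost comparison. The disclosure does not specify a cost model, so this sketch assumes one: each join keeps a fixed fraction (`selectivity`) of the row-count product, and the cost of an order is the total size of its intermediate results. The function name `best_join_order` and the row counts are hypothetical.

```python
from itertools import permutations


def best_join_order(row_counts, selectivity=0.1):
    """Return the left-deep join order with the smallest estimated cost.

    row_counts maps table name -> estimated number of rows (assumed known).
    """
    def cost(order):
        rows = row_counts[order[0]]
        total = 0.0
        for name in order[1:]:
            # Assumed model: a join keeps `selectivity` of the cross product.
            rows = rows * row_counts[name] * selectivity
            total += rows  # accumulate the size of every intermediate result
        return total

    # Try every order; sorting the names makes tie-breaking deterministic.
    return min(permutations(sorted(row_counts)), key=cost)
```

Under this model the two smallest tables are joined first, so with a small B and a very large C the order joins A and B before C. Exhaustive enumeration is only practical for a handful of tables.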
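After the target computing engine is determined, the obtained SQL statement can be converted into a target data query statement executable by the target computing engine. For example, the UDFs (User Defined Functions) and UDAFs (User Defined Aggregation Functions) used by the original SQL statement can be converted, so that the target computing engine can perform the corresponding data query operation according to the converted target data query statement.
In a possible approach, one of the multiple computing engines can first be selected as a standard engine, and the structured query language statement can first be converted into an intermediate query statement based on the data format of structured query language statements that the standard engine can process. Correspondingly, converting the structured query language statement into a target data query statement executable by the target computing engine can be: determining whether the target computing engine is the standard engine, and, if it is not, converting the intermediate query statement into a target data query statement executable by the target computing engine.
In other possible approaches, if the target computing engine is the standard engine, the intermediate query statement can be executed directly by the target computing engine.
For example, the standard engine can be a computing engine whose SQL rules are general and whose statements are cheap to convert into SQL statements executable by other engines. This improves the conversion efficiency of SQL statements and, in turn, data query efficiency. For instance, given the generality of the Calcite engine and the low cost of converting SQL statements between Calcite and the Spark and Presto engines, the Calcite engine can be used as the standard engine. Of course, in other possible approaches, any other computing engine can be chosen as the standard engine, such as the Spark engine or the Presto engine; the embodiments of the present disclosure do not limit this.
In this way, data query efficiency can be further improved while automatically adapting to various computing engines.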
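In a possible approach, if the standard engine is the Calcite engine and the target computing engine is the Spark engine, converting the structured query language statement into an intermediate query statement based on the data format of structured query language statements that the standard engine can process can be: converting the structured query language statement into a RelNode statement based on the format of structured query language statements that the standard engine can process. Correspondingly, converting the intermediate query statement into a target data query statement executable by the target computing engine can be: converting the RelNode statement into a DataFrame statement executable by the target computing engine.
In other words, in the embodiments of the present disclosure, to adapt computing engines automatically, the input SQL statement can first be converted into a RelNode statement. If the target computing engine is then determined to be the Spark engine, the RelNode statement can be further converted into a DataFrame statement, which the Spark engine executes to perform the data query. If the target computing engine is determined to be the Calcite engine, the RelNode statement can be executed directly by the Calcite engine.
It should be understood that, compared with converting the RelNode statement into a LogicalPlan or PhysicalPlan statement that the Spark engine can process, the DataFrame API is relatively stable, which keeps the API calls for the corresponding statements stable and so ensures that data queries execute normally. Alternatively, the RelNode statement could be converted back into a corresponding SQL statement, but that requires maintaining two sets of SQL parsing, and because the Spark engine's SQL parser differs considerably from the Calcite engine's, the time and resource overhead of parsing is large in complex scenarios, which would hurt data query efficiency. Converting the RelNode statement into a DataFrame statement, as in the embodiments of the present disclosure, therefore further improves data query efficiency in multi-engine scenarios.
In a possible approach, if the standard engine is the Calcite engine and the target computing engine is the Presto engine, converting the structured query language statement into an intermediate query statement based on the format of structured query language statements that the standard engine can process can be: converting the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process. Correspondingly, converting the intermediate query statement into a target data query statement executable by the target computing engine can be: converting the RelNode statement into a structured query language statement executable by the target computing engine.
It should be understood that the Calcite engine has no native support for the Presto engine, and the Presto engine has no stable API. One possible implementation is to first convert the SQL statement into a RelNode statement that the Calcite engine can process and then convert the RelNode statement into a Node structure that the Presto engine can process. However, because the Presto engine lacks an efficient API, the integration cost of this approach is very high and it is difficult to apply quickly.
The inventors found that, compared with the difference between the data query statements executable by the Spark engine and the unified structured query language standard, the difference between the data query statements executable by the Presto and Calcite engines and the unified standard is smaller. Therefore, to adapt the Calcite and Presto engines in multi-engine scenarios, the embodiments of the present disclosure can first convert the SQL statement into a RelNode statement that the Calcite engine can process and then convert the RelNode statement into an SQL statement that the Presto engine can execute. Although two sets of SQL parsing must be maintained, the SQL parsing of the Presto and Calcite engines is quite similar, so the time and resource overhead of parsing is reduced and data query efficiency is preserved.
In this way, automatic adaptation to multiple engines can be achieved based on a unified SQL standard, reducing the user's cost in the data query process and thereby improving data query efficiency.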
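The conversion paths described above (convert once into the standard engine's intermediate form, then translate only when the target is a different engine) can be sketched as a dispatch table. This is a structural sketch only: the converter bodies are placeholders that tag the plan rather than calls into Calcite, Spark, or Presto, and every name here (`to_intermediate`, `CONVERTERS`, `to_target_statement`) is hypothetical.

```python
STANDARD_ENGINE = "calcite"  # assumed choice of standard engine


def to_intermediate(sql):
    # Placeholder for parsing into the standard engine's plan
    # (a RelNode tree in the Calcite case described in the text).
    return {"kind": "relnode", "sql": " ".join(sql.split())}


def relnode_to_dataframe(plan):
    # Placeholder: stands in for building a Spark DataFrame statement.
    return {"kind": "dataframe", "plan": plan}


def relnode_to_presto_sql(plan):
    # Placeholder: stands in for unparsing the plan into Presto SQL.
    return {"kind": "sql", "dialect": "presto", "text": plan["sql"]}


CONVERTERS = {"spark": relnode_to_dataframe, "presto": relnode_to_presto_sql}


def to_target_statement(sql, target_engine):
    """Convert once to the intermediate form, then adapt to the target."""
    plan = to_intermediate(sql)
    if target_engine == STANDARD_ENGINE:
        return plan  # the standard engine executes the intermediate form
    if target_engine not in CONVERTERS:
        raise ValueError(f"no converter registered for {target_engine!r}")
    return CONVERTERS[target_engine](plan)
```

With one intermediate form, supporting a further engine means adding one converter to the table rather than one translator per pair of dialects.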
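In practice, if the data to be queried spans multiple data sources, the data of the other data sources must first be synchronized to a target data source, and the query is then run against the target data source after synchronization. Here, the target data source is any one of the multiple data sources, and the other data sources are the remaining data sources. With this approach, every cross-source joint query requires a data synchronization task, and if a large amount of data must be synchronized, data query efficiency suffers considerably.
In the embodiments of the present disclosure, to adapt automatically to multiple data sources, support cross-source joint queries, and eliminate the extra data synchronization task, the following can be done when the data sources of the data to be queried in the structured query language statement include at least two different data sources. According to the data source feature corresponding to the statement, a target computing engine corresponding to each data source is determined among the multiple computing engines, and, based on the data source feature, the statement is converted into a target data query statement executable by the corresponding target computing engine. After each target computing engine has executed its target data query statement, any one of the multiple computing engines is determined as a joint processing engine, the target data that each target computing engine queried from its corresponding data source is sent to the joint processing engine, and the joint processing engine finally processes all of the target data jointly.
For example, the joint processing engine can be any one of the target computing engines, or any engine among the multiple computing engines other than the target computing engines, and can be configured according to the actual situation. It should be understood that choosing one of the target computing engines as the joint processing engine, rather than an engine outside the target computing engines, reduces data transfer and can therefore improve data query efficiency.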
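The data-transfer argument above can be illustrated with a small in-memory sketch: each target engine returns its rows, one of the target engines is chosen as the joint processing engine, and the results are joined there. The selection rule used here (pick the engine holding the most rows, so the fewest rows are moved) is an assumption for illustration; the disclosure only says the choice can be configured. The names `pick_joint_engine` and `joint_process` are hypothetical.

```python
def pick_joint_engine(results):
    """Choose the target engine holding the most rows (assumed heuristic)."""
    return max(results, key=lambda engine: len(results[engine]))


def joint_process(results, key):
    """Join per-engine result sets on `key` at the chosen joint engine.

    results maps engine name -> list of row dicts queried by that engine.
    Returns (joint engine, rows transferred to it, joined rows).
    """
    joint = pick_joint_engine(results)
    # Rows already held by the joint engine do not need to be moved.
    transferred = sum(len(rows) for e, rows in results.items() if e != joint)
    engines = list(results)
    joined = results[engines[0]]
    for engine in engines[1:]:
        index = {}
        for row in results[engine]:
            index.setdefault(row[key], []).append(row)
        # Inner hash join: keep left rows that have a match on `key`.
        joined = [{**left, **right}
                  for left in joined
                  for right in index.get(left[key], [])]
    return joint, transferred, joined
```

Had an engine outside the target engines been chosen, every row from every target engine would have to be transferred, which is the extra cost the text refers to.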
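For example, referring to FIG. 2, a user triggers a data query operation through a BI (Business Intelligence) tool, and an SQL statement corresponding to the operation is generated based on the unified structured query language standard. The SQL statement can then be sent to the federated query layer of the database through JDBC (Java Database Connectivity) or a REST interface. The engine adaptation module of the federated query layer can then determine, according to the data source feature corresponding to the SQL statement, a target computing engine for each data source among the multiple computing engines preset in the engine layer, and convert the SQL statement, based on the data source feature, into a target data query statement executable by the corresponding target computing engine. The target data query statement can then be sent to the engine layer, where the target computing engine queries the corresponding data source in the data source layer. Finally, each target computing engine can send the target data it queried from its data source to any computing engine in the engine layer (that is, the joint processing engine), which performs the cross-source joint processing; in other words, one computing engine associates the target data queried by the target computing engines and returns the result to the user. Cross-source joint queries are thus supported without an extra data synchronization task, which improves data query efficiency.
In addition, referring to FIG. 2, the federated query layer can also include an optimization module, a management module, a permission module, and a metadata module. The optimization module can optimize the structured query language statement in a unified way according to its query feature and the preset statement optimization strategy. The management module can manage the submission of the target data query statement to the target computing engine during the data query, and can also perform log collection, result storage, and similar tasks. After an SQL statement determined based on the unified structured query language standard is obtained, the permission module can first determine whether the user who initiated the SQL statement has permission to perform the corresponding data operation, or can verify the correctness of the SQL statement. The metadata module stores each user's data permission information and the metadata of each data source, so that the target computing engine can be determined more accurately. The specific implementations of the management module, permission module, and metadata module in the federated query layer are similar to those in the related art and are not repeated here.
Continuing with FIG. 2, the multiple computing engines preset in the engine layer include Spark, Hive, Presto, Flink, ClickHouse, and ElasticSearch. The data source layer includes HDFS, RDS, Kafka, ClickHouse, and ElasticSearch. It should be understood that ClickHouse and ElasticSearch are engines that integrate computation and storage, so they can appear in both the engine layer and the data source layer.
For example, with the architecture shown in FIG. 2, in an online analytical processing (OLAP) scenario the data query process can be as follows. An SQL statement determined based on the unified structured query language standard is obtained through JDBC or a REST interface. The metadata module of the federated query layer then verifies the correctness of the SQL statement, and the permission module determines whether the user has permission to query the data targeted by the SQL statement. Next, the optimization module of the federated query layer optimizes the SQL statement in a unified way to obtain an optimized query statement. The engine adaptation module of the federated query layer then determines the target computing engine according to the query feature of the SQL statement, converts the SQL statement into a target data query statement executable by that engine, and sends it to the target computing engine in the engine layer. The target computing engine can then determine an execution strategy for the target data query statement according to its own physical-layer optimization capabilities, and execute the statement under that strategy to query data from the corresponding data source. If there are multiple data sources, the data queried by each target computing engine can be returned after the query, and the federated query layer finally performs the joint processing.
In this way, both multi-engine data queries and joint queries across multiple data sources can be achieved, which greatly reduces the user's cost in the data query process and thereby improves data query efficiency.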
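The order of operations in the federated query layer (validate against metadata, check permissions, optimize, adapt engines, record the submission) can be sketched as one small class. This is a simplified sketch under assumptions: the caller passes the referenced tables explicitly so that no SQL parser is needed, the optimization step is a stand-in that only normalizes whitespace, and the class and field names are hypothetical rather than taken from FIG. 2.

```python
from dataclasses import dataclass, field


@dataclass
class FederationLayer:
    permissions: dict     # user -> set of tables the user may query
    table_sources: dict   # table -> data source (metadata module)
    source_engines: dict  # data source -> target engine (adaptation rule)
    log: list = field(default_factory=list)  # management module's log

    def run(self, user, tables, sql):
        """Return a plan mapping each target engine to the tables it queries."""
        # Metadata module: every referenced table must be known.
        unknown = [t for t in tables if t not in self.table_sources]
        if unknown:
            raise ValueError(f"unknown tables: {unknown}")
        # Permission module: the user must be allowed to read every table.
        allowed = self.permissions.get(user, set())
        denied = [t for t in tables if t not in allowed]
        if denied:
            raise PermissionError(f"{user} may not query {denied}")
        # Optimization module: stand-in that only normalizes whitespace.
        optimized = " ".join(sql.split())
        # Engine adaptation module: one target engine per data source.
        plan = {}
        for table in tables:
            engine = self.source_engines[self.table_sources[table]]
            plan.setdefault(engine, []).append(table)
        # Management module: record what was submitted and to which engines.
        self.log.append((user, optimized, sorted(plan)))
        return plan
```

A plan with more than one engine corresponds to the multi-source case, where the per-engine results would then go through joint processing.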
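Based on the same inventive concept, the present disclosure also provides a data query apparatus, which can form part or all of an electronic device through software, hardware, or a combination of the two. Referring to FIG. 3, the data query apparatus 300 can include:
an acquisition module 301, configured to obtain a structured query language statement determined based on a unified structured query language standard;
a first determining module 302, configured to determine a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement;
a second determining module 303, configured to determine, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines, and to convert the structured query language statement into a target data query statement executable by the target computing engine;
a query module 304, configured to execute the target data query statement through the target computing engine.
Optionally, the first determining module 302 is configured to:
determine a complexity feature corresponding to the structured query language statement and/or a data source feature of the data to be queried in the structured query language statement;
the second determining module 303 is configured to:
determine a target computing engine among multiple computing engines according to the complexity feature and/or the data source feature of the structured query language statement.
Optionally, the data sources of the data to be queried in the structured query language statement include at least two different data sources, and the second determining module 303 is configured to:
determine, according to the data source feature corresponding to the structured query language statement, a target computing engine corresponding to each of the data sources among multiple computing engines, and convert, based on the data source feature, the structured query language statement into a target data query statement executable by the corresponding target computing engine;
the apparatus 300 further includes:
a joint module, configured to, after the target data query statement is executed through the target computing engine, determine any one of the multiple computing engines as a joint processing engine, send the target data that each target computing engine queried from its corresponding data source by executing the target data query statement to the joint processing engine, and jointly process each item of target data through the joint processing engine.
Optionally, the second determining module 303 is configured to:
optimize the structured query language statement according to its query feature and a preset statement optimization strategy to obtain an optimized query statement;
determine a target computing engine among multiple computing engines according to the optimized query statement.
Optionally, the apparatus 300 further includes:
an intermediate conversion module, configured to select one of the multiple computing engines as a standard engine, and to convert the structured query language statement into an intermediate query statement based on the format of structured query language statements that the standard engine can process;
the second determining module 303 is configured to:
determine whether the target computing engine is the standard engine;
if the target computing engine is not the standard engine, convert the intermediate query statement into a target data query statement executable by the target computing engine.
Optionally, the standard engine is a Calcite engine, the target computing engine is a Spark engine, and the intermediate conversion module is configured to convert the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process;
the second determining module 303 is configured to convert the RelNode statement into a DataFrame statement executable by the target computing engine.
Optionally, the standard engine is a Calcite engine, the target computing engine is a Presto engine, and the intermediate conversion module is configured to convert the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process;
the second determining module 303 is configured to convert the RelNode statement into a structured query language statement executable by the target computing engine.
Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.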
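Based on the same concept, the present disclosure also provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the steps of any of the above data query methods are implemented when the program is executed by a processing device.
Based on the same concept, the present disclosure also provides an electronic device, comprising:
a storage device on which a computer program is stored;
a processing device, configured to execute the computer program in the storage device to implement the steps of any of the above data query methods.
Referring now to FIG. 4, it shows a schematic structural diagram of an electronic device 400 suitable for implementing embodiments of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 4, the electronic device 400 may include a processing device 401 (such as a central processing unit or a graphics processing unit), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processing device 401, the ROM 402, and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 408 including, for example, a magnetic tape and a hard disk; and a communication device 409. The communication device 409 can allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 4 shows the electronic device 400 with various devices, it should be understood that it is not required to implement or provide all of the devices shown. More or fewer devices may alternatively be implemented or provided.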
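In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication device 409, installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing device 401, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium of the present disclosure can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media can include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium can be transmitted over any appropriate medium, including but not limited to electrical wire, optical cable, RF (radio frequency), or any suitable combination of the above.
In some embodiments, communication can use any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being incorporated into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a structured query language statement determined based on a unified structured query language standard; determine a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement; determine, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines, and convert the structured query language statement into a target data query statement executable by the target computing engine; and execute the target data query statement through the target computing engine.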
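Computer program code for performing the operations of the present disclosure can be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram can represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks can occur in an order different from that noted in the drawings. For example, two blocks shown in succession can in fact be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present disclosure can be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
The functions described above can be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), and Complex Programmable Logic Devices (CPLDs).
In the context of the present disclosure, a machine-readable medium can be a tangible medium that can contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.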
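According to one or more embodiments of the present disclosure, Example 1 provides a data query method, the method comprising:
obtaining a structured query language statement determined based on a unified structured query language standard;
determining a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement;
determining, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines, and converting the structured query language statement into a target data query statement executable by the target computing engine;
executing the target data query statement through the target computing engine.
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, wherein determining the query feature corresponding to the structured query language statement includes:
determining a complexity feature corresponding to the structured query language statement and/or a data source feature of the data to be queried in the structured query language statement;
and determining, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines includes:
determining a target computing engine among multiple computing engines according to the complexity feature and/or the data source feature of the structured query language statement.
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 1, wherein the data sources of the data to be queried in the structured query language statement include at least two different data sources, and determining, according to the query feature of the structured query language statement, a target computing engine among multiple computing engines and converting the structured query language statement into a target data query statement executable by the target computing engine includes:
determining, according to the data source feature corresponding to the structured query language statement, a target computing engine corresponding to each of the data sources among multiple computing engines, and converting, based on the data source feature, the structured query language statement into a target data query statement executable by the corresponding target computing engine;
and, after the target data query statement is executed through the target computing engine, the method further includes:
determining any one of the multiple computing engines as a joint processing engine;
sending the target data that each target computing engine queried from its corresponding data source by executing the target data query statement to the joint processing engine, and jointly processing each item of target data through the joint processing engine.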
根据本公开的一个或多个实施例,示例4提供了示例1-3任一所述的方法,所述根据所述结构化查询语言语句的所述查询特征,在多个计算引擎中确定目标计算引擎,包括:
根据所述结构化查询语言语句的所述查询特征和预设的语句优化策略,对所述结构化查询语言语句进行优化,得到优化查询语句;
根据所述优化查询语句,在多个计算引擎中确定目标计算引擎。
根据本公开的一个或多个实施例,示例5提供了示例1-3任一所述的方法,所述方法还包括:
在所述多个计算引擎中选择一引擎作为标准引擎,并基于所述标准引擎能够处理的结构化查询语言语句的格式,将所述结构化查询语言语句转换为中间查询语句;
所述将所述结构化查询语言语句转换为所述目标计算引擎能够执行的目标数据查询语句,包括:
确定所述目标计算引擎是否为所述标准引擎;
若所述目标计算引擎不是所述标准引擎,则将所述中间查询语句转换为所述目标计算引擎能够执行的目标数据查询语句。
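According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 5, wherein the standard engine is the Calcite engine and the target computing engine is the Spark engine, and wherein converting the SQL statement into the intermediate query statement based on the format of SQL statements that the standard engine can process comprises:
converting the SQL statement into a RelNode statement based on the data format of SQL statements that the standard engine can process;
and wherein converting the intermediate query statement into the target data query statement that the target computing engine can execute comprises:
converting the RelNode statement into a DataFrame statement that the target computing engine can execute.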
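According to one or more embodiments of the present disclosure, Example 7 provides the method of Example 5, wherein the standard engine is the Calcite engine and the target computing engine is the Presto engine, and wherein converting the SQL statement into the intermediate query statement based on the format of SQL statements that the standard engine can process comprises:
converting the SQL statement into a RelNode statement based on the data format of SQL statements that the standard engine can process;
and wherein converting the intermediate query statement into the target data query statement that the target computing engine can execute comprises:
converting the RelNode statement into an SQL statement that the target computing engine can execute.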
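Examples 6 and 7 share the same intermediate tree and differ only in how it is rendered for the target engine. The sketch below uses a toy relational tree (`Scan`, `Filter`, `Project`) as a stand-in for a RelNode tree; it does not use Calcite's actual classes or API. `to_sql` renders SQL text of the kind a Presto-style engine accepts, and `to_dataframe_code` renders a DataFrame-style call chain as a string, assuming a session object named `spark`.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: "Node"
    condition: str


@dataclass(frozen=True)
class Project:
    child: "Node"
    columns: Tuple[str, ...]


Node = Union[Scan, Filter, Project]


def to_sql(node: Node) -> str:
    # Render the tree as nested SQL text (Example 7 style).
    if isinstance(node, Scan):
        return f"SELECT * FROM {node.table}"
    if isinstance(node, Filter):
        return f"SELECT * FROM ({to_sql(node.child)}) AS t WHERE {node.condition}"
    if isinstance(node, Project):
        cols = ", ".join(node.columns)
        return f"SELECT {cols} FROM ({to_sql(node.child)}) AS t"
    raise TypeError(f"unsupported node: {node!r}")


def to_dataframe_code(node: Node) -> str:
    # Render the tree as a DataFrame-style call chain (Example 6 style).
    if isinstance(node, Scan):
        return f'spark.table("{node.table}")'
    if isinstance(node, Filter):
        return f'{to_dataframe_code(node.child)}.filter("{node.condition}")'
    if isinstance(node, Project):
        args = ", ".join(f'"{c}"' for c in node.columns)
        return f"{to_dataframe_code(node.child)}.select({args})"
    raise TypeError(f"unsupported node: {node!r}")
```

Both renderers walk the same tree recursively, which is why a single intermediate form can serve several target engines.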
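According to one or more embodiments of the present disclosure, Example 8 provides a data query apparatus, the apparatus comprising:
an obtaining module configured to obtain an SQL statement determined based on a unified SQL standard;
a first determining module configured to determine a query feature corresponding to the SQL statement, the query feature characterizing the query semantics of the SQL statement;
a second determining module configured to determine, according to the query feature of the SQL statement, a target computing engine among a plurality of computing engines, and to convert the SQL statement into a target data query statement that the target computing engine can execute; and
a query module configured to execute the target data query statement by means of the target computing engine.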
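According to one or more embodiments of the present disclosure, Example 9 provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processing device, implements the steps of the method of any one of Examples 1-7.
According to one or more embodiments of the present disclosure, Example 10 provides an electronic device, comprising:
a storage device storing a computer program; and
a processing device configured to execute the computer program in the storage device so as to implement the steps of the method of any one of Examples 1-7.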
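The above description covers only preferred embodiments of the present disclosure and the technical principles employed. Those skilled in the art should understand that the scope of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, they should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments, separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.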
Claims (10)
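- A data query method, characterized in that the method comprises: obtaining a structured query language (SQL) statement determined based on a unified SQL standard; determining a query feature corresponding to the SQL statement, the query feature characterizing the query semantics of the SQL statement; determining, according to the query feature of the SQL statement, a target computing engine among a plurality of computing engines, and converting the SQL statement into a target data query statement that the target computing engine can execute; and executing the target data query statement by means of the target computing engine.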
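- The method according to claim 1, characterized in that determining the query feature corresponding to the SQL statement comprises: determining a complexity feature corresponding to the SQL statement and/or a data source feature of the data to be queried by the SQL statement; and determining, according to the query feature of the SQL statement, the target computing engine among the plurality of computing engines comprises: determining the target computing engine among the plurality of computing engines according to the complexity feature and/or the data source feature of the SQL statement.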
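- The method according to claim 1, characterized in that the data to be queried by the SQL statement comes from at least two different data sources, and determining, according to the query feature of the SQL statement, the target computing engine among the plurality of computing engines and converting the SQL statement into the target data query statement that the target computing engine can execute comprises: determining, according to the data source feature corresponding to the SQL statement, a target computing engine among the plurality of computing engines for each of the data sources, and converting, based on the data source feature, the SQL statement into a target data query statement that the corresponding target computing engine can execute; and, after the target data query statement is executed by the target computing engine, the method further comprises: determining any one of the plurality of computing engines as a joint processing engine; and sending, to the joint processing engine, the target data that each target computing engine retrieves from its corresponding data source by executing the target data query statement, and jointly processing the target data by means of the joint processing engine.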
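- The method according to any one of claims 1-3, characterized in that determining, according to the query feature of the SQL statement, the target computing engine among the plurality of computing engines comprises: optimizing the SQL statement according to the query feature of the SQL statement and a preset statement optimization strategy to obtain an optimized query statement; and determining the target computing engine among the plurality of computing engines according to the optimized query statement.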
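- The method according to any one of claims 1-3, characterized in that the method further comprises: selecting one of the plurality of computing engines as a standard engine, and converting the SQL statement into an intermediate query statement based on the format of SQL statements that the standard engine can process; and converting the SQL statement into the target data query statement that the target computing engine can execute comprises: determining whether the target computing engine is the standard engine; and, if the target computing engine is not the standard engine, converting the intermediate query statement into the target data query statement that the target computing engine can execute.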
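- The method according to claim 5, characterized in that the standard engine is the Calcite engine and the target computing engine is the Spark engine; converting the SQL statement into the intermediate query statement based on the format of SQL statements that the standard engine can process comprises: converting the SQL statement into a RelNode statement based on the data format of SQL statements that the standard engine can process; and converting the intermediate query statement into the target data query statement that the target computing engine can execute comprises: converting the RelNode statement into a DataFrame statement that the target computing engine can execute.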
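- The method according to claim 5, characterized in that the standard engine is the Calcite engine and the target computing engine is the Presto engine; converting the SQL statement into the intermediate query statement based on the format of SQL statements that the standard engine can process comprises: converting the SQL statement into a RelNode statement based on the data format of SQL statements that the standard engine can process; and converting the intermediate query statement into the target data query statement that the target computing engine can execute comprises: converting the RelNode statement into an SQL statement that the target computing engine can execute.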
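- A data query apparatus, characterized in that the apparatus comprises: an obtaining module configured to obtain an SQL statement determined based on a unified SQL standard; a first determining module configured to determine a query feature corresponding to the SQL statement, the query feature characterizing the query semantics of the SQL statement; a second determining module configured to determine, according to the query feature of the SQL statement, a target computing engine among a plurality of computing engines, and to convert the SQL statement into a target data query statement that the target computing engine can execute; and a query module configured to execute the target data query statement by means of the target computing engine.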
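- A non-transitory computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processing device, implements the steps of the method according to any one of claims 1-7.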
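- An electronic device, characterized by comprising: a storage device storing a computer program; and a processing device configured to execute the computer program in the storage device so as to implement the steps of the method according to any one of claims 1-7.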
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111032755.6 | 2021-09-03 | ||
CN202111032755.6A CN113704291A (zh) | 2021-09-03 | 2021-09-03 | Data query method, apparatus, storage medium, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023029854A1 (zh) | 2023-03-09 |
Family
ID=78659456
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/109468 WO2023029854A1 (zh) | 2021-09-03 | 2022-08-01 | Data query method, apparatus, storage medium, and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113704291A (zh) |
WO (1) | WO2023029854A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118312531A (zh) * | 2024-06-04 | 2024-07-09 | 北京数巅科技有限公司 | Query language generation method, system, electronic device, and storage medium |
CN118445309A (zh) * | 2024-07-08 | 2024-08-06 | 广州思迈特软件有限公司 | Spark engine-based data processing method, apparatus, and device |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
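CN113704291A (zh) * | 2021-09-03 | 2021-11-26 | 北京火山引擎科技有限公司 | Data query method, apparatus, storage medium, and electronic device |
CN114357276B (zh) * | 2021-12-23 | 2023-08-22 | 北京百度网讯科技有限公司 | Data query method, apparatus, electronic device, and storage medium |
CN114357032A (zh) * | 2022-01-06 | 2022-04-15 | 杭州隆埠科技有限公司 | Data quality monitoring method, apparatus, electronic device, and storage medium |
CN114661746A (zh) * | 2022-02-28 | 2022-06-24 | 北京达佳互联信息技术有限公司 | Statement conversion method, apparatus, electronic device, and storage medium |
CN114860752A (zh) * | 2022-03-30 | 2022-08-05 | 北京快乐茄信息技术有限公司 | Multi-engine data query method, apparatus, device, and storage medium |
CN114817299B (zh) * | 2022-05-17 | 2024-06-25 | 在线途游(北京)科技有限公司 | UDAF-based data analysis method and apparatus |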
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190026335A1 (en) * | 2017-07-23 | 2019-01-24 | AtScale, Inc. | Query engine selection |
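CN110399388A (zh) * | 2019-07-29 | 2019-11-01 | 中国工商银行股份有限公司 | Data query method, system, and device |
CN111061766A (zh) * | 2019-11-27 | 2020-04-24 | 上海钧正网络科技有限公司 | Business data processing method, apparatus, computer device, and storage medium |
CN112699141A (zh) * | 2020-12-29 | 2021-04-23 | 医渡云(北京)技术有限公司 | Data query method, apparatus, storage medium, and device for multi-source heterogeneous data |
CN113704291A (zh) * | 2021-09-03 | 2021-11-26 | 北京火山引擎科技有限公司 | Data query method, apparatus, storage medium, and electronic device |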
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
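CN111309751A (zh) * | 2018-11-27 | 2020-06-19 | 北京奇虎科技有限公司 | Big data processing method and apparatus |
CN111221842A (zh) * | 2018-11-27 | 2020-06-02 | 北京奇虎科技有限公司 | Big data processing system and method |
CN110633292B (zh) * | 2019-09-19 | 2022-06-21 | 上海依图网络科技有限公司 | Query method, apparatus, medium, device, and system for heterogeneous databases |
CN112905620B (zh) * | 2019-11-19 | 2024-05-17 | 北京沃东天骏信息技术有限公司 | Data query method and apparatus, electronic device, and storage medium |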
- 2021-09-03: CN application CN202111032755.6A, published as CN113704291A (zh), status: active, pending
- 2022-08-01: WO application PCT/CN2022/109468, published as WO2023029854A1 (zh), status: unknown
Also Published As
Publication number | Publication date |
---|---|
CN113704291A (zh) | 2021-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
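WO2023029854A1 (zh) | | Data query method, apparatus, storage medium, and electronic device |
CN109086409B (zh) | | Microservice data processing method, apparatus, electronic device, and computer-readable medium |
US10311055B2 (en) | | Global query hint specification |
WO2023273544A1 (zh) | | Log file storage method, apparatus, device, and storage medium |
WO2023056934A1 (zh) | | Data processing method, apparatus, and electronic device |
US10866960B2 (en) | | Dynamic execution of ETL jobs without metadata repository |
CN108363741B (zh) | | Big data unified interface method, apparatus, device, and storage medium |
JP2017535842A (ja) | | Simplified invocation of import procedures for transferring data from a data source to a data target |
WO2021203918A1 (zh) | | Method and apparatus for processing model parameters |
WO2018196729A1 (zh) | | Query processing method, data source registration method, and query engine |
CN111221851A (zh) | | Lucene-based method and apparatus for querying and storing massive data |
CN114116842A (zh) | | Real-time multidimensional medical data acquisition method, apparatus, electronic device, and storage medium |
WO2023029850A1 (zh) | | Data processing method, apparatus, electronic device, and medium |
US10592506B1 (en) | | Query hint specification |
CN111241137A (zh) | | Data processing method, apparatus, electronic device, and storage medium |
CN112307061A (zh) | | Method and apparatus for querying data |
US11704327B2 (en) | | Querying distributed databases |
WO2024001756A1 (zh) | | Data storage method, apparatus, electronic device, and storage medium |
WO2023231615A1 (zh) | | Data lake-based materialized column creation method and data query method |
WO2023065937A1 (zh) | | Data processing method, apparatus, readable medium, and electronic device |
CN114036107B (zh) | | Hudi snapshot-based medical data query method and apparatus |
WO2023001281A1 (zh) | | Tabular data processing method, apparatus, terminal, and storage medium |
US20140365516A1 (en) | | Optimization of join queries for related data |
WO2022151835A1 (zh) | | Sample message processing method and apparatus |
CN116127143A (zh) | | Data query method, apparatus, electronic device, and readable storage medium |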
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22862995; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 19/06/2024) |