CN117827882B - Deep learning-based financial database SQL quality scoring method, system, equipment and storable medium - Google Patents

Info

Publication number
CN117827882B
Authority
CN
China
Prior art keywords
sql
graph
node
layer
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410014519.9A
Other languages
Chinese (zh)
Other versions
CN117827882A (en
Inventor
陈传凯
刘宁
李超德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xinshu Technology Co ltd
Original Assignee
Beijing Xinshu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xinshu Technology Co ltd
Priority to CN202410014519.9A
Publication of CN117827882A
Application granted
Publication of CN117827882B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Human Resources & Organizations (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a deep learning-based method, system, device, and storage medium for scoring the quality of SQL in financial databases. The invention completes SQL quality scoring automatically, without rules defined in advance, and gives users an intuitive SQL quality evaluation result, which gives it significant advantages in adaptability, automation, and scalability. The method adapts better to dynamic changes in SQL statements, reduces the need for manual intervention, and can process large-scale SQL query data effectively. These advantages make the invention more efficient and accurate in scoring the quality of SQL statements.

Description

Deep learning-based financial database SQL quality scoring method, system, equipment and storable medium
Technical Field
The invention relates to a financial database SQL quality scoring method, system, equipment and storable medium based on deep learning, and belongs to the field of intelligent operation and maintenance.
Background
In the financial field, data is a core element of driving decisions and business operations. With the acceleration of digital transformation, financial institutions accumulate massive amounts of data, including transaction records, customer information, market dynamics, risk assessment, and the like. Currently, the primary storage mode of data is still a relational database. SQL (Structured Query Language) is widely used in data acquisition, processing, and analysis as a standardized database query language in relational databases. However, as the volume of data grows and the complexity increases, the quality problems of SQL statements become increasingly prominent, including:
(1) Query performance problem: due to the large data volume, complex table structure, unreasonable index design and other reasons, some SQL queries may have low execution efficiency, which results in slow system response and influences user experience and business flow.
(2) Data accuracy problem: low quality SQL queries can lead to data extraction errors, omissions, or duplicates, affecting the accuracy of data analysis results, and thus affecting decision making and risk management.
(3) Resource waste problem: ineffective or redundant SQL queries may consume excessive computing resources and memory space, increasing operating and maintenance costs and energy consumption.
(4) Potential safety hazard problem: improper SQL query statements may expose sensitive information, causing data leakage and security risks.
In the financial industry, data management and security are subject to stringent regulatory requirements, and the same applies to the quality of SQL statements. Meanwhile, in fierce market competition, financial institutions also need to improve business performance and customer experience through data analysis and intelligent decision making. In addition, with the explosive growth of financial data volume, traditional SQL query methods can no longer meet the requirements of efficient, accurate, and secure data processing, so financial institutions have to pay attention to the quality of their SQL statements.
At present, SQL quality scoring is receiving attention in the financial field: different scoring algorithms are designed to optimize query performance and to improve data accuracy and security. Existing SQL quality scoring algorithms mainly fall into the following categories:
(1) Syntax and semantic checking: these methods mainly check whether the syntax of an SQL statement is correct and whether its semantics are reasonable. For example, they check whether there are syntax errors, whether the referenced tables and columns exist, and whether the conditions of JOIN operations are satisfied.
(2) Performance evaluation: these methods mainly focus on the execution performance of SQL statements, such as query response time and resource consumption. They may predict the performance of SQL statements from historical data and database statistics.
(3) Security and compliance checking: these algorithms mainly detect whether SQL statements present potential security risks, such as SQL injection, privilege abuse, or sensitive data leakage. They may also check whether an SQL statement meets particular compliance requirements or best practices.
Traditional methods usually analyze SQL on the basis of fixed rules or static features and have difficulty adapting to dynamic changes in the structure and semantics of SQL statements. They also require complex rules and metrics to be defined and maintained manually, which makes them hard to automate and hard to scale to large SQL query sets.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a dynamic SQL graph of SQL1 of the embodiments.
FIG. 3 is a dynamic SQL graph of SQL2 of the embodiments.
Disclosure of Invention
Based on the above analysis, the invention provides a deep learning-based financial database SQL quality scoring method. The method introduces a dynamic graph convolution layer whose weights are dynamically updated according to structural and semantic changes in the SQL statements, which improves the adaptability and generalization capability of the model. The scoring method comprises the following specific steps:
(1) Collecting historical SQL query statements, including normal operations and known attacks or abnormal behavior, and converting the SQL statements into abstract syntax trees (ASTs);
(2) Representing the AST of each SQL statement as a graph to construct a dynamic SQL graph, and dynamically updating the edge weights according to the characteristics of the current SQL statement;
(3) Extracting structural features and attribute features of the SQL graph;
(4) For historical data, assigning a risk label to each statement according to whether it relates to a security event or abnormal behavior;
(5) Constructing a graph neural network model comprising a dynamic graph convolution layer, a self-attention mechanism, and multi-task learning, and using it to learn the embedded representation of the SQL graph;
(6) Training the graph neural network model with a supervised learning method, with the features of the SQL graph as inputs and the predicted values of the risk score and other related tasks as outputs;
(7) For a new SQL query statement, first converting it into an AST and constructing an SQL graph, then extracting its features and inputting them into the trained graph neural network model to obtain an embedded representation of the graph.
Further, in step (6), in each dynamic graph convolution layer, the weight matrix is dynamically updated according to the current hidden states of the nodes and edges: $W^{(l)} = f\left(h_v^{(l)}, h_u^{(l)}, e_{uv}\right)$, where $W^{(l)}$ represents the weight parameters of the $l$-th graph convolution layer, $f$ is a learnable function that dynamically computes $W^{(l)}$ from the current hidden states of the nodes and edges, $h_v^{(l)}$ is the hidden state of node $v$ at layer $l$, $h_u^{(l)}$ is the hidden state of node $u$ at layer $l$, node $u$ is a node adjacent to node $v$, and $e_{uv}$ is the feature vector of the edge between nodes $u$ and $v$.
Further, in step (6), a self-attention mechanism is introduced into the graph neural network model, and the self-attention coefficient of node $v$ is calculated as $\alpha_v = \mathrm{softmax}\left(W \cdot \tanh\left(W_1 h_v^{(l)} + W_2 h_v^{(l')}\right)\right)$, where $W$, $W_1$, and $W_2$ are a learnable weight vector and learnable weight matrices, $h_v^{(l)}$ and $h_v^{(l')}$ are the hidden states of node $v$ at two different layers, the softmax function normalizes the attention coefficients, and the tanh function provides the nonlinear mapping. A plurality of output nodes are added at the last layer of the multi-layer perceptron, and a shared hidden layer is used to implement multi-task learning.
Further, in step (7), $\text{risk\_score} = \mathrm{sigmoid}\left(\sum_v g(v) \times \alpha_v \times W_r \times h_v\right)$, where $g(v)$ is a learnable weight adjustment function associated with node $v$, $g(v) = W_g \times h_v + b_g$, the parameters $W_g$ and $b_g$ are learnable weights and biases, $\alpha_v$ is the self-attention coefficient of node $v$, $h_v$ is the hidden state of node $v$, $W_r$ is a learnable weight matrix, and the sigmoid function compresses the risk score into the range $[0, 1]$.
The invention also provides a deep learning-based financial database SQL quality scoring system, which comprises the following modules:
(1) Data collection and preprocessing module: collects historical SQL query statements, including normal operations and known attacks or abnormal behavior, and converts the SQL statements into abstract syntax trees (ASTs);
(2) Dynamic SQL graph construction module: represents the AST of each SQL statement as a graph and, when constructing the graph, dynamically updates the edge weights according to the characteristics of the current SQL statement;
(3) Feature extraction module: extracts structural features and attribute features of the SQL graph;
(4) Risk label assignment module: for historical data, assigns a risk label to each statement according to whether it relates to a security event or abnormal behavior;
(5) Graph neural network model construction module: constructs a graph neural network model comprising a dynamic graph convolution layer, a self-attention mechanism, and multi-task learning, and uses it to learn the embedded representation of the SQL graph;
(6) Model training module: trains the graph neural network model with a supervised learning method, with the features of the SQL graph as inputs and the predicted values of the risk score and other related tasks as outputs;
(7) Risk score calculation module: for a new SQL query statement, first converts it into an AST and constructs an SQL graph, then extracts its features and inputs them into the trained graph neural network model to obtain an embedded representation of the graph.
Further, in the model training module, in each dynamic graph convolution layer, the weight matrix is dynamically updated according to the current hidden states of the nodes and edges: $W^{(l)} = f\left(h_v^{(l)}, h_u^{(l)}, e_{uv}\right)$, where $W^{(l)}$ represents the weight parameters of the $l$-th graph convolution layer, $f$ is a learnable function that dynamically computes $W^{(l)}$ from the current hidden states of the nodes and edges, $h_v^{(l)}$ is the hidden state of node $v$ at layer $l$, $h_u^{(l)}$ is the hidden state of node $u$ at layer $l$, node $u$ is a node adjacent to node $v$, and $e_{uv}$ is the feature vector of the edge between nodes $u$ and $v$.
Further, in the model training module, a self-attention mechanism is introduced into the graph neural network model, and the self-attention coefficient of node $v$ is calculated as $\alpha_v = \mathrm{softmax}\left(W \cdot \tanh\left(W_1 h_v^{(l)} + W_2 h_v^{(l')}\right)\right)$, where $W$, $W_1$, and $W_2$ are a learnable weight vector and learnable weight matrices, $h_v^{(l)}$ and $h_v^{(l')}$ are the hidden states of node $v$ at two different layers, the softmax function normalizes the attention coefficients, and the tanh function provides the nonlinear mapping. A plurality of output nodes are added at the last layer of the multi-layer perceptron, and a shared hidden layer is used to implement multi-task learning.
Further, in the risk score calculation module, $\text{risk\_score} = \mathrm{sigmoid}\left(\sum_v g(v) \times \alpha_v \times W_r \times h_v\right)$, where $g(v)$ is a learnable weight adjustment function associated with node $v$, $g(v) = W_g \times h_v + b_g$, the parameters $W_g$ and $b_g$ are learnable weights and biases, $\alpha_v$ is the self-attention coefficient of node $v$, $h_v$ is the hidden state of node $v$, $W_r$ is a learnable weight matrix, and the sigmoid function compresses the risk score into the range $[0, 1]$.
The present invention also provides an apparatus comprising: the device comprises a data acquisition device, a processor and a memory; the data acquisition device is used for acquiring data; the memory is used for storing one or more program instructions; the processor is configured to execute one or more program instructions to perform any of the methods described above.
The present invention further provides a computer readable storage medium having one or more program instructions embodied therein for performing any of the methods described above.
The invention has significant advantages in the following three aspects:
Adaptability: traditional methods typically analyze SQL on the basis of fixed rules or static features and have difficulty adapting to dynamic changes in the structure and semantics of SQL statements. The invention instead uses a dynamic graph convolution layer to accommodate structural and semantic changes in SQL statements. Through a self-attention mechanism, it can capture long-range dependencies among nodes and better understand the overall structure and intent of a query.
Automation: traditional methods require complex rules and metrics to be defined and maintained manually, which is time consuming and error prone. In contrast, the invention is trained and optimized by machine learning, which greatly reduces the need for manual intervention. The invention can therefore learn and identify security risks in SQL statements automatically, without manually defined and maintained rules.
Scalability: traditional approaches may encounter performance bottlenecks when processing large SQL query sets. The invention learns with a graph neural network and can process large-scale SQL query data effectively. Through multi-task learning with a shared hidden layer, the method can be extended to more related tasks, which improves the generalization capability of the model.
Detailed Description
The invention provides a deep learning-based SQL quality scoring method that introduces a dynamic graph convolution layer so that weights can be dynamically updated according to structural and semantic changes in SQL statements, thereby improving the adaptability and generalization capability of the model.
The deep learning-based SQL quality scoring method mainly comprises the steps shown in FIG. 1, specifically:
(1) Data collection and preprocessing
Historical SQL query statements are collected, including normal operation and known attack or abnormal behavior, and the SQL statements are converted into abstract syntax tree (Abstract Syntax Tree, AST) representations. An abstract syntax tree is a data structure representing the structure and syntax elements of source code or programming language statements.
(2) Construction of dynamic SQL graphs
The AST of each SQL statement is represented as a graph, in which nodes are elements such as SQL keywords, table names, column names, and functions, and edges are the relationships between these elements (for example, parent node to child node, or table to column). Because the structure and semantics of SQL statements may change over time, a dynamic graph convolution layer is introduced to accommodate these changes. When the graph is constructed, the edge weights are dynamically updated according to the characteristics of the current SQL statement.
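As a rough illustration of the graph construction described above, the following Python sketch flattens a nested-list AST into nodes and weighted parent-child edges. The function names, the synthetic `ROOT` and `GROUP` nodes, and in particular the re-weighting rule (doubling the weight of edges that lead to a hypothetical set of "sensitive" tokens) are assumptions made for this example; the source does not specify how the edge weights are updated.

```python
def ast_to_graph(ast):
    """Flatten a nested-list AST into (nodes, edges).

    nodes: list of labels, where the list index is the node id (node 0 is a synthetic root).
    edges: dict mapping (parent_id, child_id) -> weight, initialised to 1.0.
    """
    nodes = ["ROOT"]
    edges = {}

    def visit(item, parent):
        node_id = len(nodes)
        edges[(parent, node_id)] = 1.0
        if isinstance(item, list):
            nodes.append("GROUP")  # synthetic node for a bracketed sub-list
            for child in item:
                visit(child, node_id)
        else:
            nodes.append(item)

    for item in ast:
        visit(item, 0)
    return nodes, edges


def reweight_edges(nodes, edges, sensitive=frozenset({"OR", "*"})):
    """Hypothetical dynamic update: double the weight of edges leading to 'sensitive' tokens."""
    return {
        (u, v): w * (2.0 if nodes[v] in sensitive else 1.0)
        for (u, v), w in edges.items()
    }


ast2 = ["SELECT", ["*"], "FROM", "users", "WHERE",
        ["username", "=", "'admin'", "OR", "1=1"]]
nodes, edges = ast_to_graph(ast2)
weighted = reweight_edges(nodes, edges)
```

In a trained system the weights would presumably come from learned parameters rather than a fixed token list; the fixed list only makes the idea of statement-dependent edge weights concrete.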
(3) Feature extraction
Structural features and attribute features of the SQL graph are extracted, mainly including:
1) Node types and their counts in the graph
2) Edge types and their counts in the graph
3) Degree distribution of the nodes
4) Hierarchical structure information
5) Importance of tables and columns (based on access frequency, sensitivity, and context information)
6) Combination and order of SQL keywords
7) Functions and operators used
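A few of the structural features listed above can be computed directly from a node list and an edge list, as in the following sketch. The node typing scheme, the feature names, and the assumption that the graph is a tree rooted at node 0 are illustrative choices, not taken from the source. Table and column importance (item 5) would need access statistics and is not covered here.

```python
from collections import Counter

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "JOIN"}
SYMBOLS = {"=", "<", ">", "*"}


def node_type(label):
    """Coarse node typing; these four categories are an illustrative assumption."""
    if label in KEYWORDS:
        return "keyword"
    if label in SYMBOLS:
        return "symbol"
    if label.startswith("'"):
        return "literal"
    return "identifier"


def extract_features(nodes, edges):
    """Return simple structural features of a tree-shaped graph rooted at node 0.

    nodes: list of labels (index = node id); edges: iterable of (parent, child) pairs.
    """
    edges = list(edges)
    type_counts = Counter(node_type(label) for label in nodes)
    degree = Counter()
    children = {}
    for parent, child in edges:
        degree[parent] += 1
        degree[child] += 1
        children.setdefault(parent, []).append(child)

    def depth(node):
        # Height of the subtree rooted at `node`, counted in nodes.
        return 1 + max((depth(c) for c in children.get(node, [])), default=0)

    return {
        "num_nodes": len(nodes),
        "num_edges": len(edges),
        "type_counts": dict(type_counts),
        "degree_histogram": dict(Counter(degree.values())),
        "depth": depth(0) if nodes else 0,
    }


features = extract_features(["SELECT", "*", "FROM", "users"],
                            [(0, 1), (0, 2), (2, 3)])
```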
(4) Risk label assignment
For historical data, risk tags are assigned to it according to whether it relates to a security event or abnormal behavior. For example, SQL statements that involve data leakage, injection attacks, or abnormal data modification are marked as high risk.
(5) Construction of a graph neural network model
A graph neural network model is constructed that includes a dynamic graph convolution layer, a self-attention mechanism, and multi-task learning, and is used to learn an embedded representation of the SQL graph. The model may contain multiple dynamic graph convolution layers, self-attention layers, and pooling layers for capturing local and global graph structure information.
(6) Model training
The graph neural network model is trained using a supervised learning method. The inputs are the features of the SQL graph, and the outputs are the risk score and the predicted values of other related tasks, such as the execution time of the SQL statement and the amount of data accessed.
In each dynamic graph convolution layer, the weight matrix is dynamically updated according to the current hidden states of the nodes and edges, calculated as follows:
$W^{(l)} = f\left(h_v^{(l)}, h_u^{(l)}, e_{uv}\right)$
where $W^{(l)}$ represents the weight parameters of the $l$-th graph convolution layer; $f$ is a learnable function that dynamically computes $W^{(l)}$ from the current hidden states of the nodes and edges; $h_v^{(l)}$ is the hidden state of node $v$ at layer $l$, that is, the feature representation of node $v$ after processing by the previous graph convolution layer; and $h_u^{(l)}$ is the hidden state of node $u$ at layer $l$, that is, the feature representation of node $u$ after processing by the previous graph convolution layer. Node $u$ is one of the nodes adjacent to node $v$, and $e_{uv}$ is the feature vector of the edge between nodes $u$ and $v$, containing information that describes the edge, such as its direction, type, and weight.
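To make the idea of a weight matrix computed from node and edge states more concrete, the sketch below implements one toy dynamic graph convolution step in plain Python, following $W^{(l)} = f(h_v^{(l)}, h_u^{(l)}, e_{uv})$. The source only states that $f$ is learnable, so the choices here are assumptions: $f$ is a single randomly initialised linear layer followed by tanh, messages $W h_u$ are summed over neighbours, and a tanh is applied to the sum. Training of `theta` is omitted.

```python
import math
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def make_f(d_in, d_edge, d_out, seed=0):
    """Return a toy f: (h_v, h_u, e_uv) -> weight matrix of shape d_out x d_in.

    Assumption: f is one linear layer plus tanh with random, untrained parameters.
    """
    rng = random.Random(seed)
    n_inputs = 2 * d_in + d_edge
    theta = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
             for _ in range(d_out * d_in)]

    def f(h_v, h_u, e_uv):
        flat = [math.tanh(z) for z in matvec(theta, h_v + h_u + e_uv)]
        return [flat[r * d_in:(r + 1) * d_in] for r in range(d_out)]

    return f


def dynamic_conv(h, neighbors, edge_feat, f, d_out):
    """One step: h_v' = tanh(sum over neighbours u of W_uv h_u), with W_uv = f(h_v, h_u, e_uv)."""
    new_h = {}
    for v, h_v in h.items():
        agg = [0.0] * d_out
        for u in neighbors.get(v, []):
            W = f(h_v, h[u], edge_feat[(u, v)])
            agg = [a + m for a, m in zip(agg, matvec(W, h[u]))]
        new_h[v] = [math.tanh(a) for a in agg]
    return new_h


h = {0: [1.0, 0.0], 1: [0.0, 1.0]}
neighbors = {0: [1], 1: [0]}
edge_feat = {(1, 0): [1.0], (0, 1): [1.0]}
f = make_f(d_in=2, d_edge=1, d_out=3)
h_next = dynamic_conv(h, neighbors, edge_feat, f, d_out=3)
```

Because `W` is recomputed for every pair of nodes, the same layer applies different transformations to different edges, which is the property the text attributes to the dynamic layer.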
To better capture long-range dependencies between nodes, a self-attention mechanism is introduced into the graph neural network model. The self-attention coefficient of node $v$ can be calculated using the following formula:
$\alpha_v = \mathrm{softmax}\left(W \cdot \tanh\left(W_1 h_v^{(l)} + W_2 h_v^{(l')}\right)\right)$
where $W$, $W_1$, and $W_2$ are a learnable weight vector and learnable weight matrices, $h_v^{(l)}$ and $h_v^{(l')}$ are the hidden states of node $v$ at two different layers, the softmax function normalizes the attention coefficients, and the tanh function provides the nonlinear mapping.
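The description above (a learnable vector and two learnable matrices applied to hidden states of the same node at two layers, a tanh nonlinearity, and a softmax normalisation) can be sketched as follows. This is a minimal reading of that description rather than a confirmed implementation: the exact formula is not legible in the source, and the sketch assumes that each node's score is $W \cdot \tanh(W_1 h_a + W_2 h_b)$ and that the softmax runs over all nodes of one graph.

```python
import math


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def attention_coefficients(h_a, h_b, w, W1, W2):
    """Compute one attention coefficient per node.

    h_a, h_b: dicts node -> hidden state of that node at two different layers.
    w: weight vector; W1, W2: weight matrices (lists of rows).
    """
    scores = {}
    for v in h_a:
        z = [math.tanh(p + q)
             for p, q in zip(matvec(W1, h_a[v]), matvec(W2, h_b[v]))]
        scores[v] = sum(wi * zi for wi, zi in zip(w, z))
    # Softmax over nodes, shifted by the maximum score for numerical stability.
    top = max(scores.values())
    exps = {v: math.exp(s - top) for v, s in scores.items()}
    total = sum(exps.values())
    return {v: e / total for v, e in exps.items()}


identity = [[1.0, 0.0], [0.0, 1.0]]
h = {0: [0.0, 0.0], 1: [1.0, 1.0]}
alpha = attention_coefficients(h, h, w=[1.0, 1.0], W1=identity, W2=identity)
```

With these toy inputs node 1 receives the larger coefficient, since its score is positive while the score of node 0 is zero.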
A plurality of output nodes are added at the last layer of the multi-layer perceptron, and a shared hidden layer is used to implement multi-task learning.
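The shared-hidden-layer arrangement can be sketched as a small multi-layer perceptron whose hidden layer feeds several output heads, one per task. The head names (`risk`, `exec_time`), the ReLU activation, and the toy weights below are assumptions for illustration; the source names risk score, execution time, and data access amount as example tasks but gives no layer sizes or activations.

```python
def linear(W, b, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def multitask_forward(x, shared, heads):
    """Forward pass through one shared hidden layer and several single-output heads.

    shared: (W, b) of the shared hidden layer.
    heads: dict task name -> (W, b), where each W has exactly one row.
    """
    hidden = [max(0.0, z) for z in linear(shared[0], shared[1], x)]  # ReLU (assumed)
    return {name: linear(W, b, hidden)[0] for name, (W, b) in heads.items()}


shared = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
heads = {
    "risk": ([[1.0, 1.0]], [0.0]),       # raw risk logit
    "exec_time": ([[2.0, 0.0]], [1.0]),  # hypothetical execution-time regression head
}
outputs = multitask_forward([1.0, -1.0], shared, heads)
```

Because every head reads the same `hidden` vector, gradients from all tasks would update the shared layer during training, which is how multi-task learning is usually expected to help generalisation.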
(7) Risk score calculation:
For a new SQL query statement, firstly converting the SQL query statement into AST and constructing an SQL graph, then extracting the characteristics of the SQL query statement and inputting the SQL query statement into a trained graph neural network model to obtain an embedded representation of the graph.
The risk score is calculated from the embedded representation of the graph using a graph attention mechanism, which emphasizes the nodes or edges that have a greater impact on risk:
$\text{risk\_score} = \mathrm{sigmoid}\left(\sum_v g(v) \times \alpha_v \times W_r \times h_v\right)$
where $g(v)$ is a learnable weight adjustment function associated with node $v$, $g(v) = W_g \times h_v + b_g$, the parameters $W_g$ and $b_g$ are learnable weights and biases, $\alpha_v$ is the self-attention coefficient of node $v$, $h_v$ is the hidden state of node $v$, $W_r$ is a learnable weight matrix, and the sigmoid function compresses the risk score into the range $[0, 1]$.
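The risk score formula, $\text{risk\_score} = \mathrm{sigmoid}(\sum_v g(v)\,\alpha_v\,W_r h_v)$ with $g(v) = W_g h_v + b_g$, can be evaluated directly once hidden states and attention coefficients are available. The sketch below assumes that `W_g` and `W_r` are row vectors, so that $g(v)$ and $W_r h_v$ are scalars; the source does not state the shapes, and all numeric values are made up for the example.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_score(h, alpha, W_g, b_g, W_r):
    """Compute sigmoid(sum_v g(v) * alpha_v * (W_r . h_v)) with g(v) = W_g . h_v + b_g.

    h: dict node -> hidden state; alpha: dict node -> attention coefficient.
    """
    total = 0.0
    for v, h_v in h.items():
        g_v = dot(W_g, h_v) + b_g
        total += g_v * alpha[v] * dot(W_r, h_v)
    return sigmoid(total)


h = {0: [1.0, 0.0], 1: [0.0, 1.0]}
alpha = {0: 0.5, 1: 0.5}
score = risk_score(h, alpha, W_g=[1.0, 1.0], b_g=0.0, W_r=[2.0, -2.0])
# Here the two node contributions are +1 and -1, so the sum is 0 and the score is 0.5.
```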
Because the actual values and graphs depend on the specific data set, model parameters, and training process, the method is illustrated below with a simple example:
(1) Data collection and preprocessing:
The following two SQL query statements are collected as input data:
SQL1: SELECT column1, column2 FROM table1 WHERE condition1 AND condition2
SQL2: SELECT * FROM users WHERE username='admin' OR 1=1
Lexical analysis and syntax analysis are performed on each SQL statement, converting each into an abstract syntax tree (AST):
AST1 (corresponding to SQL1): [SELECT, [column1, column2], FROM, table1, WHERE, [condition1, AND, condition2]]
AST2 (corresponding to SQL2): [SELECT, *, FROM, users, WHERE, [username, =, 'admin', OR, 1=1]]
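The nested-list ASTs above can be reproduced, approximately, with a very small tokenizer and parser. The sketch below is a toy that only handles the `SELECT ... FROM ... [WHERE ...]` shape of the two example statements; the function names and the token pattern are assumptions, and a real system would use a full SQL parser. One visible difference from the example is that this tokenizer splits `1=1` into three tokens.

```python
import re

# Quoted strings, identifiers, integers, and a few single-character symbols.
TOKEN_RE = re.compile(r"'[^']*'|[A-Za-z_][A-Za-z_0-9]*|\d+|[*=,]")
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}


def tokenize(sql):
    """Split a SQL string into tokens, upper-casing keywords only."""
    tokens = TOKEN_RE.findall(sql)
    return [t.upper() if t.upper() in KEYWORDS else t for t in tokens]


def to_ast(sql):
    """Build a nested-list AST for a simple SELECT ... FROM ... [WHERE ...] statement."""
    tokens = tokenize(sql)
    if not tokens or tokens[0] != "SELECT" or "FROM" not in tokens:
        raise ValueError("only simple SELECT ... FROM ... statements are supported")
    from_idx = tokens.index("FROM")
    columns = [t for t in tokens[1:from_idx] if t != ","]
    ast = ["SELECT", columns, "FROM", tokens[from_idx + 1]]
    if "WHERE" in tokens:
        where_idx = tokens.index("WHERE")
        ast += ["WHERE", tokens[where_idx + 1:]]
    return ast


ast1 = to_ast("SELECT column1, column2 FROM table1 WHERE condition1 AND condition2")
ast2 = to_ast("SELECT * FROM users WHERE username='admin' OR 1=1")
```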
(2) Constructing a dynamic SQL graph:
Each AST is converted into a graph representation in which nodes represent syntax elements of the SQL statement and edges represent relationships between the elements. The edge weights are dynamically updated according to the characteristics of the current SQL statement.
Simplified graphs (only some of the nodes and edges are shown) are given in FIGS. 2 and 3: FIG. 2 corresponds to SQL1 and FIG. 3 corresponds to SQL2.
(3) Feature extraction:
The structural features and attribute features of each SQL graph, such as node types, edge types, node degree distribution, and hierarchical structure information, are extracted.
Processing results (simplified representation):
Feature vector 1 (corresponding to SQL1): [0.1, 0.2, 0.3, ...] (assumed feature values, for illustration)
Feature vector 2 (corresponding to SQL2): [0.4, 0.5, 0.6, ...] (assumed feature values, for illustration)
(4) Risk tag assignment:
Normal queries are labeled low risk; queries that involve attacks or abnormal behavior are labeled high risk.
Processing results:
Label 1 (corresponding to SQL1): low risk
Label 2 (corresponding to SQL2): high risk
(5) Constructing a graph neural network model:
A graph neural network model is constructed that includes a dynamic graph convolution layer, a self-attention mechanism, and multi-task learning.
(6) Model training:
The graph neural network model is trained using a supervised learning method; the inputs are the features of the SQL graph, and the outputs are the predicted values of the risk score and other related tasks.
In each dynamic graph convolution layer, the weight matrix $W^{(l)}$ is dynamically updated according to the current hidden states of the nodes and edges.
A self-attention mechanism is introduced into the graph neural network model to calculate the attention coefficients of the nodes.
A linear function $g(v)$ is introduced to adjust the contribution of each node to the final risk score.
(7) Risk score calculation:
A new SQL query statement is converted into an AST and an SQL graph is constructed; its features are then extracted and input into the trained graph neural network model to obtain the embedded representation of the graph.
The risk score is calculated using the formula with adjusted attention weights. Suppose the model yields the following risk scores:
Risk score 1 (corresponding to SQL 1): 0.1 (lower risk)
Risk score 2 (corresponding to SQL 2): 0.9 (higher risk)
For a specific SQL statement, traditional methods rely on fixed rules or static features. The invention, by contrast, requires no predefined rules: it scores SQL quality automatically and gives users an intuitive SQL quality evaluation result, which gives it significant advantages in adaptability, automation, and scalability. The method adapts better to dynamic changes in SQL statements, reduces the need for manual intervention, and can process large-scale SQL query data effectively. These advantages make the invention more efficient and accurate in scoring the quality of SQL statements.
The present invention also provides an apparatus comprising: the device comprises a data acquisition device, a processor and a memory; the data acquisition device is used for acquiring data; the memory is used for storing one or more program instructions; the processor is configured to execute one or more program instructions to perform any of the methods described above.
The present invention further provides a computer readable storage medium having one or more program instructions embodied therein for performing any of the methods described above.
The units, devices or modules etc. set forth in the above embodiments may be implemented in particular by a computer chip or entity or by a product having a certain function. For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, when implementing the present application, the functions of each module may be implemented in the same or multiple pieces of software and/or hardware, or a module implementing the same function may be implemented by multiple sub-modules or a combination of sub-units. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller can be regarded as a hardware component, and means for implementing various functions included therein can also be regarded as a structure within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the other embodiments. The application is operational with numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The foregoing description of the embodiments is provided to illustrate the general principles of the application and is not intended to limit the scope of the application or to restrict the application to the particular embodiments described. Any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (6)

1. A deep learning-based financial database SQL quality scoring method, which introduces a dynamic graph convolution layer and dynamically updates weights according to the structural and semantic changes of SQL statements, characterized in that the method comprises the following specific steps:
(1) Collecting historical SQL query sentences, including normal operation and known attack or abnormal behavior, and converting the SQL sentences into abstract syntax trees;
(2) Carrying out graphical representation on AST of each SQL sentence, constructing a dynamic SQL graph, and dynamically updating the weight of the edge according to the characteristics of the current SQL sentence;
(3) Extracting structural features and attribute features of the SQL graph;
(4) For historical data, assigning a risk tag to the historical data according to whether it relates to a security event or abnormal behavior;
(5) Constructing a graph neural network model comprising a dynamic graph convolution layer, a self-attention mechanism and multi-task learning, and using the graph neural network model to learn the embedded representation of the SQL graph;
(6) Training a graph neural network model by using a supervised learning method, wherein the characteristics of the SQL graph are taken as input, and the predicted values of risk scores and other related tasks are taken as output;
(7) For a new SQL query statement, first converting the statement into an AST and constructing an SQL graph, then extracting the features of the graph and inputting them into the trained graph neural network model to obtain an embedded representation of the graph;
In step (6), in each dynamic graph convolution layer, the weight matrix is dynamically updated according to the current hidden states of the nodes and edges: $W^{(l)} = f(h_v^{(l)}, h_u^{(l)}, e_{uv})$, where $W^{(l)}$ represents the weight parameters in the $l$-th graph convolution layer, $f$ is a learnable function for dynamically computing $W^{(l)}$ based on the current hidden states of the nodes and edges, $h_v^{(l)}$ represents the hidden state of node $v$ at the $l$-th layer, $h_u^{(l)}$ represents the hidden state of a node $u$ at the $l$-th layer, node $u$ being a node adjacent to node $v$, and $e_{uv}$ represents the feature vector of the edge between nodes $u$ and $v$;
a self-attention mechanism is introduced into the graph neural network model, and the self-attention coefficient $\alpha_v$ of node $v$ is calculated from the hidden states of node $v$ at different layers, wherein $W$, $W_1$ and $W_2$ are learnable weight vectors and weight matrices, a tanh function is used for nonlinear mapping, and a softmax function is used for normalizing the attention coefficient; and a plurality of output nodes are added at the last layer of the multi-layer perceptron, and a shared hidden layer is used to realize multi-task learning.
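The dynamic weight update and the self-attention coefficient recited in claim 1 can be illustrated with a small sketch. The claim does not fix the form of f or of the attention score, so everything below is an assumption made for illustration: `dynamic_weight` (a base matrix scaled by a sigmoid gate over the concatenated hidden states and edge features), the sum-then-tanh aggregation in `conv_layer`, the score `w . tanh(W1 h_curr + W2 h_prev)` in `self_attention`, and the toy node names and numbers.

```python
import math

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dynamic_weight(W_base, a, h_v, h_u, e_uv):
    """Hypothetical f(h_v, h_u, e_uv): scale a base matrix by a sigmoid
    gate computed from the concatenation [h_v, h_u, e_uv]."""
    z = sum(ai * xi for ai, xi in zip(a, h_v + h_u + e_uv))
    gate = 1.0 / (1.0 + math.exp(-z))
    return [[gate * w for w in row] for row in W_base]

def conv_layer(h, edges, W_base, a):
    """One dynamic graph convolution step (assumed form): node v sums the
    messages W(h_v, h_u, e_uv) @ h_u over incoming edges (u, v), then tanh."""
    dim = len(W_base)
    out = {}
    for v in h:
        agg = [0.0] * dim
        for (u, target), e_uv in edges.items():
            if target == v:
                W = dynamic_weight(W_base, a, h[v], h[u], e_uv)
                agg = [x + m for x, m in zip(agg, matvec(W, h[u]))]
        out[v] = [math.tanh(x) for x in agg]
    return out

def self_attention(h_prev, h_curr, w, W1, W2):
    """Assumed score: w . tanh(W1 h_curr[v] + W2 h_prev[v]),
    normalized over all nodes with a softmax."""
    scores = {}
    for v in h_curr:
        s1, s2 = matvec(W1, h_curr[v]), matvec(W2, h_prev[v])
        scores[v] = sum(wi * math.tanh(p + q) for wi, p, q in zip(w, s1, s2))
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {v: math.exp(s - top) for v, s in scores.items()}
    total = sum(exps.values())
    return {v: x / total for v, x in exps.items()}

# Toy SQL graph: three nodes with 2-d hidden states and 1-d edge features.
h0 = {"SELECT": [1.0, 0.0], "FROM": [0.0, 1.0], "WHERE": [0.5, 0.5]}
edges = {("SELECT", "FROM"): [1.0], ("FROM", "SELECT"): [1.0],
         ("SELECT", "WHERE"): [0.5], ("WHERE", "SELECT"): [0.5]}
W_base = [[0.5, -0.2], [0.1, 0.3]]
a = [0.1, 0.2, -0.1, 0.3, 0.4]  # gate parameters, length 2 + 2 + 1

h1 = conv_layer(h0, edges, W_base, a)
alpha = self_attention(h0, h1, [0.7, -0.4],
                       [[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.0, 0.5]])
print(alpha)
```

In this sketch the weight matrix is recomputed for every edge from the current hidden states, which is one way to read "dynamically updated"; a trained model would learn `W_base`, `a`, `w`, `W1` and `W2` by backpropagation rather than fix them by hand.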
2. The deep learning-based financial database SQL quality scoring method of claim 1, wherein: in step (7), $\mathrm{risk\_score} = \mathrm{sigmoid}\left(\sum_{v} g(v) \times \alpha_v \times W_r \times h_v\right)$, where $g(v)$ is a learnable weight adjustment function associated with node $v$, $g(v) = W_g \times h_v + b_g$, the parameters $W_g$ and $b_g$ are learnable weights and biases, $\alpha_v$ is the self-attention coefficient of node $v$, $h_v$ is the hidden state of node $v$, $W_r$ is a learnable weight matrix, and the sigmoid function compresses the risk score into the range $[0, 1]$.
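The risk score of claim 2 can be worked through numerically. The sketch below assumes, for illustration only, that W_g and W_r are row vectors and b_g is a scalar, so that g(v) and W_r × h_v are scalars and the sum is a single number; the node names and values are made up.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def risk_score(h, alpha, W_g, b_g, W_r):
    """sigmoid( sum_v g(v) * alpha_v * (W_r . h_v) ), g(v) = W_g . h_v + b_g.
    W_g and W_r are treated as row vectors and b_g as a scalar (assumption)."""
    total = 0.0
    for v, h_v in h.items():
        g_v = dot(W_g, h_v) + b_g
        total += g_v * alpha[v] * dot(W_r, h_v)
    return 1.0 / (1.0 + math.exp(-total))

# Two nodes with equal attention; their contributions cancel, giving 0.5.
h = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
alpha = {"a": 0.5, "b": 0.5}
neutral = risk_score(h, alpha, W_g=[1.0, 1.0], b_g=0.0, W_r=[1.0, -1.0])
raised = risk_score(h, alpha, W_g=[1.0, 1.0], b_g=0.0, W_r=[1.0, 1.0])
print(neutral, raised)
```

With `W_r = [1.0, -1.0]` the weighted sum is 0 and the score sits at the sigmoid midpoint; with `W_r = [1.0, 1.0]` the sum is 1 and the score rises above 0.5, always staying inside [0, 1].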
3. A financial database SQL quality scoring system based on deep learning is characterized in that: the system comprises the following modules:
(1) Data collection and preprocessing module: the module collects historical SQL query statements, including normal operations and known attacks or abnormal behavior, and converts the SQL statements into abstract syntax trees;
(2) The dynamic SQL graph construction module: the module graphically represents AST of each SQL sentence, and dynamically updates the weight of the edge according to the characteristics of the current SQL sentence when constructing a graph;
(3) Feature extraction module: extracting structural features and attribute features of the SQL graph;
(4) Risk tag allocation module: for historical data, assigning a risk tag to the historical data according to whether it relates to a security event or abnormal behavior;
(5) The graph neural network model building module: constructing a graph neural network model comprising a dynamic graph convolution layer, a self-attention mechanism and multi-task learning, and using the graph neural network model to learn the embedded representation of the SQL graph;
(6) Model training module: training a graph neural network model by using a supervised learning method, wherein the characteristics of the SQL graph are taken as input, and the predicted values of risk scores and other related tasks are taken as output;
(7) Risk score calculation module: for a new SQL query statement, first converting the statement into an AST and constructing an SQL graph, then extracting the features of the graph and inputting them into the trained graph neural network model to obtain an embedded representation of the graph;
In the model training module, in each dynamic graph convolution layer, the weight matrix is dynamically updated according to the current hidden states of the nodes and edges: $W^{(l)} = f(h_v^{(l)}, h_u^{(l)}, e_{uv})$, where $W^{(l)}$ represents the weight parameters in the $l$-th graph convolution layer, $f$ is a learnable function for dynamically computing $W^{(l)}$ based on the current hidden states of the nodes and edges, $h_v^{(l)}$ represents the hidden state of node $v$ at the $l$-th layer, $h_u^{(l)}$ represents the hidden state of a node $u$ at the $l$-th layer, node $u$ being a node adjacent to node $v$, and $e_{uv}$ represents the feature vector of the edge between nodes $u$ and $v$;
in the model training module, a self-attention mechanism is introduced into the graph neural network model, and the self-attention coefficient $\alpha_v$ of node $v$ is calculated from the hidden states of node $v$ at different layers, wherein $W$, $W_1$ and $W_2$ are learnable weight vectors and weight matrices, a tanh function is used for nonlinear mapping, and a softmax function is used for normalizing the attention coefficient; and a plurality of output nodes are added at the last layer of the multi-layer perceptron, and a shared hidden layer is used to realize multi-task learning.
4. A deep learning based financial database SQL quality scoring system as recited in claim 3, wherein: in the risk score calculation module, $\mathrm{risk\_score} = \mathrm{sigmoid}\left(\sum_{v} g(v) \times \alpha_v \times W_r \times h_v\right)$, where $g(v)$ is a learnable weight adjustment function associated with node $v$, $g(v) = W_g \times h_v + b_g$, the parameters $W_g$ and $b_g$ are learnable weights and biases, $\alpha_v$ is the self-attention coefficient of node $v$, $h_v$ is the hidden state of node $v$, $W_r$ is a learnable weight matrix, and the sigmoid function compresses the risk score into the range $[0, 1]$.
5. A deep learning based financial database SQL quality scoring apparatus, the apparatus comprising: the device comprises a data acquisition device, a processor and a memory; the data acquisition device is used for acquiring data; the memory is used for storing one or more program instructions; the processor being configured to execute one or more program instructions for performing the method of any of the preceding claims 1-2.
6. A deep learning based financial database SQL quality scoring computer readable storage medium having one or more program instructions embodied therein for performing the method of any of claims 1-2.
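Steps (1) and (2) of claim 1 (converting an SQL statement into an AST, then into a graph whose edge weights depend on the statement) can be sketched as follows. The claims do not name a parser or a weighting rule, so this sketch makes two loud assumptions: the AST is written by hand as nested `(label, children)` tuples instead of being produced by a real SQL parser, and the hypothetical `RISKY` set and the rule "edge weight = 1 + number of risky nodes in the child subtree" stand in for whatever statement-dependent weighting the method actually uses.

```python
from itertools import count

# Hand-written AST for: SELECT name FROM users WHERE id = 1 OR 1 = 1
# (a real pipeline would obtain this from an SQL parser).
AST = ("SELECT", [
    ("COLUMN:name", []),
    ("FROM", [("TABLE:users", [])]),
    ("WHERE", [
        ("OR", [
            ("EQ", [("COLUMN:id", []), ("LITERAL:1", [])]),
            ("EQ", [("LITERAL:1", []), ("LITERAL:1", [])]),
        ]),
    ]),
])

RISKY = {"OR"}  # hypothetical set of node labels treated as risk signals

def ast_to_graph(ast):
    """Flatten a nested (label, children) AST into node and edge dicts.
    Nodes get integer ids in pre-order; each parent->child edge weight is
    1.0 plus the number of RISKY labels found in the child's subtree."""
    ids = count()
    nodes, edges = {}, {}

    def visit(node):
        label, children = node
        nid = next(ids)
        nodes[nid] = label
        risky = 1 if label in RISKY else 0
        for child in children:
            cid, child_risky = visit(child)
            edges[(nid, cid)] = 1.0 + child_risky
            risky += child_risky
        return nid, risky

    visit(ast)
    return nodes, edges

nodes, edges = ast_to_graph(AST)
for (parent, child), weight in sorted(edges.items()):
    print(f"{nodes[parent]} -> {nodes[child]}: {weight}")
```

Because the weights are derived from the content of each statement, two queries with the same tree shape can still yield different graphs, which is the property the "dynamic SQL graph" of step (2) relies on.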
CN202410014519.9A 2024-01-04 2024-01-04 Deep learning-based financial database SQL quality scoring method, system, equipment and storable medium Active CN117827882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410014519.9A CN117827882B (en) 2024-01-04 2024-01-04 Deep learning-based financial database SQL quality scoring method, system, equipment and storable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410014519.9A CN117827882B (en) 2024-01-04 2024-01-04 Deep learning-based financial database SQL quality scoring method, system, equipment and storable medium

Publications (2)

Publication Number Publication Date
CN117827882A CN117827882A (en) 2024-04-05
CN117827882B true CN117827882B (en) 2024-08-20

Family

ID=90509336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410014519.9A Active CN117827882B (en) 2024-01-04 2024-01-04 Deep learning-based financial database SQL quality scoring method, system, equipment and storable medium

Country Status (1)

Country Link
CN (1) CN117827882B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118735367A (en) * 2024-09-04 2024-10-01 宏景科技股份有限公司 Data quality risk assessment method, system, equipment and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177123A (en) * 2021-04-29 2021-07-27 思必驰科技股份有限公司 Optimization method and system for text-to-SQL model
CN114911820A (en) * 2022-06-13 2022-08-16 国网智能电网研究院有限公司 SQL statement judging model construction method and SQL statement judging method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10747761B2 (en) * 2017-05-18 2020-08-18 Salesforce.Com, Inc. Neural network based translation of natural language queries to database queries
US20200133952A1 (en) * 2018-10-31 2020-04-30 International Business Machines Corporation Natural language generation system using graph-to-sequence model
US11748613B2 (en) * 2019-05-10 2023-09-05 Baidu Usa Llc Systems and methods for large scale semantic indexing with deep level-wise extreme multi-label learning
WO2023126914A2 (en) * 2021-12-27 2023-07-06 Yeda Research And Development Co. Ltd. METHOD AND SYSTEM FOR SEMANTIC APPEARANCE TRANSFER USING SPLICING ViT FEATURES
CN115757804A (en) * 2022-09-06 2023-03-07 华中科技大学 Knowledge graph extrapolation method and system based on multilayer path perception
CN115470232A (en) * 2022-09-29 2022-12-13 阿里巴巴(中国)有限公司 Model training and data query method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117827882A (en) 2024-04-05

Similar Documents

Publication Publication Date Title
CN110347719B (en) Enterprise foreign trade risk early warning method and system based on big data
US8577823B1 (en) Taxonomy system for enterprise data management and analysis
CN112612902A (en) Knowledge graph construction method and device for power grid main device
CN112527774A (en) Data center building method and system and storage medium
CN112199512B (en) Scientific and technological service-oriented case map construction method, device, equipment and storage medium
CN109241199B (en) Financial knowledge graph discovery method
CN107103363A (en) A kind of construction method of the software fault expert system based on LDA
CN117827882B (en) Deep learning-based financial database SQL quality scoring method, system, equipment and storable medium
CN111199469A (en) User payment model generation method and device and electronic equipment
Wang et al. Recovering relationships between documentation and source code based on the characteristics of software engineering
CN107527289B (en) Investment portfolio industry configuration method, device, server and storage medium
Luo et al. Convolutional neural network algorithm–based novel automatic text classification framework for construction accident reports
Hao et al. A novel method using LSTM-RNN to generate smart contracts code templates for improved usability
CN118227655B (en) Database query statement generation method, device, equipment and storage medium
CN115033705A (en) Power grid regulation and control risk early warning information knowledge graph design method and system
MUMINOV et al. Fvs-Technology: Intellectual Search Tools
CN116860311A (en) Script analysis method, script analysis device, computer equipment and storage medium
CN116205296A (en) ABAC strategy engineering method integrating top-down and bottom-up
CN111242520B (en) Feature synthesis model generation method and device and electronic equipment
CN116226371A (en) Digital economic patent classification method
CN115587190A (en) Construction method and device of knowledge graph in power field and electronic equipment
Fisun et al. Knowledge management applications based on user activities feedback
CN114722159A (en) Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources
Huang et al. Digital Transformation Strategy for Financial Management of Entity Enterprises in the Information Age
CN117764536B (en) Innovative entrepreneur project auxiliary management system based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant