
CN118568305A - Graph data processing method, device, equipment, medium and program product - Google Patents

Graph data processing method, device, equipment, medium and program product

Info

Publication number
CN118568305A
CN118568305A (application CN202410788937.3A)
Authority
CN
China
Prior art keywords
original
financial
data
graph data
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410788937.3A
Other languages
Chinese (zh)
Inventor
林天涯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202410788937.3A
Publication of CN118568305A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/901 - Indexing; Data structures therefor; Storage structures
    • G06F16/9024 - Graphs; Linked lists
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation
    • G06N5/022 - Knowledge engineering; Knowledge acquisition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 - Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a graph data processing method, and relates to the field of artificial intelligence. The method comprises: inputting original financial data to be processed into a large language model, wherein the large language model is obtained by graph data processing training performed in advance with financial data samples; obtaining financial graph data extracted from the original financial data by the large language model, wherein the financial graph data comprises original graph data and refined graph data, the original graph data being a data structure comprising original nodes and original relations obtained from the original financial data, and the refined graph data being a data structure comprising abstract nodes and abstract relations refined based on the original nodes and the original relations; and performing graph data processing according to the original graph data and the refined graph data to obtain a graph data processing result. The present disclosure also provides a graph data processing apparatus, a device, a storage medium, and a program product.

Description

Graph data processing method, device, equipment, medium and program product
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to a graph data processing method, apparatus, device, medium, and program product.
Background
Graph data is a data structure used to represent relationships (edges) between entities (nodes). Specifically, graph data is composed of nodes and edges, where nodes represent entities and edges represent the relationships between them. Both nodes and edges may carry attributes that describe their characteristics. The structure of graph data may be undirected or directed, and edges may be unweighted or weighted. Such a structure is well suited for representing complex relationships between entities, such as financial transactions, social networks, recommendation systems, knowledge graphs, and biological networks.
With the advent of the big data age, graph data has been widely used in various fields. Traditional graph data extraction methods generally rely on simple rule- or pattern-matching techniques; they cannot effectively process large-scale, complex graph data and struggle to extract implicit structural relations.
Disclosure of Invention
In view of the foregoing, the present disclosure provides graph data processing methods, apparatuses, devices, media, and program products.
According to a first aspect of the present disclosure, there is provided a graph data processing method, the method comprising: inputting original financial data to be processed into a large language model, wherein the large language model is obtained by graph data processing training performed in advance with financial data samples; obtaining financial graph data extracted from the original financial data by the large language model, wherein the financial graph data comprises original graph data and refined graph data, the original graph data being a data structure comprising original nodes and original relations obtained from the original financial data, and the refined graph data being a data structure comprising abstract nodes and abstract relations refined based on the original nodes and the original relations; and performing graph data processing according to the original graph data and the refined graph data to obtain a graph data processing result.
According to an embodiment of the present disclosure, obtaining financial graph data extracted from the raw financial data by the large language model includes obtaining the refined graph data, specifically including: inputting the characteristics of the original nodes and the characteristics of the original relations into the large language model; the abstract nodes are refined based on the features of the original nodes through the large language model, and the abstract relationships are refined based on the features of the original relationships.
According to an embodiment of the present disclosure, in addition to inputting the features of the original nodes and the features of the original relationships into the large language model, obtaining the refined graph data further includes: inputting preset prompt words into the large language model, wherein the prompt words are used for instructing the large language model to follow node extraction requirements and relation extraction requirements matched with a predetermined financial scenario.
According to an embodiment of the present disclosure, obtaining financial graph data extracted from the raw financial data by the large language model includes obtaining the raw graph data, specifically including: obtaining original graph features extracted from the original financial data from the large language model, wherein the original graph features comprise feature vectors of at least part of the original financial data; and processing the original graph characteristics based on a preset financial rule to obtain the original graph data, wherein the preset financial rule indicates at least one constraint condition for extracting the characteristics of the original nodes and the characteristics of the original relations.
In accordance with an embodiment of the present disclosure, before inputting the raw financial data to be processed into the large language model, the method further comprises: obtaining at least one financial document, wherein the financial document is generated in response to a user transacting financial business; parsing the at least one financial document to obtain the raw financial data represented in a predetermined format.
According to an embodiment of the present disclosure, each financial service is preconfigured with parsing rules, and parsing the at least one financial document to obtain the raw financial data represented in a predetermined format includes: determining a target parsing rule according to the financial service corresponding to each financial document, wherein the parsing rule is used for acquiring target business data in the financial document and includes the predetermined format content of the target business data; and parsing each financial document based on its target parsing rule to obtain the original financial data expressed in the predetermined format.
According to an embodiment of the present disclosure, the predetermined format characterizes transaction entities and transaction relationships between transaction entities in the raw financial data.
Another aspect of an embodiment of the present disclosure provides a graph data processing apparatus, wherein the apparatus includes: a data input module for inputting the original financial data to be processed into a large language model, wherein the large language model is obtained by graph data processing training performed in advance with financial data samples; a graph data extraction module for obtaining financial graph data extracted from the original financial data by the large language model, wherein the financial graph data comprises original graph data and refined graph data, the original graph data being a data structure comprising original nodes and original relations obtained from the original financial data, and the refined graph data being a data structure comprising abstract nodes and abstract relations refined based on the original nodes and the original relations; and a graph data processing module for performing graph data processing according to the original graph data and the refined graph data to obtain a graph data processing result.
Another aspect of an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
Another aspect of the disclosed embodiments provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the method as described above.
Another aspect of the disclosed embodiments provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
One or more of the above embodiments have the following advantages: the original financial data is processed with a large language model to extract financial graph data; on the basis of extracting the original graph data, complex patterns and hidden structures that are difficult for traditional methods to find can be identified and extracted to obtain the refined graph data, so that new nodes and relations can be discovered and information of more dimensions is provided. The nodes and relations contained in, and additionally refined from, the financial data are extracted automatically by the large language model, realizing efficient and accurate graph data processing.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of graph data processing according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a graph data processing method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow chart of a graph data processing method according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of obtaining raw financial data in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow diagram for obtaining refined graph data in accordance with an embodiment of the disclosure;
FIG. 6 schematically illustrates a flow chart of obtaining raw graph data according to an embodiment of the disclosure;
FIG. 7 schematically illustrates a flow chart of a method of graph data processing in accordance with another embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow chart of acquiring financial graph data in accordance with an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a diagram data processing apparatus according to an embodiment of the present disclosure; and
Fig. 10 schematically illustrates a block diagram of an electronic device adapted to implement the graph data processing method according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
In the technical solution of the present disclosure, the user information involved (including but not limited to personal user information, user image information, and user device information such as location information) and data (including but not limited to data for analysis, stored data, and displayed data) are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure, and application of such data comply with the relevant laws, regulations, and standards; necessary security measures are taken; public order is not prejudiced; and corresponding operation entries are provided for the user to choose to authorize or refuse.
In scenarios where personal information is used to make automated decisions, the method, apparatus, and system provided by the embodiments of the present disclosure provide corresponding operation entries for users, so that users can choose to accept or reject the automated decision result; if the user chooses to reject, an expert decision flow is entered. The expression "automated decision" here refers to the activity of automatically analyzing and assessing an individual's behavioral habits, hobbies, or economic, health, or credit status by means of a computer program and making a decision. The expression "expert decision" here refers to the activity of making a decision by a person who specializes in a certain field of work and has the professional experience, knowledge, and skills of a certain level of expertise.
Fig. 1 schematically illustrates an application scenario diagram of graph data processing according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example in which embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device. For example, the server 105 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing a basic cloud computing service such as a cloud service, a cloud computing service, a network service, or a middleware service.
It should be noted that, the graph data processing method provided by the embodiments of the present disclosure may be generally performed by at least one of a terminal device or a server. Accordingly, the graph data processing apparatus provided by the embodiments of the present disclosure may be generally provided in at least one of a terminal device or a server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The graph data processing method according to the embodiment of the present disclosure will be described in detail below with reference to fig. 2 to 8 based on the scenario described in fig. 1.
Fig. 2 schematically illustrates a flowchart of a graph data processing method according to an embodiment of the present disclosure.
As shown in fig. 2, this embodiment includes:
In operation S210, inputting raw financial data to be processed into a large language model, wherein the large language model is obtained by performing graph data processing training in advance through a financial data sample;
Raw financial data includes financial industry data that has not yet undergone graph data extraction, such as transaction records, stock prices, bank statements, financial reports, financial news, and market analysis reports. It may include time-series data (e.g., stock prices, transaction amounts, exchange rates), text data (e.g., market forecast reports, company news, announcements), and structured data (e.g., revenue, balance sheet items, and profit in company financial reports). A large language model (LLM) is pre-trained on a large amount of data and can generate natural language text or understand the meaning of language text. Large language models are characterized by their large scale, containing billions of parameters that help them learn complex patterns in language data. These models are typically based on deep learning architectures, such as the Transformer, which helps them achieve impressive performance on various natural language processing tasks.
In some embodiments, the graph data processing training is performed by fine-tuning on financial data samples (such as historical transaction records, historical stock prices, and historical bank statements), so that the large language model can be better applied to the downstream task. Where authorized, the large language model may be a GPT (generative pre-training) series model.
In operation S220, financial graph data extracted from the original financial data by the large language model is obtained, wherein the financial graph data includes original graph data and refined graph data, the original graph data being a data structure comprising original nodes and original relationships obtained from the original financial data, and the refined graph data being a data structure comprising abstract nodes and abstract relationships refined based on the original nodes and the original relationships;
The financial graph data includes data, extracted from the original financial data, for constructing graphs in the financial field, such as a transaction relationship network between different users or a relationship network between different companies in the stock market. The original graph data refers to graph data extracted directly from the original financial data, for example, a transaction network extracted directly from the transaction data itself. The refined graph data refers to data obtained by processing and refining the original graph data, and includes abstract node and edge information; for example, multiple micropayment transactions may be consolidated into one "micropayment node" as an abstract node.
In the raw graph data, raw nodes represent raw entities extracted from raw financial data, and raw relationships represent links between raw entities extracted from raw financial data. For example, bank accounts act as nodes and transfers between them act as relationships. In refining graph data, abstract nodes may represent higher-level entities or concepts, such as more representative entities of a class, group, or particular type, or added virtual entities. Abstracting the nodes may include refining the nodes corresponding to the one or more virtual entities based on attributes of the plurality of original entities. Abstract relationships are refined or abstracted based on original relationships, and may represent higher-level or more abstract relationship types, such as industry links, market trends, etc., including, for example, links between virtual entities refined based on relationships of multiple original entities, or predicted relationships used to complement relationships between original entities.
In operation S230, graph data processing is performed according to the original graph data and the refined graph data, and a graph data processing result is obtained.
The graph data processing includes processing data of the graph structure, such as storing nodes and edges, designing a special graph database to store various graph data, and adopting a graph calculation engine to calculate the graph data so as to construct a visualized knowledge graph based on the nodes and the edges.
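As a minimal sketch only, and not the patented implementation, the node-and-edge structure described above can be represented as plain records before being loaded into a graph database or a graph computation engine. The class names, field names, and example values below are assumptions for illustration.

```python
# Minimal sketch (assumption, not the disclosure's implementation): a plain
# node/edge representation that could later be loaded into a graph database or
# a graph computation engine for knowledge-graph construction.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    kind: str                      # "original" or "abstract"
    attrs: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str
    target: str
    relation: str                  # e.g. "transfer", "member_of"
    attrs: dict = field(default_factory=dict)

nodes = [Node("account_A", "original", {"holder": "Zhang San"}),
         Node("group_1", "abstract")]
edges = [Edge("account_A", "group_1", "member_of")]
print(len(nodes), len(edges))
```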
The embodiment can be applied to various financial scenarios, such as risk assessment, market prediction, and portfolio optimization. An example of a bank anti-fraud analysis scenario is given. A bank wishes to detect and prevent fraud by analyzing transaction data. In this case, the raw financial data includes all transaction records between bank accounts. The bank inputs historical transaction data into a large language model that has been trained to identify and understand patterns of financial transactions. The transaction data is analyzed by the large language model to extract financial graph data, which comprises transaction accounts (original nodes) and the transfer records between them (original relations), and summarizes account groups with frequent transactions (abstract nodes) and the transfer patterns between those groups (abstract relations). Next, using graph data processing techniques, the original graph data and the refined graph data are analyzed to construct a knowledge graph and reveal potential fraud patterns. For example, one account may frequently conduct large transactions with a newly created account, which may be evidence of illicit transaction activity.
According to the embodiments of the present disclosure, the original financial data is processed with the large language model to extract financial graph data; on the basis of extracting the original graph data, complex patterns and hidden structures that are difficult for traditional methods to find can be identified and extracted to obtain the refined graph data, so that new nodes and relations can be discovered and information of more dimensions is provided. The nodes and relations contained in, and additionally refined from, the financial data are extracted automatically by the large language model, realizing efficient and accurate graph data processing.
Fig. 3 schematically illustrates a flow chart of a graph data processing method according to another embodiment of the present disclosure.
As shown in fig. 3, the embodiment includes operations S210 to S230, and operations S310 to S320 before inputting the raw financial data to be processed into the large language model:
at operation S310, obtaining at least one financial document, wherein the financial document is generated in response to a user transacting financial business;
A financial document refers to a file or record generated while the user transacts financial business, and contains the user's financial transaction information, account information, and the like, such as bank statements, loan contracts, and investment reports. It may take the form of documents recording time-series and structured data, text documents such as news and reports, relational databases, or tables stored in non-relational databases.
Even though the raw data in a financial document contains the node-edge relationships of graph data, it is not stored or organized as a graph data structure. Therefore, in order to perform graph data analysis and processing, the document must first be parsed.
At operation S320, at least one financial document is parsed to obtain raw financial data represented in a predetermined format.
The predetermined format includes a specific format for unifying and normalizing the raw financial data so that the data itself follows certain rules and can be properly processed by the large language model. The predetermined format may be a specific data format, such as vectorized data, and may also be a format expressed according to a specific rule; for example, transfer records may be preprocessed into expressions such as "Zhang San transfers 1000 yuan to Li Si" and "Li Si transfers 2500 yuan to Wang Wu", i.e., the pattern "A transfers XX yuan to B", and then vectorized.
For example, financial documents such as statements of account and transaction records generated while the user transacts business with the bank are obtained. The financial documents are converted into data in the predetermined format using a document parsing tool, or alternatively a large language model used for document parsing. The parsed original financial data is then input into the large language model for graph data extraction, feature extraction, and analysis.
For example, when parsing a financial document, text cleansing may be performed to remove noise and special characters; key information such as dates, account holders, and transaction records is identified using NLP (natural language processing) techniques and document parsing algorithms; and the text content is extracted as structured data expressed in the predetermined format.
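The following is a hedged sketch of that parsing step: a simplified parser that cleans one bank-statement line and extracts the date, counterparties, and amount into a structured record. The statement layout, the regular expression, and the field names are assumptions for illustration, not the disclosure's actual parsing rules.

```python
# Illustrative sketch only: clean a statement line and extract key fields into
# a predetermined structured format. Layout and field names are assumptions.
import re

LINE = "2024-06-01  Zhang San -> Li Si  1000.00 CNY  (transfer)"

def parse_statement_line(line: str) -> dict:
    line = re.sub(r"\s+", " ", line).strip()          # basic text cleansing
    m = re.match(r"(\d{4}-\d{2}-\d{2}) (.+?) -> (.+?) ([\d.]+) (\w+)", line)
    if not m:
        raise ValueError("line does not match the assumed statement layout")
    date, src, dst, amount, currency = m.groups()
    return {"date": date, "from": src, "to": dst,
            "amount": float(amount), "currency": currency}

print(parse_statement_line(LINE))
```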
According to the embodiments of the present disclosure, automatically acquiring and parsing financial documents remarkably improves the efficiency of data collection and processing. In addition, providing normalized data conforming to the predetermined format supplies high-quality input for the large language model, thereby supporting more complex financial data processing.
Fig. 4 schematically illustrates a flowchart of obtaining raw financial data according to an embodiment of the present disclosure.
Each financial service is preconfigured with parsing rules. For example, a rule base may be predefined, in which parsing rules corresponding to common financial services are defined in advance. The rule base contains the characteristics of specific document types, such as keywords and format patterns, as well as the data format and key elements of the specific financial service.
As shown in fig. 4, this embodiment is one of the embodiments of operation S320, including:
in operation S410, a target parsing rule is determined according to the financial service corresponding to each financial document, where the parsing rule is used to obtain target service data in the financial document and includes predetermined format contents of the target service data;
Illustratively, the parsing rules include parsing methods and criteria preset for the financial documents of a particular financial service, used to extract the target business data in the document and convert it to the predetermined format. The parsing rules for a bank statement include identifying the transaction date, transaction type, and amount; the parsing rules for a loan contract include identifying the loan amount, interest rate, and repayment plan. The target business data is the data related to a specific financial service extracted from a financial document, such as transaction records in a bank statement, loan details in a loan contract, or income information in an investment report.
Illustratively, for a bank statement, a transaction record parsing rule is selected; for a loan contract, a loan detail resolution rule is selected.
In operation S420, parsing is performed based on the target parsing rule of each financial document to obtain the original financial data expressed in the predetermined format. For example, the financial document is parsed using the selected parsing rule, and the target business data is extracted and converted into the predetermined format.
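A minimal sketch of this rule-selection step is given below, assuming a simple registry that maps a business type to its preconfigured parsing function. The registry keys, handler functions, and return shapes are illustrative assumptions, not the disclosure's rule base.

```python
# Hedged sketch: select a target parsing rule by financial service type.
# The registry, service names, and handlers are invented for illustration.
def parse_bank_statement(doc: str) -> dict:
    # would identify transaction date, type and amount per the rule
    return {"type": "bank_statement", "transactions": doc.splitlines()}

def parse_loan_contract(doc: str) -> dict:
    # would identify loan amount, interest rate and repayment plan
    return {"type": "loan_contract", "clauses": doc.splitlines()}

PARSING_RULES = {
    "bank_statement": parse_bank_statement,
    "loan_contract": parse_loan_contract,
}

def parse_document(service: str, doc: str) -> dict:
    rule = PARSING_RULES.get(service)
    if rule is None:
        raise KeyError(f"no parsing rule preconfigured for service: {service}")
    return rule(doc)

print(parse_document("bank_statement", "2024-06-01 transfer 1000"))
```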
According to the embodiment of the disclosure, by pre-configuring specific analysis rules for various financial services, the accuracy and consistency of document analysis are ensured, and by automatically selecting and applying the rules, different financial services are adapted, so that the efficiency of data processing is remarkably improved. In addition, the design allows the preconfigured parsing rules to support various types of financial documents, thereby enhancing the universality and applicability.
In some embodiments, the predetermined format characterizes transaction entities and transaction relationships between transaction entities in the raw financial data.
A transaction entity refers to any party involved in a financial transaction, including individuals such as customers or account holders, organizations such as companies, banks, or merchants, intermediaries such as payment processors or brokers, and any other entities that may participate in the transaction. For example, the transaction entities may be "David" as the account holder, "ABC Bank" as the bank, and "XYZ Store" as the merchant. A transaction relationship describes the flow of funds or the transaction behavior between these entities, such as a payment, collection, or loan. The representation of a transaction relationship includes the source entity (the party initiating the transaction), the target entity (the party receiving the transaction), the transaction amount, the transaction date, and the transaction description, which together form a complete record of the transaction.
Fields are defined in the predetermined format for representing detailed information about the transaction entities, and a structure is defined for representing the transaction relationships between transaction entities. In particular, the flow of funds or transaction behavior between transaction entities is explicitly represented in the predetermined format, facilitating the structuring and analysis of the data. For example, in JSON, nested structures can be used to represent transaction relationships, such as "transactions": [{"from": "David", "to": "Tom", "amount": 100}].
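A hedged sketch of such a JSON-style predetermined format follows. The top-level field names ("entities", "transactions") and the extra fields (role, date, description) are assumptions added for illustration around the example given above.

```python
# Illustrative sketch of a predetermined JSON format for transaction entities
# and transaction relationships; field names are assumptions for demonstration.
import json

record = {
    "entities": [
        {"id": "David", "role": "account_holder"},
        {"id": "Tom", "role": "account_holder"},
        {"id": "ABC Bank", "role": "bank"},
    ],
    "transactions": [
        {"from": "David", "to": "Tom", "amount": 100,
         "date": "2024-06-01", "description": "payment"},
    ],
}
print(json.dumps(record, indent=2))
```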
According to embodiments of the present disclosure, structuring and standardization of data is enhanced by explicitly representing transaction entities and their transaction relationships in a predetermined format, enabling large language models to identify and evaluate potential nodes and relationships.
Fig. 5 schematically illustrates a flow chart of obtaining refined graph data according to an embodiment of the disclosure.
As shown in fig. 5, this embodiment is one of the embodiments of operation S220, including:
in operation S510, inputting the features of the original nodes and the features of the original relations into a large language model;
Illustratively, the characteristics of the original node include properties or descriptive information of the original node, such as company name, industry category, etc. The characteristics of the original relationship include attributes or descriptive information of the original relationship, such as relationship type, association strength, relationship setup time, etc.
In operation S520, the abstract nodes are refined based on the features of the original nodes, and the abstract relationships are refined based on the features of the original relationships, by the large language model.
The powerful analysis and processing capabilities of the large language model can be used to analyze such raw data in depth, for example by inputting features of the original nodes, such as the name and industry of company A, and features of the original relationships, such as the equity relationship between company A and company B. Abstract nodes can be summarized from the features of the original nodes, for example by summarizing multiple company nodes into one industry node; refining more representative and general nodes reduces the complexity of the data. Similarly, the large language model can summarize abstract relations from the features of the original relations, for example by simplifying the equity relationships among several companies into a cooperation relationship between industries, which simplifies the relationship network and represents the relations and interactions among different entities more clearly.
Illustratively, the large language model is used to generate vector representations capable of capturing key information, yielding the features of the original nodes and of the original relationships. In the node refinement stage, the large language model merges nodes with the same importance or similar characteristics based on the vector representations of the original nodes, achieving node simplification and generalization. In the relation refinement process, the large language model first screens and identifies important relationships according to a series of criteria. These criteria include the strength of the relationship, for example measured by trading volume or shareholding ratio; the frequency, i.e., how often and for how long a relationship occurs, to identify durable and stable relationships; and the type, used to classify and summarize relationships as cooperation, competition, investment, and the like.
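A hedged sketch of these two ideas, node merging by vector similarity and relation screening by strength, is shown below. The greedy grouping strategy, the thresholds, and the example vectors are assumptions, not the disclosure's refinement algorithm.

```python
# Hedged sketch: group original nodes whose feature vectors are close, and keep
# only relations above a strength threshold. Thresholds/values are assumptions.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def merge_similar_nodes(vectors: dict, threshold: float = 0.9) -> list:
    """Greedily assign nodes with cosine similarity >= threshold to one group."""
    groups: list[list[str]] = []
    for name, vec in vectors.items():
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

def screen_relations(relations: list, min_strength: float = 0.5) -> list:
    """Keep relations whose strength (e.g. trading volume share) is high enough."""
    return [r for r in relations if r.get("strength", 0.0) >= min_strength]

vecs = {"Company A": np.array([1.0, 0.1]), "Company B": np.array([0.9, 0.2]),
        "Company C": np.array([0.0, 1.0])}
print(merge_similar_nodes(vecs))   # e.g. [['Company A', 'Company B'], ['Company C']]
```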
According to the embodiment of the disclosure, the original graph data is refined by applying the large language model, so that the complex mode and the hidden structure can be found, and more accurate and valuable information support is provided for key links such as risk assessment, investment decision and the like in the financial field.
In some embodiments, in addition to inputting features of the original nodes and features of the original relationships into the large language model, obtaining the refined graph data further comprises: and inputting preset prompt words into the large language model, wherein the prompt words are used for indicating the large language model to follow node extraction requirements and relation extraction requirements matched with a preset financial scene.
The prompt word referred to in this disclosure is a prompt, i.e., text input to a large language model. A prompt instruction may be understood as an input hint that tells the model which aspects need attention and how to integrate that information into the output, prompting or guiding the large language model to give a consistent output.
The predetermined financial scenarios include specific financial application scenarios such as financial risk assessment, market prediction, portfolio analysis, etc. In a financial risk assessment scenario, the predetermined node refinement requirement may be "identify high risk companies" and the relationship refinement requirement may be "identify key investment relationships".
Illustratively, in the process of refining nodes and relations according to the prompt words with the large language model, the feature data of the original nodes and relations are first collected and collated, and the prompt words are integrated with these data to form the model input. The input feature vectors and prompt-word vectors are converted into high-dimensional representations by the embedding layer of the large language model, and the input features are analyzed in depth using the model's context-understanding capability combined with the information in the prompt words. For example, similar nodes are merged to generate abstract nodes that meet the requirements of the prompt words. In addition, the semantics of the prompt words are used to filter the node features, ensuring that nodes highly relevant to the prompt words are preserved. The large language model screens out important relations by analyzing connection strength and frequency, classifies and filters the original relations using the semantics of the prompt words, and refines relations that meet the requirements of the prompt words.
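A minimal sketch of assembling such an input is given below. The prompt wording, the example nodes and relations, and the commented-out model call are assumptions for illustration; they are not the disclosure's actual prompt or a real model API.

```python
# Illustrative sketch: combine node/relation features with scenario-specific
# prompt words into one model input. Wording and the model call are assumptions.
PROMPT_TEMPLATE = (
    "You are extracting a financial graph for the scenario: {scenario}.\n"
    "Node extraction requirement: {node_req}\n"
    "Relation extraction requirement: {rel_req}\n"
    "Original nodes: {nodes}\n"
    "Original relations: {relations}\n"
    "Return abstract nodes and abstract relations as JSON."
)

prompt = PROMPT_TEMPLATE.format(
    scenario="financial risk assessment",
    node_req="identify high-risk companies",
    rel_req="identify key investment relationships",
    nodes=["Company A (manufacturing)", "Company B (finance)"],
    relations=["Company A holds 20% equity of Company B"],
)
print(prompt)
# response = llm.generate(prompt)  # hypothetical large-language-model call
```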
According to the embodiment of the disclosure, the abstract nodes and abstract relations are extracted under a specific financial scene by inputting the targeted prompt words and utilizing the large language model, so that the efficiency and the accuracy of data processing are improved. The prompt word enables the large language model to generate output meeting specific requirements, and pertinence and practicability of data analysis are enhanced.
Fig. 6 schematically illustrates a flowchart of obtaining raw graph data according to an embodiment of the present disclosure.
As shown in fig. 6, this embodiment is one of the embodiments of operation S220, including:
in operation S610, original graph features extracted by the large language model from the original financial data are obtained, wherein the original graph features include feature vectors of at least part of the data in the original financial data;
in operation S620, the original graph features are processed based on a preset financial rule to obtain original graph data, wherein the preset financial rule indicates at least one constraint condition for extracting features of the original nodes and features of the original relationships.
The original graph features include feature information of nodes and relationships extracted from the original financial data. Compared with the original graph data, the original graph features are vectorized representations extracted directly from the original financial data that have not yet been processed by the preset financial rules.
The constraint conditions are used, for example, for feature screening of the original nodes and of the original relationships. The original graph features are processed and filtered according to, for instance, high-risk investment labels and key shareholder relationship information in the original financial data, to obtain the original nodes and original relations in the original graph data.
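As a hedged sketch under stated assumptions, preset financial rules can be modeled as simple predicate constraints applied to candidate node and relation features. The rule contents, thresholds, and field names below are illustrative, not the disclosure's preset financial rules.

```python
# Hedged sketch: apply preset financial rules as predicate constraints over
# candidate node/relation features. Rule contents and fields are assumptions.
candidate_nodes = [
    {"name": "Company A", "risk_label": "high", "confidence": 0.8},
    {"name": "Company B", "risk_label": "excluded", "confidence": 0.9},
]
candidate_relations = [
    {"from": "Company A", "to": "Company B", "amount": 5000},
    {"from": "Company A", "to": "Company C", "amount": 20},
]

NODE_RULES = [
    lambda n: n.get("risk_label") != "excluded",      # drop excluded entities
    lambda n: n.get("confidence", 0.0) >= 0.6,        # keep confident extractions
]
RELATION_RULES = [
    lambda r: r.get("amount", 0) >= 100,              # ignore negligible transfers
]

def apply_rules(items, rules):
    return [it for it in items if all(rule(it) for rule in rules)]

print(apply_rules(candidate_nodes, NODE_RULES))
print(apply_rules(candidate_relations, RELATION_RULES))
```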
According to the embodiment of the disclosure, after the original graph characteristics are processed by utilizing the preset financial rules, the original graph data meeting the requirements can be obtained, and the pertinence and the practicability of data analysis are enhanced.
The graph data processing method of some embodiments is further described below with reference to fig. 7 and 8.
Fig. 7 schematically illustrates a flowchart of a graph data processing method according to another embodiment of the present disclosure.
In operation S700, a large language model is trained using a server.
First, the financial data samples are split into documents, cleaned, converted, and formatted to generate a data format suitable for large language model input. This includes converting the nodes, edges, and attribute information in the graphs into vector representations for processing by the large language model, and generating a training set comprising a plurality of graph data. Next, the large language model is trained using a large amount of annotated data; an appropriate optimization algorithm and loss function are adopted to improve training efficiency and model performance, and high-performance computing devices such as GPUs are used to accelerate the training process. During training, the large language model automatically learns the rules and features in the graph data and, through repeated iterative optimization, forms a deep understanding of the graph data. After training, the large language model is able to extract valuable information from new graph data.
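A minimal sketch of the data-preparation side of this step follows: each raw financial text is paired with the graph (nodes and edges) it should yield, and the pairs are written out as a training set. The sample layout, file name, and example content are assumptions, not the disclosure's training pipeline.

```python
# Illustrative sketch: build (input text, target graph) fine-tuning samples.
# Sample layout and file name are assumptions for demonstration only.
import json

samples = [
    {
        "input": "Zhang San transfers 1000 yuan to Li Si on 2024-06-01.",
        "target": {
            "nodes": ["Zhang San", "Li Si"],
            "edges": [{"from": "Zhang San", "to": "Li Si",
                       "relation": "transfer", "amount": 1000}],
        },
    },
]

with open("graph_finetune_train.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```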
In operation S710, the financial document to be processed is subjected to document splitting, cleaning, conversion and formatting by using the terminal device, and a data format suitable for large language model input is generated, so as to obtain the original financial data.
In operation S720, financial graph data, that is, raw graph data and refined graph data, is extracted.
Features are extracted using the trained large language model. The preprocessed original financial data is taken as input to the large language model, which extracts features from the input data. The extracted features may include node features, edge features, the overall structural features of the graph, and so on. The trained large language model then derives the original nodes and original relations, as well as the abstract nodes and abstract relations, from the extracted features.
It should be noted that, although fig. 7 shows the execution subject of each step, the exemplary description is made only for the sake of easy understanding, and does not constitute a limitation on the execution subject of each step.
Fig. 8 schematically illustrates a flowchart of acquiring financial graph data according to an embodiment of the present disclosure.
In operation S810, the original financial data is vectorized using the large language model to obtain the original graph features.
In operation S820, a process is performed using a graph feature processing algorithm or a large language model.
Based on the feature-representation capability of the large language model, the relations among nodes, the structural relations of the graph, and the semantic information in the graph are extracted. The specific implementation process comprises the following steps (a minimal sketch of steps (1) to (4) follows after this list):
(1) Calculate node similarity: compute the similarity between nodes using the node features extracted by the large language model, for example with cosine similarity or Euclidean distance.
(2) Determine node relations: determine the relations among nodes according to the calculated node similarity. A threshold may be set; nodes whose similarity is above the threshold are regarded as related, and nodes below it as unrelated.
(3) Calculate edge weights: compute the weights of edges using the edge features extracted by the large language model, again with methods such as cosine similarity or Euclidean distance.
(4) Determine structural relations: determine the connection pattern and connection strength of edges from the calculated edge weights. A threshold may be set; edges with weights above the threshold are regarded as strong connections, and those below it as weak connections.
(5) Extract semantic features: extract semantics-related features using the node and edge features produced by the large language model, such as node label information and edge attribute information.
(6) Extract semantic information: extract semantic information from the extracted semantic features, which may be implemented with rule-based methods, machine-learning-based methods, and the like. For example, for user behavior data in a social network, semantic information such as the user's interests and social circles can be extracted by analyzing the user's behavior patterns and social relationships.
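The sketch below illustrates steps (1) to (4) only, under stated assumptions: cosine similarity between node vectors decides whether a relation exists, and a second threshold labels edges as strong or weak. The threshold values and the example vectors are assumptions, not values from the disclosure.

```python
# Hedged sketch of steps (1)-(4): cosine similarity decides whether an edge
# exists, and a second threshold labels it strong or weak. Values are assumed.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

node_vectors = {
    "account_A": np.array([0.9, 0.1, 0.3]),
    "account_B": np.array([0.8, 0.2, 0.4]),
    "account_C": np.array([0.1, 0.9, 0.0]),
}
REL_THRESHOLD, STRONG_THRESHOLD = 0.7, 0.9

edges = []
names = list(node_vectors)
for i, u in enumerate(names):
    for v in names[i + 1:]:
        sim = cosine(node_vectors[u], node_vectors[v])
        if sim >= REL_THRESHOLD:                              # step (2): relation exists
            strength = "strong" if sim >= STRONG_THRESHOLD else "weak"  # step (4)
            edges.append((u, v, round(sim, 3), strength))
print(edges)
```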
In operation S830, original map data is obtained according to the processing result of operation S820.
In operation S840, the prompt words, the original graph features, and the original graph data are input together into the large language model. The prompt words may contain background information of the question or instruction, which helps the model better understand the context when processing the graph data and make more accurate semantic understanding and reasoning. The refinement process can be adjusted according to different prompt words and context information, so that the model adapts better to different tasks and requirements, and the large language model can understand the input information more comprehensively, improving the accuracy of analysis and prediction.
In operation S850, refined graph data is obtained according to the output result of the large language model in operation S840.
In operation S730, graph loading is performed using the terminal device and the knowledge graph is constructed. The original graph data and the refined graph data are integrated using a graph construction tool on the terminal device to build a knowledge graph containing the original nodes, original relations, abstract nodes, and abstract relations. The constructed knowledge graph may be stored in a suitable database, such as a graph database or a triple storage system, for subsequent query and analysis. For a better understanding and analysis of the knowledge graph, visualization tools can be used to present the graph so that the connections between nodes and relations are apparent.
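A minimal sketch of this loading step is given below: the original and refined layers are merged into one in-memory graph and exported as statements for a graph database. The use of the networkx library, the Cypher-style statements, and all labels and property names are assumptions about a typical graph store, not the disclosure's implementation.

```python
# Minimal sketch: merge original and refined graph layers into one knowledge
# graph and export it as graph-database statements. Names/labels are assumed.
import networkx as nx

kg = nx.MultiDiGraph()
kg.add_node("account_A", layer="original")
kg.add_node("micropayment_group", layer="abstract")
kg.add_edge("account_A", "micropayment_group", relation="member_of")

def to_statements(graph: nx.MultiDiGraph) -> list:
    stmts = [f"MERGE (:Entity {{name: '{n}', layer: '{d['layer']}'}})"
             for n, d in graph.nodes(data=True)]
    stmts += [
        f"MATCH (a:Entity {{name: '{u}'}}), (b:Entity {{name: '{v}'}}) "
        f"MERGE (a)-[:{d['relation'].upper()}]->(b)"
        for u, v, d in graph.edges(data=True)
    ]
    return stmts

for stmt in to_statements(kg):
    print(stmt)
```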
In some embodiments, the original nodes and original relations can be pruned, that is, replaced by the abstract nodes and abstract relations, thereby reducing the complexity of the graph.
Compared with traditional graph data extraction methods, a large language model has excellent text understanding and generation capabilities and can understand natural language, which makes complex queries and reasoning simpler. Specifically, the large language model has the following advantages:
1. Text understanding capabilities: the large language model is trained through a large amount of labeling data, deep features in the graph data can be learned, and therefore accuracy of extraction results is improved.
2. Context sensitivity: the large language model is able to understand the different meanings of words in different contexts, which is critical to accurately extracting entities and relationships. This context sensitivity allows the model to understand complex and ambiguous sentence structures.
3. Powerful generalization capability: these models can be well generalized to new, unseen data due to training over a large variety of financial data. This means that they can accurately perform entity and relationship extraction even in the face of text with complex structures or unusual expressions. With the continued addition of new annotation data, the large language model can be continually updated and optimized to accommodate new graph data structures and changes.
4. Flexibility: the graph data processing scheme using a large language model can cope with complex and changeable graph data structures.
Based on the graph data processing method, the disclosure also provides a graph data processing device. The device will be described in detail below in connection with fig. 9.
Fig. 9 schematically shows a block diagram of the graph data processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 9, the graph data processing apparatus 900 of this embodiment includes a data input module 910, a graph data extraction module 920, and a graph data processing module 930.
The data input module 910 may perform operation S210, configured to input raw financial data to be processed into a large language model, where the large language model is obtained by performing graph data processing training on financial data samples in advance;
The graph data extraction module 920 may perform operation S220 to obtain financial graph data extracted from the raw financial data by the large language model, wherein the financial graph data includes raw graph data and refined graph data, the raw graph data being a data structure comprising raw nodes and raw relationships obtained from the raw financial data, and the refined graph data being a data structure comprising abstract nodes and abstract relationships refined based on the raw nodes and raw relationships;
The graph data processing module 930 may perform operation S230, configured to perform graph data processing according to the original graph data and the refined graph data, to obtain a graph data processing result.
For parts of the apparatus not described here, reference may be made to the corresponding method embodiments above. That is, the apparatus comprises modules for performing the steps of any of the method embodiments described above. In addition, the implementations, technical problems solved, functions realized, and technical effects achieved by each module/unit/subunit in the apparatus embodiments are the same as or similar to those of the corresponding steps in the method embodiments, and are not repeated here.
Any of the data input module 910, the graph data extraction module 920, and the graph data processing module 930 may be combined in one module to be implemented, or any of the modules may be split into a plurality of modules according to embodiments of the present disclosure. Or at least some of the functionality of one or more of the modules may be combined with, and implemented in, at least some of the functionality of other modules.
According to embodiments of the present disclosure, at least one of the data input module 910, the graph data extraction module 920, and the graph data processing module 930 may be implemented, at least in part, as hardware circuitry, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on a substrate, a system in a package, or an application-specific integrated circuit (ASIC), or in hardware or firmware by any other reasonable manner of integrating or packaging circuitry, or by any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the data input module 910, the graph data extraction module 920, and the graph data processing module 930 may be at least partially implemented as computer program modules that, when executed, perform the corresponding functions.
Fig. 10 schematically illustrates a block diagram of an electronic device adapted to implement the graph data processing method according to an embodiment of the disclosure.
As shown in fig. 10, an electronic device 1000 according to an embodiment of the present disclosure includes a processor 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. The processor 1001 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 1001 may also include on-board memory for caching purposes. The processor 1001 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 1000 are stored. The processor 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. The processor 1001 performs various operations of the method flow according to the embodiment of the present disclosure by executing programs in the ROM 1002 and/or the RAM 1003. Note that the program may be stored in one or more memories other than the ROM 1002 and the RAM 1003. The processor 1001 may also perform various operations of the method flow according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the disclosure, the electronic device 1000 may also include an input/output (I/O) interface 1005, the input/output (I/O) interface 1005 also being connected to the bus 1004. The electronic device 1000 may also include one or more of the following components connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 1002 and/or the RAM 1003 and/or one or more memories other than the ROM 1002 and the RAM 1003 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.
When the computer program is executed by the processor 1001, the above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed. According to embodiments of the present disclosure, the systems, apparatuses, devices, modules, units, and the like described above may be implemented by computer program modules.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, and downloaded and installed via the communication section 1009, and/or installed from the removable medium 1011. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to wireless and wired media, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, the program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or incorporated in various ways, even if such combinations or incorporations are not explicitly recited in the present disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or incorporated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. These embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the respective embodiments cannot be used in combination to advantage. The scope of the present disclosure is defined by the appended claims and their equivalents. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to fall within the scope of the present disclosure.

Claims (11)

1. A graph data processing method, the method comprising:
inputting original financial data to be processed into a large language model, wherein the large language model is obtained by performing graph data processing training in advance through a financial data sample;
obtaining financial graph data extracted from the original financial data by the large language model, wherein the financial graph data comprises original graph data and refined graph data, the original graph data comprises a data structure comprising original nodes and original relations obtained from the original financial data, and the refined graph data comprises a data structure comprising abstract nodes and abstract relations extracted based on the original nodes and the original relations;
and carrying out graph data processing according to the original graph data and the refined graph data to obtain a graph data processing result.
2. The method of claim 1, wherein obtaining the financial graph data extracted from the original financial data by the large language model comprises obtaining the refined graph data, and obtaining the refined graph data comprises:
inputting features of the original nodes and features of the original relations into the large language model;
refining, by the large language model, the abstract nodes based on the features of the original nodes, and refining the abstract relations based on the features of the original relations.
3. The method of claim 2, wherein, in addition to inputting the features of the original nodes and the features of the original relations into the large language model, obtaining the refined graph data further comprises:
inputting preset prompt words into the large language model, wherein the prompt words are used for instructing the large language model to follow node extraction requirements and relation extraction requirements matched with a preset financial scene.
4. The method according to any one of claims 1 to 3, wherein obtaining the financial graph data extracted from the original financial data by the large language model comprises obtaining the original graph data, which in particular comprises:
obtaining, from the large language model, original graph features extracted from the original financial data, wherein the original graph features comprise feature vectors of at least part of the original financial data;
and processing the original graph features based on a preset financial rule to obtain the original graph data, wherein the preset financial rule indicates at least one constraint condition for extracting the features of the original nodes and the features of the original relations.
5. The method of claim 1, wherein, prior to inputting the original financial data to be processed into the large language model, the method further comprises:
obtaining at least one financial document, wherein the financial document is generated in response to a user transacting financial business;
parsing the at least one financial document to obtain the original financial data represented in a predetermined format.
6. The method of claim 5, wherein each financial business is preconfigured with a parsing rule, and parsing the at least one financial document to obtain the original financial data represented in the predetermined format comprises:
determining a target parsing rule according to the financial business corresponding to each financial document, wherein the parsing rule is used for acquiring target business data in the financial document and comprises a preset format content of the target business data;
parsing each financial document based on its target parsing rule to obtain the original financial data represented in the predetermined format.
7. The method of claim 5 or 6, wherein the predetermined format characterizes transaction entities in the original financial data and transaction relationships between the transaction entities.
8. A graph data processing apparatus, the apparatus comprising:
a data input module, configured to input original financial data to be processed into a large language model, wherein the large language model is obtained by performing graph data processing training in advance through a financial data sample;
a graph data extraction module, configured to obtain financial graph data extracted from the original financial data by the large language model, wherein the financial graph data comprises original graph data and refined graph data, the original graph data comprises a data structure comprising original nodes and original relations obtained from the original financial data, and the refined graph data comprises a data structure comprising abstract nodes and abstract relations extracted based on the original nodes and the original relations;
and a graph data processing module, configured to perform graph data processing according to the original graph data and the refined graph data to obtain a graph data processing result.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more computer programs,
characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
11. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
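For readers who prefer a concrete walk-through, the non-limiting sketch below strings together the steps recited in claims 1, 3, 5, and 6: per-business parsing rules produce original financial data in a predetermined format, preset prompt words instruct a large language model to return original and refined graph data, and the two layers are then combined into a graph data processing result. Every identifier in the sketch (PARSING_RULES, PROMPT, parse_documents, extract_graph, process_graph, and the stubbed-out model) is a hypothetical example introduced purely for illustration; the sketch does not define or limit the claimed method.

    # Illustrative, non-limiting sketch only; every name below is hypothetical.
    import json

    # Claim 6: each financial business is preconfigured with its own parsing rule that
    # selects the target business data and its predetermined output format.
    PARSING_RULES = {
        "transfer": lambda doc: {"payer": doc["from"], "payee": doc["to"], "amount": doc["amt"]},
        "loan": lambda doc: {"borrower": doc["client"], "lender": doc["bank"], "amount": doc["amt"]},
    }

    # Claim 3: preset prompt words instructing the model to follow node and relation
    # extraction requirements matched to a preset financial scene.
    PROMPT = (
        "Extract original nodes and original relations from the financial record, "
        "then refine abstract nodes and abstract relations; answer in JSON."
    )

    def parse_documents(documents):
        # Claims 5 and 6: parse each financial document with the target parsing rule
        # chosen according to its financial business.
        return [PARSING_RULES[doc["business"]](doc) for doc in documents]

    def extract_graph(llm, original_financial_data):
        # Claim 1: obtain the original and the refined graph data from the large language model.
        reply = llm(PROMPT + "\n" + json.dumps(original_financial_data))
        graph = json.loads(reply)
        return graph["original"], graph["refined"]

    def process_graph(original_graph, refined_graph):
        # Claim 1: combine both layers into a single graph data processing result.
        return {
            "nodes": original_graph["nodes"] + refined_graph["nodes"],
            "relations": original_graph["relations"] + refined_graph["relations"],
        }

    if __name__ == "__main__":
        def stub_llm(prompt: str) -> str:
            # Stand-in for a real model so the sketch runs end to end on its own.
            return json.dumps({
                "original": {"nodes": ["account:A", "account:B"],
                             "relations": [["account:A", "pays", "account:B"]]},
                "refined": {"nodes": ["counterparty"],
                            "relations": [["account:A", "trades_with", "account:B"]]},
            })

        documents = [{"business": "transfer", "from": "A", "to": "B", "amt": 100}]
        original, refined = extract_graph(stub_llm, parse_documents(documents))
        print(process_graph(original, refined))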
CN202410788937.3A 2024-06-19 2024-06-19 Graph data processing method, device, equipment, medium and program product Pending CN118568305A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410788937.3A CN118568305A (en) 2024-06-19 2024-06-19 Graph data processing method, device, equipment, medium and program product


Publications (1)

Publication Number Publication Date
CN118568305A true CN118568305A (en) 2024-08-30

Family

ID=92467323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410788937.3A Pending CN118568305A (en) 2024-06-19 2024-06-19 Graph data processing method, device, equipment, medium and program product

Country Status (1)

Country Link
CN (1) CN118568305A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination