CN118394855A - Data processing method, device, equipment and storage medium

Info

Publication number: CN118394855A
Application number: CN202410313759.9A
Authority: CN
Inventors: 陈睿; 王淑君; 杨韬; 彭小勇; 青焓; 林岸森
Original assignee: Futuo Network Technology Shenzhen Co ltd
Current assignee: Futuo Network Technology Shenzhen Co ltd
Priority date: 2024-03-19
Filing date: 2024-03-19
Publication date: 2024-07-26

Abstract

The application discloses a data processing method, apparatus, device and storage medium. The method comprises: acquiring original data from a service data source; extracting entity relation data from the original data, wherein each piece of entity relation data comprises corresponding entity attribute data, a subject identifier and an object identifier; for any piece of entity relation data, taking the subject identifier and the object identifier in the entity relation data as a composite primary key, and generating an entity relation table based on the entity relation data; storing the entity relation table in a target database; and, in response to an entity query request, acquiring target entity relation data corresponding to the entity query request from the target database. The application supports a new data storage structure, provides a multi-dimensional description of relationships, improves the timeliness of data and increases the data query rate.

Description

Data processing method, device, equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and storage medium.

Background

With the advent of the big data age, the description and management of relational data has become particularly important. Numerous data products for describing relationships are presented on the market, but in practical applications, these products generally suffer from the following problems:

In current relational data products, objects are typically stored in the form of lists. When data is read, these lists must be parsed in memory. As concurrent requests increase, the system has to read and parse a large amount of data, which creates a risk of memory overload.

Many relational data products store data in a single large wide table. This approach may cause data skew when certain relational data is processed, which in turn slows down queries.

Current data products typically use a fixed data format, which makes in-depth description of the additional properties of the object difficult, possibly resulting in incomplete description of the relationships.

The application programming interface (API) of many data products has limited functionality and supports only simple querying and data retrieval. Downstream services can call it in only one form and cannot implement more complex functions such as data filtering and statistics. Sometimes, to meet a requirement, the business side may even misuse the API, placing unnecessary load on the system.

Thus, current relational data products face challenges in handling large-scale data and struggle to meet ever-increasing big data processing demands.

Disclosure of Invention

The embodiments of the application provide a data processing method, apparatus, device and storage medium, which support a new data storage structure, provide a multi-dimensional description of relationships, improve the timeliness of data and increase the data query rate.

In one aspect, an embodiment of the present application provides a data processing method, including:

acquiring original data from a service data source;

extracting entity relation data from the original data, wherein each piece of entity relation data comprises corresponding entity attribute data, a subject identifier and an object identifier;

for any piece of entity relation data, taking the subject identifier and the object identifier in the entity relation data as a composite primary key, and generating an entity relation table based on the entity relation data;

storing the entity relation table in a target database;

and, in response to an entity query request, acquiring target entity relation data corresponding to the entity query request from the target database.

In another aspect, an embodiment of the present application provides a data processing apparatus, including:

an acquisition unit, configured to acquire original data from a service data source;

an extraction unit, configured to extract entity relation data from the original data, wherein each piece of entity relation data comprises corresponding entity attribute data, a subject identifier and an object identifier;

a first storage unit, configured to, for any piece of entity relation data, take the subject identifier and the object identifier in the entity relation data as a composite primary key and generate an entity relation table based on the entity relation data;

a second storage unit, configured to store the entity relation table in a target database;

and a query unit, configured to, in response to an entity query request, acquire target entity relation data corresponding to the entity query request from the target database.

In another aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores a computer program, and the processor is configured to execute the data processing method according to any one of the embodiments above by calling the computer program stored in the memory.

In another aspect, embodiments of the present application provide a computer readable storage medium storing a computer program adapted to be loaded by a processor to perform a data processing method according to any of the embodiments above.

In another aspect, an embodiment of the present application provides a computer program product comprising computer instructions which, when executed by a processor, implement a data processing method as described in any of the embodiments above.

According to the embodiments of the application, original data is acquired from a service data source; entity relation data is extracted from the original data, wherein each piece of entity relation data comprises corresponding entity attribute data, a subject identifier and an object identifier; for any piece of entity relation data, the subject identifier and the object identifier in the entity relation data are taken as a composite primary key, and an entity relation table is generated based on the entity relation data; the entity relation table is stored in a target database; and, in response to an entity query request, target entity relation data corresponding to the entity query request is acquired from the target database. Because the entity relation table is generated with the subject identifier and the object identifier as a composite primary key and is then stored in the target database, a new data storage structure is supported and the data is organized more reasonably and efficiently. Because each piece of entity relation data comprises corresponding entity attribute data, a subject identifier and an object identifier, the data structure can describe relationships along more dimensions, so that the entity relationships in the data can be understood more comprehensively. Because the entity relation data is stored in the target database, the data can be updated and synchronized in real time, which improves its timeliness and lets users obtain the latest data promptly. When an entity query request is received, the corresponding target entity relation data can be obtained directly and quickly from the entity relation table in the target database, which increases the data query rate.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flow chart of a data processing method according to an embodiment of the present application.

Fig. 2 is a schematic diagram of a first application scenario provided in an embodiment of the present application.

Fig. 3 is a schematic diagram of a second application scenario provided in an embodiment of the present application.

Fig. 4 is a schematic diagram of a third application scenario provided in an embodiment of the present application.

Fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.

Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.

The embodiment of the application provides a data processing method, a device, equipment and a storage medium. Specifically, the data processing method of the embodiment of the present application may be performed by a computer device, where the computer device may be a terminal or a server. The terminal can be smart phones, tablet computers, notebook computers, desktop computers, smart televisions, smart speakers, wearable smart devices, smart vehicle-mounted terminals and other devices, and also can comprise a client, wherein the client can be a financial client, a browser client or an instant messaging client and the like. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content distribution network services, basic cloud computing services such as big data and an artificial intelligence platform, but is not limited thereto.

In an internet system, a subject may establish various associations with a plurality of objects (e.g., information posts, financial targets, other users, etc.). Entity relationship data refers to data of a certain relationship generated when a subject (person or article) interacts with an object (person or article), and the relationship data supports description of other attributes attached to the relationship.

Take a simple position-holding relationship as an example: in the data, a "holding relationship" is generated between a subject (a user) and an object (a stock), and the relationship can be described further through attributes (the position quantity and the long/short direction of the position).

Here, an object generally refers to something that exists objectively, such as a stock or a post, and can be distinguished by an object identifier (ID), such as a stock ID or a post ID.

An entity relationship generally refers to a relationship between a subject and an object, such as a user holding a stock, i.e., a holding relationship. For a given relationship, a user may be in a one-to-many relationship with objects, such as one user holding multiple stocks.
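The subject, object and attribute structure described above can be sketched as a small record type. This is only an illustrative sketch: the class name `EntityRelation`, its field names and the sample identifiers are assumptions, not names taken from the application.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class EntityRelation:
    """One piece of entity relationship data (all names are hypothetical)."""
    relation: str                       # kind of relationship, e.g. "holding"
    subject_id: str                     # subject identifier, e.g. a user ID
    object_id: str                      # object identifier, e.g. a stock ID
    attributes: Dict[str, Any] = field(default_factory=dict)

    @property
    def key(self) -> Tuple[str, str]:
        # The (subject, object) pair identifies the record within one relation.
        return (self.subject_id, self.object_id)


# One user holding several stocks: a one-to-many relationship.
holdings = [
    EntityRelation("holding", "user_1", "stock_A", {"quantity": 100, "direction": "long"}),
    EntityRelation("holding", "user_1", "stock_B", {"quantity": 50, "direction": "short"}),
]

objects_of_user_1 = [r.object_id for r in holdings if r.subject_id == "user_1"]
```

Each held stock becomes its own record, so the attributes attached to the relationship (quantity, direction) sit next to the pair they describe.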

When the relationship between a single subject and a plurality of objects is described, the number of objects bound to each subject under a single relationship is uncertain. Some subjects have many objects while others have only one, so in big data scenarios with huge data volumes, data skew and a surge in in-memory parsing pressure may occur.

To better suit data application scenarios with high concurrency and large request volumes in the big data era, the embodiments of the application provide a data processing method in which, for any piece of entity relationship data, the subject identifier and the object identifier in the entity relationship data are taken as a composite primary key, an entity relationship table is generated based on the entity relationship data, and the entity relationship table is stored in a target database. This supports a new data storage structure and organizes the data more reasonably and efficiently. Each piece of entity relationship data comprises corresponding entity attribute data, a subject identifier and an object identifier, so the data structure can describe relationships along more dimensions and the entity relationships in the data can be understood more comprehensively. Storing the entity relationship data in the target database allows the data to be updated and synchronized in real time, which improves its timeliness and lets users obtain the latest data promptly. When an entity query request is received, the corresponding target entity relationship data can be obtained directly and quickly from the entity relationship table in the target database, which increases the data query rate.
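To make the contrast with list-style storage concrete, the following sketch flattens one-record-per-subject lists into one row per (subject identifier, object identifier) pair. The sample data, the `"stock"` field and the function name `flatten` are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# List-style storage: one record per subject, with its objects packed in a list.
# A subject with many objects produces one very large record that must be
# parsed in memory on every read.
list_style: Dict[str, List[Dict[str, Any]]] = {
    "user_1": [{"stock": "stock_A", "quantity": 100}, {"stock": "stock_B", "quantity": 50}],
    "user_2": [{"stock": "stock_A", "quantity": 10}],
}


def flatten(records: Dict[str, List[Dict[str, Any]]]) -> Dict[Tuple[str, str], Dict[str, Any]]:
    """Return one row per (subject_id, object_id) pair, keyed by that pair."""
    rows: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for subject_id, objects in records.items():
        for obj in objects:
            attributes = {k: v for k, v in obj.items() if k != "stock"}
            rows[(subject_id, obj["stock"])] = attributes
    return rows


rows = flatten(list_style)

# A single relationship can now be read without parsing the subject's whole list.
user_1_stock_b = rows[("user_1", "stock_B")]
```

With the pair as the key, every row has a similar size regardless of how many objects a subject has, which is the property the composite primary key is meant to provide.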

The following will describe in detail. It should be noted that the following description order of embodiments is not a limitation of the priority order of embodiments.

Referring to fig. 1 to fig. 4, fig. 1 is a flowchart of a data processing method according to an embodiment of the present application, and fig. 2 to fig. 4 are application scenario diagrams according to an embodiment of the present application. The method comprises the following steps 110 to 150:

At step 110, raw data is obtained from a service data source.

As shown in fig. 2, original data may be acquired from a service data source and then stored in the operational data store (ODS) layer of a data warehouse.

A service data source generally contains various data related to the business, and the data may be structured or unstructured. Original data is obtained from these service data sources for subsequent data processing and analysis.

For example, the business data sources may come from different locations, such as databases, data stores, file systems, API interfaces, and the like. Such raw data may include various types of information such as text, numbers, images, etc.

To ensure accuracy and integrity of the data, the raw data needs to be obtained from a reliable, validated source of business data. In addition, timeliness and frequency of data need to be considered, so that latest data can be acquired in time.

In the process of acquiring the original data, the type and the characteristics of the service data source need to be clarified first. The data sources of different business fields and scenarios can vary widely. For example, data sources in the financial domain may include transaction records, quotation data, user behavior data, and the like; and the data sources in the e-commerce field may include commodity information, order data, user evaluation, and the like. Knowing the characteristics of the data source helps to better formulate the data acquisition strategy.

For example, in a financial scenario, the original data acquired from service data sources may include transaction topic data, quotation topic data, social topic data and the like. Such data may come from sources in different business scenarios, such as trading systems, quotation systems and social media. For example, the transaction topic data may include the amount, time and counterparty of a transaction performed by a user on a financial product (i.e., a financial target); the quotation topic data may include market prices and trend data for financial products such as stocks, foreign exchange or commodities; and the social topic data may include users following one another, users following different information topics, comments, likes and the like.

Second, the original data acquired from the service data source needs to be cleaned and organized. Because the original data may be incomplete, duplicated or abnormal, it must be preprocessed to improve data quality. Preprocessing includes, but is not limited to, deduplication, outlier handling, missing-value filling and format unification. These preprocessing steps help ensure the accuracy and reliability of subsequent data processing and analysis.
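The preprocessing steps named above can be sketched as follows. The record fields (`user_id`, `stock_id`, `quantity`), the default value used for missing quantities and the outlier bound `max_quantity` are all assumptions chosen for illustration; the application does not specify them.

```python
from typing import Any, Dict, List, Set, Tuple


def preprocess(records: List[Dict[str, Any]], max_quantity: int = 1_000_000) -> List[Dict[str, Any]]:
    """Deduplicate, drop outliers, fill missing values and unify formats."""
    seen: Set[Tuple[str, str, int]] = set()
    cleaned: List[Dict[str, Any]] = []
    for rec in records:
        # Format unification: identifiers become stripped, upper-case strings.
        user_id = str(rec.get("user_id") or "").strip().upper()
        stock_id = str(rec.get("stock_id") or "").strip().upper()
        if not user_id or not stock_id:
            continue  # incomplete record that cannot be repaired
        # Missing-value filling: a missing quantity defaults to 0 (assumption).
        quantity = rec.get("quantity")
        if quantity is None:
            quantity = 0
        # Outlier handling: discard quantities outside the assumed valid range.
        if quantity < 0 or quantity > max_quantity:
            continue
        # Deduplication on the normalised record.
        key = (user_id, stock_id, quantity)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"user_id": user_id, "stock_id": stock_id, "quantity": quantity})
    return cleaned


raw = [
    {"user_id": " u1 ", "stock_id": "aapl", "quantity": 100},
    {"user_id": "U1", "stock_id": "AAPL", "quantity": 100},   # duplicate after normalisation
    {"user_id": "u2", "stock_id": "tsla", "quantity": None},  # missing value
    {"user_id": "u3", "stock_id": "tsla", "quantity": -5},    # outlier
    {"user_id": "", "stock_id": "tsla", "quantity": 5},       # incomplete
]
cleaned = preprocess(raw)
```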

Furthermore, to better manage and organize the original data, it is typically stored on some kind of data storage medium, and a suitable storage mode can be selected according to actual requirements and scenarios. In some cases, the data may be stored in a conventional relational database; in other cases, to better support big data processing and analysis, the data may be stored in a distributed storage system (e.g., HDFS) or in a columnar database. In many cases, the original data can be stored in the ODS layer of the data warehouse to achieve centralized management and efficient access.

At step 120, entity relation data is extracted from the original data, wherein each piece of entity relation data comprises corresponding entity attribute data, a subject identifier and an object identifier.

Wherein, the extraction of the entity relation data depends on the characteristics and the structure of the data. Generally, entity relationship data refers to records or information that contain relationships between entities. In many application scenarios, entity relationship data has complexity and diversity, and needs to be extracted by adopting proper methods and technologies.

To extract entity relationship data accurately, the characteristics and structure of the original data must be fully understood. For example, in the financial field, the original data may include transaction records, stock quotes, market trends, social information and so on. In these data, entities may include user targets, financial markets, industry/sector targets, information topics and the like, while relationships between entities may include trading relationships, position-holding relationships, following relationships, comment relationships and other associations. For example, the relationship between a user object and a financial target may include a trading relationship (e.g., a financial target that user A has traded) or a holding relationship (e.g., a financial target that user A holds); the relationship between a user object and a financial market or industry/sector object may include a following relationship (e.g., a financial market or industry/sector that user B follows) or a holding relationship (e.g., the financial market or industry/sector to which a held financial target belongs); and the relationship between two user objects may include a following relationship, a like relationship and so on, which are not described in detail here.

The entity attribute data, the subject identifier and the object identifier are the core components of entity relationship data. The entity attribute data describes the characteristics and attributes of an entity, such as its name, type and attribute values. Taking an application in the financial field as an example, where the entity is a user target and the entity relationship is a holding relationship, the attribute values of the entity attribute data may include the financial market to which the financial target belongs, the position quantity that the user target holds in the financial target, the transaction time at which the user target purchased the financial target, and so on. The subject identifier and the object identifier identify the two entities in an entity relationship, which is expressed as a relationship between a subject and an object, or between an action and the object affected by that action.

The process of extracting entity relationship data generally involves data cleaning, entity identification, attribute extraction and relationship extraction. Data cleaning eliminates erroneous and inconsistent data to ensure data quality. Entity identification recognizes the entity objects in the original data, such as person names, place names, organization names and event names; when applied in the financial field, the entity objects may specifically be user targets, financial markets, industry/sector targets and the like. Attribute extraction extracts the entity attribute data associated with these entities, such as the type of an entity and its attribute values. Relationship extraction extracts the association relationships between entities according to the patterns and relationships in the data; these may be of various kinds, such as containment relationships, membership relationships and transaction relationships. To describe these associations, the system assigns a subject identifier and an object identifier to each association. The subject identifier typically represents the initiator or active party of the relationship, while the object identifier represents the recipient or passive party.

The embodiments of the application can design the data structure of the entity relationship data around a composite primary key formed from the entity identifier of the subject (the subject identifier) and the entity identifier of the object (the object identifier), and store the corresponding data structure in the entity relationship table. This makes it convenient to filter data by subject identifier, object identifier and entity attribute data in various databases.

Further, after the original data is obtained, logic calculation may be performed on the original data to extract entity relationship data from the original data.

In some embodiments, the extracting entity relationship data from the raw data includes: extracting entity objects from the original data and, for any entity object pair, determining association relation information matched with the entity object pair from preset entity relation rules; and determining a subject identifier and an object identifier in the entity relation data according to the association relation information and the entity object pair, and extracting entity attribute data corresponding to the entity object pair from the original data to obtain the entity relation data.

The entity relation rules record association relation information between different entities. The association relation information includes, but is not limited to, the object type of the subject object in the entity relationship, the object type of the object, and the data type of the entity attribute data. Taking an application in the financial field as an example, the entity relation rules may record association relation information for entity relationships such as "the financial market where the financial account of the user target is located", "the financial target held by the user target" and "the information topic followed by the user target", where each piece of association relation information identifies the object type of the subject object, the object type of the object, and/or the data type of the entity attribute data in the corresponding entity relationship. For example, in the entity relation rule "the financial market where the financial account of the user target is located", the object type of the subject object is "user target" and the object type of the object is "financial market". As another example, in the entity relation rule "the financial target held by the user target", the object type of the subject object is "user target", the object type of the object is "financial target", and the entity attribute data may include attribute values of the object "financial target", such as the specific target held, the position quantity and the long/short direction of the position.

After the original data is obtained, all entity objects and their object types can be identified from the original data by an entity identification algorithm. Any two entity objects are then combined into an entity object pair, and the object types of the entity objects in the pair are matched against the object types recorded in each piece of association relation information in the preset entity relation rules, so as to obtain the entity object pairs that have an entity relationship. It can be understood that if the object types of the entity objects in an entity object pair match the object types of the subject object and the object in a piece of association relation information, the entity object pair has the entity relationship corresponding to that association relation information.
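The type-based matching of entity object pairs against entity relation rules might look like the sketch below. It uses exact equality of object types as the matching criterion, which is a simplifying assumption; the rule contents, the `RelationRule` type and the function name `match_pairs` are hypothetical.

```python
from itertools import permutations
from typing import List, NamedTuple, Tuple


class RelationRule(NamedTuple):
    """Association relation information recorded by one rule (hypothetical layout)."""
    name: str
    subject_type: str
    object_type: str
    attribute_fields: Tuple[str, ...]


RULES = [
    RelationRule("holding", "user", "financial_target", ("quantity", "direction")),
    RelationRule("following", "user", "information_topic", ()),
]


def match_pairs(entities: List[Tuple[str, str]], rules: List[RelationRule]) -> List[Tuple[str, str, str]]:
    """Return (rule name, subject_id, object_id) for every ordered pair whose types fit a rule.

    `entities` holds (entity_id, object_type) tuples produced by entity identification.
    """
    matches = []
    for (s_id, s_type), (o_id, o_type) in permutations(entities, 2):
        for rule in rules:
            if rule.subject_type == s_type and rule.object_type == o_type:
                matches.append((rule.name, s_id, o_id))
    return matches


entities = [
    ("user_1", "user"),
    ("stock_A", "financial_target"),
    ("topic_9", "information_topic"),
]
matched = match_pairs(entities, RULES)
```

Ordered pairs are used so that the rule also decides which entity of the pair is the subject and which is the object. A real pipeline would additionally need evidence in the raw data that the two entities are actually related; this sketch checks types only.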

In one embodiment, the object types of the entity objects in the entity object pair may be used as a first node set, and the object types of the entity objects recorded in the association relation information may be used as a second node set; an entity bipartite graph is constructed from the first node set and the second node set, and the weight value of each edge in the entity bipartite graph is obtained from the similarity between the two nodes that the edge connects, one from the first node set and one from the second node set; the association relation information matched with the entity object pair is then determined based on the weight value of each edge in the entity bipartite graph.

Specifically, the object type of the entity object in the entity object pair can be used as one node of the first node set, and the object type of the entity object recorded by the association relation information can be used as one node of the second node set; furthermore, the weight value of each edge in the entity bipartite graph can be obtained according to the following formula:

Here A_i denotes the i-th node of the first node set and B_j denotes the j-th node of the second node set, with i ≤ 2 and j ≤ 2. When a preset first threshold number of edges in the entity bipartite graph have a weight value W_{i,j} greater than a preset second threshold, it is determined that the entity object pair corresponding to the first node set matches the association relation information corresponding to the second node set, that is, the entity object pair has the entity relationship corresponding to the association relation information.
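The weight formula itself does not appear in this text, so the following is only a sketch of the bipartite matching idea under stated assumptions: the similarity between two object types is computed with `difflib.SequenceMatcher` from the Python standard library (a stand-in, not the application's formula), the first threshold is read as a minimum number of edges, and the second threshold as a minimum edge weight. The default values of `min_edges` and `min_weight` are arbitrary.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    # Assumed similarity measure: ratio of matching characters, in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def edge_weights(first: Sequence[str], second: Sequence[str]) -> Dict[Tuple[int, int], float]:
    """Weight W[i, j] of the edge between node A_i (first set) and node B_j (second set)."""
    return {
        (i, j): similarity(a, b)
        for i, a in enumerate(first, start=1)
        for j, b in enumerate(second, start=1)
    }


def is_match(first: Sequence[str], second: Sequence[str],
             min_edges: int = 2, min_weight: float = 0.9) -> bool:
    """True when at least `min_edges` edges have a weight greater than `min_weight`."""
    weights = edge_weights(first, second)
    strong_edges = sum(1 for w in weights.values() if w > min_weight)
    return strong_edges >= min_edges


pair_types = ("user", "financial_target")   # first node set: types in the entity object pair
rule_types = ("user", "financial_target")   # second node set: types recorded by the rule
same_rule = is_match(pair_types, rule_types)
other_rule = is_match(("user", "information_topic"), rule_types)
```

With two nodes on each side, the graph has four edges; a pair matches a rule in this sketch only when both of its types find a sufficiently similar type in the rule.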

After the entity object pair having an entity relationship and its corresponding association relation information are determined, the subject object and the object in the entity object pair can be determined according to the association relation information, so that the subject identifier of the subject object and the object identifier of the object are obtained from the original data, and the entity attribute data corresponding to the subject object or the object is extracted from the original data based on the data type of the entity attribute data recorded in the association relation information.

Through the above logic calculation, the required entity relation data can be extracted quickly from the original data and written into the corresponding entity relation table according to the entity relationship involved. Referring to fig. 2, for different business scenarios, original data can be obtained from the service data sources of those scenarios and stored in the ODS layer of the data warehouse under the corresponding topic. The original data in the ODS layer under any topic can then be consumed in real time through message middleware (such as Kafka), and a plurality of entity objects and their object types can be identified from the original data, so that entity object pairs having an entity relationship are determined based on the entity relation rules and the entity objects under that topic. The subject identifier and the object identifier in the entity relation data are then determined from each such entity object pair, and the entity attribute data corresponding to the entity object pair is extracted from the original data in the ODS layer of the corresponding topic, yielding the entity relation data.

For example, in the financial field, the entity relation rules may record association relation information for the entity relationship "the financial market to which a financial target held by a user belongs", in which the subject object is "user target" and the object is "financial market". After entity identification is performed on the original data in the ODS layer under any topic, the entity object pair "user target" and "financial market" having this entity relationship is determined based on the entity relation rules. The entity attribute data, namely the specific financial target held by the specific user target, is obtained from the ODS layer of the transaction topic based on the entity identifier corresponding to the entity object "user target", and the specific financial market to which that financial target belongs is looked up in the ODS layer under the quotation topic, which yields the object identifier of the object.

In some embodiments, the extracting entity relationship data from the raw data includes:

extracting a plurality of entity relation data from the original data based on a streaming computing mode.

In some embodiments, the extracting a plurality of entity relationship data from the raw data based on the streaming computing manner includes:

dividing the entity computing task into a plurality of entity computing sub-tasks;

Dividing the original data into a plurality of original sub-data, and distributing each original sub-data in the plurality of original sub-data to different entity computing sub-tasks for processing so as to extract a plurality of entity relation data.

Streaming computing is a computing method for processing real-time data streams, and unlike traditional batch processing methods, streaming computing can process and respond to data streams in real-time. In the extraction of the entity relationship data, the streaming computing mode can rapidly extract the entity relationship data from a large amount of original data, and provides real-time data processing and analysis capability.

In a specific implementation, the step of extracting the entity relationship data from the original data may be implemented in a streaming computing manner, and the task of extracting the entity relationship data may be divided into a plurality of entity computing sub-tasks. Each sub-task can independently process a part of original data and extract entity relation data in the processing process. In this way, the processing load of the original data can be dispersed to a plurality of computing subtasks, and the speed and efficiency of data processing are improved.

Further, the original data may be divided into a plurality of pieces of original sub-data, and each piece may be assigned to a different entity computing sub-task for processing. Each sub-task processes only a part of the original sub-data, which further improves the speed and efficiency of data processing. Because the data volume handled by each sub-task is smaller, errors and abnormal conditions during processing can also be reduced.

By the embodiment, the entity relation data is extracted from the original data based on the streaming computing mode, a large amount of original data can be processed rapidly and accurately, real-time data processing and analyzing capability is provided, and extraction efficiency and accuracy of the entity relation data are improved.
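The division into sub-tasks and sub-data described above can be sketched as follows. This is a minimal illustration only: the field names (`uid`, `symbol`, `qty`), the modulo partitioning rule, and the use of a thread pool are assumptions made for the example, and a real deployment would run the sub-tasks inside a streaming engine rather than over an in-memory list.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_subtasks):
    """Split raw records into num_subtasks groups, keyed by user id (assumed rule)."""
    parts = [[] for _ in range(num_subtasks)]
    for rec in records:
        parts[rec["uid"] % num_subtasks].append(rec)
    return parts


def extract_relations(sub_data):
    """One entity computing sub-task: turn raw records into relationship rows."""
    return [
        {"subject_id": rec["uid"], "object_id": rec["symbol"], "quantity": rec["qty"]}
        for rec in sub_data
    ]


def run_entity_task(records, num_subtasks=4):
    """Distribute the sub-data to the sub-tasks and merge their outputs."""
    parts = partition(records, num_subtasks)
    with ThreadPoolExecutor(max_workers=num_subtasks) as pool:
        results = pool.map(extract_relations, parts)
    return [row for rows in results for row in rows]
```

Partitioning by the subject identifier keeps all records of one subject in the same sub-task, which is one simple way to avoid two sub-tasks writing the same relationship row.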

Step 130, for any piece of entity relationship data, generating an entity relationship table based on the entity relationship data, with the subject identifier and the object identifier in the entity relationship data used as a joint primary key.

For example, the feature values of the entity relationship data are pre-computed in combination with the service data and stored in the entity relationship table. These feature values describe the attributes and associations of the entities in detail. A feature value may include a data detail value, and different entity relationship data yield different data detail values. For example, for the entity relationship data "targets currently held by FutuHK clients", the data detail values may be the specific position target, the position quantity, the long or short direction of the position, and the like.

Different entity relationship data usually have different attributes and association relationships, so an independent entity relationship table is created for each kind of entity relationship data. This keeps the data classified and stored separately and improves the efficiency of data query and processing.

For example, entities also have their own attributes, and the entity attribute data of different entities may differ. For example, the stock position entity also includes the quantity of stock held. Because the embodiment of the application stores different entity relationship data in different entity relationship tables, such attributes can be supported conveniently, and the business can query based on the entity attribute data.

For example, as shown in fig. 2, the computation layer can consume the original data in the ODS data warehouse in real time through message middleware (such as Kafka) and perform logical computation on it. The required entity relationship data is extracted from the original data by this logical computation and written into the corresponding entity relationship table according to the entity relationship involved. For example, each specific kind of entity relationship data corresponds to a specific entity relationship table in the target database, and the entity relationship data generated by the entity computing task is written into that table.

For example, when designing the entity relationship table, considering that different entities carry different data, the data may be stored as one table per entity in order to save storage resources as far as possible, so that each entity table only needs to hold the data of its own entity. For example, "user A" and "stock A held by user A" may be stored as one piece of entity relationship data in an entity relationship table, and the number of shares of stock A held by user A may be stored in another column of the same entity relationship table.

The ODS data warehouse can act as an intermediate layer for data integration: it extracts, transforms, and loads (ETL) data from multiple source systems to ensure the consistency and accuracy of the data. The ODS data warehouse typically includes a data cleansing and verification process to remove invalid data, handle missing values, and correct data errors. The ODS data warehouse can also hold historical data, allowing analysts and business users to view how the data has evolved and changed.

Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all of the user activity stream data of a website.

For example, the entity relationship table corresponding to each piece of entity relationship data may be determined according to at least one of the original data source (that is, which business scenario the data comes from), the original data topic, the association relationship information, and the entity attribute data. From this information it can be judged to which business scenario's entity relationship table the entity relationship data belongs. For example, at least one piece of entity relationship data is obtained by analyzing the original data corresponding to one or more business scenarios in the ODS data warehouse, and logical computation determines in which entity relationship table the obtained entity relationship data is to be stored.

For example, if entity relationship data is generated based on transaction topic data, it may be determined from the original data topic that the entity relationship data should be written into a transaction entity relationship table; if the entity relationship data is generated based on the quotation topic data, determining that the entity relationship data should be written into the quotation entity relationship table according to the original data topic; if the entity relationship data is generated based on social topic data, it may be determined from the original data topic that the entity relationship data should be written into a social entity relationship table; and so on.

For example, suppose the association relationship information is a holding relationship and one attribute in the corresponding entity attribute data is "the financial market where the held target is located". The data domains of the original data corresponding to this entity relationship data (that is, the "transaction topic", "quotation topic", and so on in the ODS data warehouse) then include holdings (related to the transaction topic) and quotations (related to the quotation topic). The subject identifier obtained from the entity relationship data is the "held target" and the object identifier is the "financial market". The entity relationship table corresponding to the subject identifier may take priority over the entity relationship table corresponding to the object identifier, so for the entity relationship data "the financial market where the held target is located", the corresponding entity relationship table may be the transaction topic relationship table.
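The table selection described above can be sketched as a small lookup. The table names, the topic keys, and the fallback to the object's topic when the subject's topic has no table are assumptions introduced for illustration; the text only states that the subject's table may take priority over the object's table.

```python
# Hypothetical mapping from original data topic to entity relationship table.
TOPIC_TO_TABLE = {
    "transaction": "transaction_entity_relation",
    "quotation": "quotation_entity_relation",
    "social": "social_entity_relation",
}


def target_table(subject_topic, object_topic=None):
    """Pick the table for a relationship, preferring the subject's topic."""
    for topic in (subject_topic, object_topic):
        if topic in TOPIC_TO_TABLE:
            return TOPIC_TO_TABLE[topic]
    raise ValueError(
        f"no entity relationship table configured for {subject_topic!r} / {object_topic!r}"
    )
```

For the "financial market where the held target is located" example, the subject comes from the transaction topic and the object from the quotation topic, so `target_table("transaction", "quotation")` selects the transaction table.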

For any piece of entity relationship data, there are one or more subjects, and these subjects usually have unique identifiers. The subject identifiers form part of the primary key of the entity relationship table. Corresponding to the subjects, each piece of entity relationship data also includes one or more objects, which likewise have unique identifiers. The object identifiers also form part of the primary key of the entity relationship table. The subject identifier and the object identifier are combined to form a joint primary key, which ensures that each record in the entity relationship table is unique.

Once the joint primary key is determined, a table structure may be created from the entity relationship data. This table structure should be able to store all necessary fields, such as the subject, the object, the relationship type, the relationship description, a timestamp, and the entity attribute data.

Taking a transaction entity relationship table as an example, if the entity relationship data is position relationship data, the table structure may use the user identifier (User Identification, UID) and the position target as the joint primary key, so that each piece of position relationship data can be uniquely identified by the UID and the position target, which facilitates subsequent data query and analysis. The table may also contain fields such as the position quantity and the position direction, so as to fully describe the position relationship. The advantage of this design is that the position targets of a specific user can be located quickly by UID, and the details of a position, such as the specific target held, the corresponding quantity, and the long or short direction, can be obtained by position target. This structured storage makes querying and using the data more convenient and efficient.

The embodiment of the application can create different entity relation tables according to different entity relation data so as to store the entity relation tables according to the need and conveniently support the entity attribute data.
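As an illustration of a table keyed by a joint primary key, the following sketch uses Python's built-in `sqlite3` module as a stand-in for the relational database. The table name follows the `user_security_position_v1` naming used elsewhere in this description, while the column names (`uid`, `symbol`, `quantity`, `direction`) and the helper functions are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_security_position_v1 (
        uid       INTEGER NOT NULL,   -- subject identifier
        symbol    TEXT    NOT NULL,   -- object identifier (position target)
        quantity  INTEGER NOT NULL,   -- entity attribute: position quantity
        direction TEXT    NOT NULL,   -- entity attribute: long or short
        PRIMARY KEY (uid, symbol)     -- joint primary key
    )
    """
)


def upsert_position(conn, uid, symbol, quantity, direction):
    """Write one relationship row; a repeated (uid, symbol) replaces the old row."""
    conn.execute(
        "INSERT OR REPLACE INTO user_security_position_v1 "
        "(uid, symbol, quantity, direction) VALUES (?, ?, ?, ?)",
        (uid, symbol, quantity, direction),
    )


def positions_of(conn, uid):
    """Return all positions of one user, located through the leading key column."""
    cur = conn.execute(
        "SELECT symbol, quantity, direction FROM user_security_position_v1 "
        "WHERE uid = ? ORDER BY symbol",
        (uid,),
    )
    return cur.fetchall()
```

Because the pair `(uid, symbol)` is the primary key, writing the same user and target twice updates the existing record instead of creating a duplicate.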

Step 140, storing the entity relationship table in a target database.

In some embodiments, the target database comprises a first target database and a second target database;

the storing the entity relationship table in a target database includes:

storing the different entity relationship tables in both the first target database and the second target database, wherein the first target database is a relational database and the second target database is a cloud-native data warehouse.

For example, the target database may include two or more sub-databases or data stores, such as a first target database and a second target database. This design allows different types of entity-relationship tables to be distributed into different sub-databases for better performance optimization and data management.

And storing the entity relation table in the first target database and the second target database simultaneously, so that redundant backup and distributed storage of data can be realized. The distributed storage can improve the usability and expandability of the data and ensure the safety and the integrity of the data. At the same time, this design also allows for quick retrieval and access of data from different sub-databases according to different query and business requirements.

For example, the first target database may be a relational database built on a relational database management system (MySQL), and the second target database may be a cloud-native data warehouse (ByteHouse).

For example, as shown in FIG. 2, an entity relationship table storing entity relationship data may be stored in the target database of the storage layer, which may include a relational database built on a relational database management system (MySQL) and a cloud-native data warehouse (ByteHouse).

MySQL is a relational database management system that keeps data in different tables rather than placing all data in one large warehouse, which increases speed and flexibility.

ByteHouse is a cloud-native data warehouse that provides users with a very fast analysis experience and supports both real-time data analysis and offline analysis of massive data. Its convenient elastic scaling, high analysis performance, and rich enterprise-level features assist customers in digital transformation.

The embodiment of the application stores data in MySQL and ByteHouse simultaneously, and can therefore support the business requirements of both online transaction processing (On-Line Transaction Processing, OLTP) and online analytical processing (Online Analytical Processing, OLAP).

Step 150, in response to an entity query request, acquiring target entity relationship data corresponding to the entity query request from the target database.

For example, when a system receives an entity query request, the request will typically contain certain query conditions, such as the type of entity, the attributes of the entity, the type of relationship between the entities, the time frame of the query, and so forth. The system needs to parse these query conditions to understand the true intent of the query request. Then, the system constructs a corresponding query statement or query logic according to the parsed query condition. This query statement or query logic is sent to the target database for data retrieval in the stored entity-relationship table. In the target database, entity relationship tables are stored in a certain structural and organizational manner, and generally contain various attribute information of entities and relationship information among the entities. The system can perform operations such as screening, connection, sorting and the like on the data in the entity relation tables according to the query statement or the query logic so as to find target entity relation data meeting the query condition.

In some embodiments, the obtaining, in response to an entity query request, target entity relationship data corresponding to the entity query request from the target database includes:

If the entity query request is a first entity query request corresponding to online transaction processing, responding to the first entity query request, and acquiring target entity relationship data corresponding to the entity query request from the first target database or the second target database; or alternatively

And if the entity query request is a second entity query request corresponding to online analysis processing, responding to the second entity query request, and acquiring target entity relationship data corresponding to the entity query request from the first target database and the second target database.

For example, entity queries mainly include two capabilities of online transaction processing (On-Line Transaction Processing, OLTP) and online analytical processing (Online Analytical Processing, OLAP), and business can flexibly query and filter according to entity data through a structured query language (Structured Query Language, SQL) manner.

Online transaction processing (On-Line Transaction Processing, OLTP): queries are mainly based on the primary key and require a fast response; the focus is on throughput and response time.

Online analytical processing (Online Analytical Processing, OLAP): queries mainly involve complex cross-table association, aggregation, sorting, and the like, and the associated data usually has to be processed to generate complex reports and analysis results.

For example, OLTP is primarily concerned with real-time data transactions, such as insert, delete, and update operations, and needs to guarantee data consistency and transactional capability. For this type of query request, the system may therefore preferentially retrieve data from the first target database, which is typically designed to support highly concurrent transactions and low-latency data access in order to meet the requirements of OLTP. The data may also come from the second target database; the specific source depends on the availability of the data and the configuration of the system.

For example, OLAP is primarily used for complex cross-table association, aggregation, ranking, and other query operations that typically involve extensive data analysis and report generation. Thus, for an OLAP type query request, the system will obtain data from both the first target database and the second target database. Such a strategy may take advantage of the two databases. For example, a first target database provides real-time or near real-time data analysis capabilities, while a second target database provides longer-term historical data and analysis views. This combination can meet the comprehensiveness and real-time requirements of the data analysis.

For example, OLAP is mainly used for user group selection or statistical analysis of data; it supports complex analysis operations, focuses on decision support, and provides intuitive and understandable query results. Examples include querying which stocks a user currently holds, or filtering out which users currently hold positions in a given stock.

For example, OLAP analysis sometimes requires joint analysis of several entities at the same time, such as counting the users who both hold a certain stock and follow a certain stock. The stock position entity and the stock follow entity are different entities and are stored in different tables. When the entity relationship data is stored, the entity relationship tables holding it can therefore also be stored in both the first target database and the second target database to support joint analysis queries.
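The choice of data source by query type can be sketched as follows. The two stores are modeled as plain dictionaries keyed by `(subject_id, object_id)`, and the request fields (`kind`, `subject_id`, `object_id`) as well as the fallback rule for an unavailable first database are assumptions made for the example rather than details given in this description.

```python
def handle_query(request, first_db, second_db):
    """Route an entity query to one store (OLTP) or to both stores (OLAP)."""
    kind = request["kind"]
    if kind == "OLTP":
        # Primary-key lookup: prefer the first target database and fall back
        # to the second one if the first is unavailable (assumed policy).
        key = (request["subject_id"], request["object_id"])
        source = first_db if first_db is not None else second_db
        return source.get(key)
    if kind == "OLAP":
        # Analytical query: collect every relationship of the subject
        # from both target databases.
        subject = request["subject_id"]
        rows = []
        for db in (first_db, second_db):
            rows.extend(row for (s, _), row in sorted(db.items()) if s == subject)
        return rows
    raise ValueError(f"unknown query kind: {kind!r}")
```

A point lookup returns a single row or `None`, while an analytical request returns the merged rows from both stores for further aggregation.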

In some embodiments, the obtaining, in response to the first entity query request, target entity relationship data corresponding to the entity query request from the first target database or the second target database includes:

acquiring target entity relationship data corresponding to the entity query request from the first target database or the second target database according to the primary key information carried in the first entity query request, wherein the primary key information comprises a first subject identifier and a first object identifier.

When data is acquired from the target database in response to the entity query request, the processing mode and the data source can be different for different query types. Specifically, for a first entity query request corresponding to online transaction processing (OLTP), the system obtains corresponding target entity relationship data from the first target database or the second target database according to primary key information in the query request, where a joint primary key of the target entity relationship data matches primary key information in the query request.

The primary key information plays a key role in OLTP queries. It generally includes a first subject identifier and a first object identifier, which are used to quickly locate and retrieve specific entity relationship data. By filtering and searching on the primary key information, the system can respond to entity query requests quickly and accurately.

For example, a user may want to query his or her stock position data or transaction records. Using the user ID as the first subject identifier and the stock code as the first object identifier, the system can quickly retrieve the corresponding data from the database.

In addition, for entity relationship data that no longer exists, the system marks the corresponding deletion time. For example, when a user closes out a position in a stock, the position relationship between the user and that stock no longer exists. The system then sets the deletion time to the time at which the position was closed, indicating that the entity relationship data has expired or is no longer applicable. Marking the deletion time helps maintain the accuracy and consistency of the data: it ensures that the system does not return outdated or invalid data and that users receive accurate query results.

When retrieving data, the system may also filter on other conditions, such as the deletion time. If the recorded deletion time is 0, the entity relationship data is currently valid; if the deletion time is greater than 0, the entity relationship data once existed but is no longer valid; if the filtering result is empty, the entity relationship data has never existed. This filtering mechanism based on deletion time is useful for entity relationship data that changes over time, such as a user's position records or transaction history.
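The three cases distinguished by the deletion time can be expressed as a small helper. The field name `delete_time` and the status labels are assumptions chosen for the example; a lookup that finds no row is represented by `None`.

```python
def relation_status(row):
    """Classify a looked-up relationship row by its deletion time."""
    if row is None:
        return "never_existed"   # the filter returned nothing
    if row["delete_time"] == 0:
        return "valid"           # relationship currently holds
    return "expired"             # relationship existed once but was removed


def active_rows(rows):
    """Keep only the relationships that are currently valid."""
    return [row for row in rows if row["delete_time"] == 0]
```

For a closed position, the row stays in the table with a non-zero `delete_time`, so historical queries can still see it while current-state queries skip it.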

In some embodiments, the responding to the second entity query request, obtaining target entity relationship data corresponding to the entity query request from the first target database and the second target database, includes:

And responding to the association information carried in the second entity query request, and acquiring target entity relationship data corresponding to the entity query request from the first target database and the second target database, wherein the association information comprises at least one second subject identifier and at least one second object identifier.

When the system receives a query request based on association information, such as an online analytical processing (OLAP) request, target entity relationship data corresponding to the entity query request needs to be obtained from a plurality of target databases. This is because association information queries typically involve many-to-many relationships, requiring comprehensive analysis of the data of multiple databases.

The association information may include at least one second subject identifier and at least one second object identifier, which are key for implementing the complex entity relationship data query. Such information is used to represent complex relationships between different entities, such as users and products, suppliers and orders, etc., to facilitate the system in retrieving and integrating relevant data from multiple target databases.

The system can simultaneously retrieve related target entity relationship data from the first target database and the second target database according to the subject and object identifications in the association information. Because of the large and complex amount of data involved in the associated information query, it is necessary to comprehensively obtain data from multiple target databases to obtain complete and accurate analysis results. The retrieved target entity relationship data is further integrated and analyzed to generate a report, a graph, a dashboard, or the like for presentation to the user. The analysis results can help business users to know the relationship and trend among the entities in depth, and support decision making.

For example, on an e-commerce platform, a user may wish to query purchase records and user ratings associated with a certain item. The system will simultaneously retrieve relevant purchase records, user information, rating content, etc. from the database based on the item ID and other associated information. From these data, purchasing habits, evaluation preferences, association relation with commodities, and the like of the user can be analyzed.

For example, in a financial scenario, a user wants to query the stocks he or she currently holds as well as the stocks he or she has followed. The first target database stores the entity relationship data corresponding to the user's stock positions, including the user ID (subject identifier), the stock code (object identifier), the position quantity (entity attribute data), the holding time (entity attribute data), and the like. The second target database stores the entity relationship data corresponding to the stocks the user has followed, including the user ID (subject identifier), the stock code (object identifier), the follow time (entity attribute data), the number of views (entity attribute data), and the like. The association information carried in the second entity query request includes a second subject identifier (such as the user ID) and a plurality of second object identifiers (such as the codes of the held stocks and the codes of the followed stocks). The user ID is used to associate the user's held stocks and followed stocks across the two target databases, and the relevant target entity relationship data is retrieved from both databases through this association information. Finally, the system integrates the entity relationship data and returns it to the user as a clear, easily understood query result. The query result may include the list of stocks the user currently holds, the list of stocks the user has followed, related statistical charts, and the like, which helps the user understand his or her investments and supports decision making.
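The held-and-followed example can be sketched with two in-memory SQLite databases standing in for the first and second target databases. The table names (`position`, `follow`), the column names, and the sample rows are all hypothetical and serve only to show how the shared user ID links the two stores.

```python
import sqlite3

# Stand-in for the first target database: stock positions.
positions_db = sqlite3.connect(":memory:")
positions_db.execute(
    "CREATE TABLE position (uid INTEGER, symbol TEXT, quantity INTEGER, "
    "PRIMARY KEY (uid, symbol))"
)
positions_db.executemany(
    "INSERT INTO position VALUES (?, ?, ?)",
    [(1001, "AAA", 100), (1001, "BBB", 20), (2002, "AAA", 5)],
)

# Stand-in for the second target database: followed stocks.
follows_db = sqlite3.connect(":memory:")
follows_db.execute(
    "CREATE TABLE follow (uid INTEGER, symbol TEXT, view_count INTEGER, "
    "PRIMARY KEY (uid, symbol))"
)
follows_db.executemany(
    "INSERT INTO follow VALUES (?, ?, ?)",
    [(1001, "BBB", 7), (1001, "CCC", 2)],
)


def user_overview(uid):
    """Combine the relationships of one user from both databases."""
    held = dict(
        positions_db.execute(
            "SELECT symbol, quantity FROM position WHERE uid = ?", (uid,)
        )
    )
    followed = dict(
        follows_db.execute(
            "SELECT symbol, view_count FROM follow WHERE uid = ?", (uid,)
        )
    )
    return {
        "held": held,
        "followed": followed,
        "held_and_followed": sorted(held.keys() & followed.keys()),
    }
```

Calling `user_overview(1001)` returns both lists together with the symbols that appear in each, which is the kind of integrated result the paragraph above describes.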

In some embodiments, the method further comprises:

when entity attribute data in the entity relation data is changed, marking the changed entity relation data with a new version;

And newly adding a new entity relation table corresponding to the version number of the new version, wherein the new entity relation table is used for storing the changed entity relation data.

For example, the data middle platform captures all data changes in real time through its update log (ChangeLog) function. These changes mainly come from operations of the various business systems, such as inserts, updates, or deletes, and the change data is synchronized into the data warehouse. Based on the change data, the data middle platform then re-runs the entity computing task, which processes the changed data according to preset rules and algorithms to obtain the changed entity relationship data.

To manage the entity relationship data, the data middle platform adopts version control. When entity relationship data changes, for example when the entity attribute data in it changes, the system marks the changed entity relationship data with a new version. This helps distinguish the versions of the data at different points in time and facilitates subsequent data analysis and querying.

Each time a new version of the data appears, the data middle platform adds a new entity relationship table whose version number corresponds to the new version. In this way, each version of the entity relationship data can be stored and managed separately. Each entity has its own version number, which allows flexible version switching without affecting the business party when the caliber (calculation definition) of the entity changes. For example, if the initial version of the user stock position entity is v1, the corresponding entity relationship table is user_security_position_v1. When the caliber of the position changes, an entity relationship table user_security_position_v2 with version v2 can be added. Once the data under the changed caliber has been computed and written into the new entity relationship table user_security_position_v2, the version mapping of the entity is set to v2. The business party can then use the v2 entity data without having to be aware of the caliber change.

Here, the caliber defines and describes in detail the specific content of the entity relationship data. For example, if the entity relationship data is the market in which a customer's financial accounts are located, its specific content includes the list of markets (such as Hong Kong stocks, equity stock, omnipotent account, etc.) in which all financial accounts opened by the customer under a given broker are located, together with entity attribute data such as the account ID, the account fund type, and the account risk status.

The embodiment of the application adopts different versions for different calibers, and can conveniently carry out seamless migration on data without influencing downstream.

And responding to the entity query request, acquiring changed target entity relationship data corresponding to the entity query request from a new entity relationship table of the target database, wherein the changed target entity relationship data comprises changed entity attribute data.

For example, when an entity query request is received, the system may first determine whether a new entity-relationship table corresponding to the request exists in the target database. The new entity relationship table is used for storing entity relationship data which is changed recently. If a new entity relationship table exists, the system retrieves data corresponding to the entity query request, i.e., modified target entity relationship data, from the table, the modified target entity relationship data including modified entity attribute data. These data reflect any updates or changes that have occurred since the last query. In this way, the system is able to provide the most up-to-date, accurate data response without requiring the user to manually refresh or retrieve the data. The method greatly improves the efficiency and response speed of data query and provides better data analysis and decision support for users.

Furthermore, the latest version of entity-relationship data can be used seamlessly for business parties without concern for the change and update process of the data. This simplifies the complexity of data processing and data usage, and improves the efficiency and accuracy of business processing.

According to the embodiment of the application, the changed target entity relationship data is obtained from the new entity relationship table of the target database by responding to the entity query request, so that efficient, accurate and real-time data service can be realized, and various business requirements and data analysis scenes can be met.
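The version switch described above can be sketched as a small registry that maps each entity to its active version. The class name, method names, and the idea of resolving the table name by string formatting are assumptions made for illustration; only the table naming pattern `user_security_position_v1` / `user_security_position_v2` comes from this description.

```python
class EntityVersionRegistry:
    """Tracks which version of each entity's table queries should read."""

    def __init__(self):
        self._active = {}

    def publish(self, entity, version):
        """Switch the entity to a version once its table is fully written."""
        self._active[entity] = version

    def table_for(self, entity):
        """Resolve the table name that a query for this entity should use."""
        return f"{entity}_v{self._active[entity]}"


registry = EntityVersionRegistry()
registry.publish("user_security_position", 1)
```

Business code asks `registry.table_for("user_security_position")` on every query, so publishing version 2 after the new table is populated redirects reads without any change on the caller's side.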

The embodiment of the application can implement a read-write separation framework in which data writing and data reading are carried out independently, thereby improving the efficiency and reliability of data processing. In this framework, write operations and read operations are handled by different servers or database instances, so that write operations do not delay or interfere with read operations. For example, data may be written using a stream processing framework (Flink), which can process large data streams in real time and write the data into the target database, achieving efficient data writing and real-time data processing. The data can then be read through a remote procedure call (Remote Procedure Call Protocol, RPC) interface. The RPC protocol allows different processes or computers to communicate and call each other's methods or functions, so data can be read and invoked remotely through the RPC interface, which avoids direct access to the database and reduces network congestion and delay.

In order to better understand the data processing method according to the embodiment of the present application, the embodiment of the present application is illustrated in the following with reference to fig. 3 to 4.

For example, FIG. 3 shows a data flow diagram of entity relationship data. An entity computing task may take the original data acquired from a business data source and, through a specific entity relationship model, generate relationship data usable by a computing script; this relationship data may be the entity relationship data. The entity relationship data output by the entity computing task is then stored in an entity relationship table in the target database. Downstream businesses may then write specific Structured Query Language (SQL) statements according to their own requirements and pull the data they need from the target database through the entity interface service. The entity interface service queries the entity relationship data from the target database and provides it to the downstream business, while the SQL itself is written by the downstream business party.

For example, FIG. 4 shows a schematic diagram of an application scenario in which an entity query is performed. First, the business party sends an entity query request to the gateway. The gateway then calls the SQL parsing module of the entity interface service via the preprocessing module. Next, the entity interface service filters the entity query request through the SQL parsing module and sends the filtered request to the target database. Finally, the target database queries the target entity relationship data corresponding to the entity query request through the SQL execution module, obtains a query result containing the target entity relationship data, and returns the query result to the business party.
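The parse, filter, and execute steps of this flow can be sketched as follows, again with `sqlite3` standing in for the target database. The filtering rule used here (accept only a single SELECT statement over an allow-listed table, checked with regular expressions) is an assumption chosen for the example; this description does not specify what the SQL parsing module checks, and a production service would use a real SQL parser.

```python
import re
import sqlite3

# Hypothetical allow-list of entity relationship tables exposed to callers.
ALLOWED_TABLES = {"user_security_position_v1"}


def parse_and_filter(sql):
    """SQL parsing step: accept one SELECT over allow-listed tables only."""
    statement = sql.strip().rstrip(";")
    if ";" in statement or not re.match(r"(?i)^select\b", statement):
        raise ValueError("only a single SELECT statement is accepted")
    tables = set(
        re.findall(r"(?i)\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", statement)
    )
    if not tables or not tables <= ALLOWED_TABLES:
        raise ValueError(f"query touches tables that are not exposed: {sorted(tables)}")
    return statement


def handle_entity_query(conn, sql, params=()):
    """Entity interface service: filter the request, then run it on the database."""
    statement = parse_and_filter(sql)
    return conn.execute(statement, params).fetchall()
```

Keeping the filter in front of the database lets downstream businesses write their own SQL while the service still controls which tables and statement types reach the storage layer.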

The embodiment of the application addresses common shortcomings of single-subject, multi-object data systems. The embodiment of the application designs a brand-new entity relationship data development module, which captures relationship changes within seconds and reports them to a database through a general architecture, thereby greatly improving the timeliness of data change capture. A brand-new data storage structure is supported, so that personalized new attributes of objects can be easily added and the relationship between a subject and an object can be described in more dimensions. The data reading application layer provided by the embodiment of the application can support more types of OLTP and OLAP interfaces, including second-level queries for the object values of a subject's relationships, second-level metadata queries, and minute-level aggregate count queries over relationships. Anchoring multiple types of user group data through entity relationship data is also supported, so that diversified delivery requirements, such as real-time user group labels and static user groups, can be met.

According to the embodiment of the application, by providing the entity relationship data, a user can simply and easily anchor a user group according to the relationships generated between subjects and objects in the application program. Meanwhile, the brand-new data architecture overcomes the shortcomings that a wide-table data structure of user features may have in describing relationships.

All the above technical solutions may be combined to form an optional embodiment of the present application, and will not be described in detail herein.

In order to facilitate better implementation of the data processing method according to the embodiment of the present application, the embodiment of the present application further provides a data processing device. Referring to fig. 5, fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the application. Wherein the data processing apparatus 200 may include:

an acquisition unit 210, configured to acquire original data from a service data source;

an extracting unit 220, configured to extract entity relationship data from the original data, where each entity relationship data includes corresponding entity attribute data, a subject identifier, and an object identifier;

a first storage unit 230, configured to generate, for any piece of entity relationship data, an entity relationship table based on the entity relationship data, with the subject identifier and the object identifier in the entity relationship data as a joint primary key;

a second storage unit 240, configured to store the entity relationship table in a target database; and

a query unit 250, configured to respond to an entity query request and acquire target entity relationship data corresponding to the entity query request from the target database.

The second storage unit 240 is configured to store different entity relationship tables in the first target database and the second target database at the same time, where the first target database is a relational database, and the second target database is a cloud-native data warehouse.

In some embodiments, the query unit 250 is configured to:

In some embodiments, the query unit 250 may be configured to:

respond to the association information carried in the second entity query request, and acquire target entity relationship data corresponding to the entity query request from the first target database and the second target database, wherein the association information comprises at least one second subject identifier and at least one second object identifier.
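As a hedged illustration of querying both target databases by association information, the following sketch uses two in-memory SQLite databases in place of the relational database and the data warehouse. The table names `follow` and `trade`, the column `attr`, and the function `query_by_association` are hypothetical; how the results of the two databases are actually combined is not specified in the text, so merging by (subject, object) pair is an assumption.

```python
import sqlite3


def make_db(table, rows):
    """Create a stand-in database holding one entity relationship table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {table} (subject_id TEXT, object_id TEXT, attr TEXT, "
        "PRIMARY KEY (subject_id, object_id))"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    return conn


# Different entity relationship tables live in different target databases.
first_db = make_db("follow", [("u1", "s1", "followed"), ("u2", "s1", "followed")])
second_db = make_db("trade", [("u1", "s1", "bought"), ("u1", "s2", "sold")])
SOURCES = [(first_db, "follow"), (second_db, "trade")]


def query_by_association(subject_ids, object_ids):
    """Collect, per (subject, object) pair, the attributes found in each table."""
    s_marks = ",".join(["?"] * len(subject_ids))
    o_marks = ",".join(["?"] * len(object_ids))
    merged = {}
    for conn, table in SOURCES:
        sql = (
            f"SELECT subject_id, object_id, attr FROM {table} "
            f"WHERE subject_id IN ({s_marks}) AND object_id IN ({o_marks})"
        )
        for s, o, attr in conn.execute(sql, [*subject_ids, *object_ids]):
            merged.setdefault((s, o), {})[table] = attr
    return merged


merged = query_by_association(["u1"], ["s1", "s2"])
```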

In some embodiments, the extracting unit 220 may be further configured to: when entity attribute data in the entity relationship data changes, mark the changed entity relationship data with a new version.

The first storage unit 230 may be further configured to: add a new entity relationship table corresponding to the version number of the new version, wherein the new entity relationship table is used for storing the changed entity relationship data.
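The version handling above can be sketched as one table per version number. The naming scheme `entity_relation_v<N>` and the helpers `store_version` and `query_version` are hypothetical; the sketch only shows that changed entity relationship data is written to a newly added table while the table of the earlier version stays untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_name(version):
    # int() keeps the generated table name free of arbitrary text.
    return f"entity_relation_v{int(version)}"


def store_version(version, rows):
    """Create the table for this version if needed and store the rows in it."""
    name = table_name(version)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {name} ("
        "subject_id TEXT NOT NULL, object_id TEXT NOT NULL, attrs TEXT, "
        "PRIMARY KEY (subject_id, object_id))"
    )
    conn.executemany(f"INSERT OR REPLACE INTO {name} VALUES (?, ?, ?)", rows)
    conn.commit()
    return name


def query_version(version, subject_id):
    """Read the relationships of one subject from the table of one version."""
    name = table_name(version)
    sql = f"SELECT object_id, attrs FROM {name} WHERE subject_id = ?"
    return conn.execute(sql, (subject_id,)).fetchall()


store_version(1, [("u1", "s1", "level=1")])
# The attribute data changed: the changed row goes to a new version table.
store_version(2, [("u1", "s1", "level=2")])
old_rows = query_version(1, "u1")
new_rows = query_version(2, "u1")
```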

In some embodiments, the query unit 250 may be further configured to:

In some embodiments, the extracting unit 220 is configured to:

In some embodiments, the extracting unit 220, when extracting a plurality of entity relationship data from the raw data based on a stream computing manner, includes:

It will be appreciated that data processing apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may be made with reference to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the data processing apparatus may execute the above-mentioned data processing method embodiment, and the foregoing and other operations and/or functions of each unit in the data processing apparatus implement respective flows of the above-mentioned method embodiment, which are not described herein for brevity.

Optionally, the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the above method embodiments when executing the computer program.

FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application, where the computer device may be a terminal or a server. As shown in FIG. 6, the computer device 300 may include: a communication interface 301, a memory 302, a processor 303 and a communication bus 304. The communication interface 301, the memory 302, and the processor 303 communicate with each other via the communication bus 304. The communication interface 301 is used for data communication between the computer device 300 and an external device. The memory 302 may be used to store software programs and modules, and the processor 303 may execute the software programs and modules stored in the memory 302, such as the software programs for corresponding operations in the foregoing method embodiments.

Optionally, the processor 303 may call the software programs and modules stored in the memory 302 to perform the following operations:

acquiring original data from a service data source; extracting entity relationship data from the original data, wherein each entity relationship data comprises corresponding entity attribute data, a subject identifier and an object identifier; for any piece of entity relationship data, taking the subject identifier and the object identifier in the entity relationship data as a joint primary key, and generating an entity relationship table based on the entity relationship data; storing the entity relationship table in a target database; and responding to an entity query request, and acquiring target entity relationship data corresponding to the entity query request from the target database.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.

To this end, an embodiment of the present application provides a computer readable storage medium having stored therein a plurality of computer programs that can be loaded by a processor to perform the steps of any of the data processing methods provided by the embodiments of the present application. The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.

The storage medium may include: a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and the like.

Because the computer program stored in the storage medium can execute the steps of any data processing method provided by the embodiment of the present application, it can achieve the beneficial effects of any data processing method provided by the embodiment of the present application. For details, refer to the foregoing embodiments, which are not repeated here.

Embodiments of the present application also provide a computer program product comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the corresponding flow in any data processing method in the embodiment of the present application, which is not described herein for brevity.

The embodiments of the present application also provide a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the corresponding flow in any data processing method in the embodiment of the present application, which is not described herein for brevity.

The foregoing is merely a specific embodiment of the present application, and the protection scope of the present application is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of data processing, the method comprising:

acquiring original data from a service data source;

Extracting entity relation data from the original data, wherein each entity relation data comprises entity attribute data, a subject identifier and an object identifier;

storing the entity relation table in a target database;

2. The data processing method of claim 1, wherein the target database comprises a first target database and a second target database;

the storing the entity relationship table in a target database includes:

3. The data processing method according to claim 2, wherein the obtaining, in response to an entity query request, target entity relationship data corresponding to the entity query request from the target database includes:

4. The data processing method according to claim 3, wherein the obtaining, in response to the first entity query request, target entity relationship data corresponding to the entity query request from the first target database or the second target database includes:

5. The data processing method according to claim 3, wherein the obtaining, in response to the second entity query request, target entity relationship data corresponding to the entity query request from the first target database and the second target database includes:

6. The data processing method of claim 1, wherein the method further comprises:

7. The data processing method as claimed in claim 6, wherein said obtaining, in response to an entity query request, target entity relationship data corresponding to the entity query request from the target database includes:

8. The data processing method of claim 1, wherein the extracting entity relationship data from the raw data comprises:

9. The data processing method according to claim 8, wherein extracting a plurality of entity relationship data from the raw data based on the stream computation method comprises:

10. A data processing apparatus, the apparatus comprising:

11. A computer device, characterized in that it comprises a processor and a memory, in which a computer program is stored, the processor being arranged to execute the data processing method according to any of claims 1-9 by invoking the computer program stored in the memory.

12. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which is adapted to be loaded by a processor for performing a data processing method according to any of claims 1-9.