CN111930958B - Graph database construction method, computing device and readable storage medium - Google Patents
- Publication number
- CN111930958B (application CN202010669156.4A)
- Authority
- CN
- China
- Prior art keywords
- entity
- data
- graph database
- information
- entity information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a graph database construction method, which is suitable for being executed in a computing device, wherein the method comprises the following steps: determining an entity and an entity attribute related to the knowledge domain according to the knowledge domain of the constructed graph database; initializing an entity information table according to the entity and the entity attribute, wherein the entity information table comprises a node table, a relation table and an operation record table, the node table is suitable for storing node information, the relation table is suitable for storing the relation between nodes, and the operation record table is suitable for storing operation records of the node table and the relation table; acquiring source data from a data source, preprocessing the source data, and storing the source data into an entity information table; creating a structure of a graph database according to the entity and the entity attribute; and acquiring entity information from the entity information table, and importing the entity information into the graph database to complete the construction of the graph database. The invention also discloses a corresponding graph database construction device, a computing device and a readable storage medium.
Description
Technical Field
The present invention relates to the field of data processing, and in particular, to a graph database construction method, a computing device, and a readable storage medium.
Background
A knowledge graph, also known as a scientific knowledge map, describes the relationships between entities and entity attributes in a specialized field and is essentially a graph database. The current design flow for a graph database is generally as follows: entities, entity attributes and the relationships among entities are abstracted; vertices and edges of the graph database structure are established according to the abstracted entities and entity attributes; and data is imported according to the established graph database structure. However, once an import fails during the data import process, the data cannot be recovered or compensated in a timely manner. Meanwhile, because all data exists only in the graph database, every update and query operation is performed directly on the graph database, which reduces efficiency.
In one existing method, the graph database serves as the center, and a relational database and a document database at its periphery are used for data storage: the relational database stores attributes and the hierarchical relationships of the attributes, the document database stores text data, both are associated with the graph database through the unique identifier of an entity, and finally knowledge fusion is performed on the two databases to construct the knowledge graph. Although this method stores the data sources of the graph database at the periphery, knowledge fusion is still required before the data can be imported into the graph database. As a result, the peripheral data storage cannot be used directly when data is deleted or compensated, and it cannot be relied on to reduce the query and update load on the graph database when data is changed or queried.
Disclosure of Invention
To this end, the present invention provides a graph database construction method, computing device, and readable storage medium in an effort to solve or at least alleviate the above-identified problems.
According to one aspect of the present invention, there is provided a graph database construction method adapted to be executed in a computing device, wherein the method comprises: determining an entity and an entity attribute related to the knowledge domain according to the knowledge domain of the constructed graph database; initializing an entity information table according to the entity and the entity attribute, wherein the entity information table comprises a node table, a relation table and an operation record table, the node table is suitable for storing node information, the relation table is suitable for storing the relation between nodes, and the operation record table is suitable for storing operation records of the node table and the relation table; acquiring source data from a data source, preprocessing the source data, and storing the source data into an entity information table; creating a structure of a graph database according to the entity and the entity attribute; and acquiring entity information from the entity information table, and importing the entity information into the graph database to complete the construction of the graph database.
Optionally, in the graph database construction method according to the present invention, the data source includes an information file, web page pulling, and message pushing, and preprocessing the source data and storing it into the entity information table includes: acquiring entity attributes of the same entity from different data sources, integrating the entity attributes, and storing them into the entity information table.
Optionally, in the graph database construction method according to the present invention, initializing the entity information table according to the entity and the entity attribute includes: creating a table structure of a node table, wherein the fields of the node table include an entity type, an entity attribute and an entity unique identifier.
Optionally, in the graph database construction method according to the present invention, initializing the entity information table according to the entity and the entity attribute further includes: creating a table structure of a relation table, wherein the fields of the relation table include a relation type, a relation start entity index number, a relation end entity index number and a relation attribute, and an entity index number is the index number of the entity in the node table.
Optionally, in the graph database construction method according to the present invention, initializing the entity information table according to the entity and the entity attribute further includes: creating a table structure of an operation record table, wherein the fields of the operation record table include a table type, an update record index number, an update type and the data information before update.
Optionally, in the graph database construction method according to the present invention, the table types include a node table and a relationship table; the update types include insert, update, and delete.
Optionally, in the graph database construction method according to the present invention, acquiring entity information from the entity information table and importing the entity information into the graph database includes: acquiring entity information from the entity information table, wherein the entity information comprises entities and entity attribute information; simplifying the acquired entity information by deleting attribute information that the graph database does not need, to obtain simplified entity information; normalizing the simplified entity information to obtain normalized entity information; and storing the normalized entity information into the graph database while generating a graph database import log.
Optionally, in the graph database construction method according to the present invention, the method further includes data update of the graph database, including: when the data change information is received, comparing the change data with the data in the entity information table to obtain the data information needing to be changed; storing the data information to be changed into an operation record table; updating the change data into the entity information table and synchronizing the change data into the graph database.
Optionally, in the graph database construction method according to the present invention, the method further includes a data query of the graph database, and the data query in the graph database is completed by querying the entity information table.
Optionally, in the graph database construction method according to the present invention, the method further includes data compensation of the graph database, including: obtaining, according to the graph database import log, the data that failed to import during the import process; and re-acquiring that data from the entity information table and re-importing it into the graph database.
Optionally, in the graph database construction method according to the present invention, the method further includes recovering error data when the graph database data is updated in error, including: acquiring original data before updating error data from an operation record table; and recovering the data in the entity information table and the graph database data according to the original data.
According to still another aspect of the present invention, there is provided a graph database construction apparatus including: an entity generation module, adapted to determine an entity and an entity attribute related to the knowledge domain according to the knowledge domain of the constructed graph database; an entity information table construction module, adapted to initialize an entity information table according to the entity and the entity attribute, wherein the entity information table comprises a node table, a relation table and an operation record table, the node table is suitable for storing node information, the relation table is suitable for storing the relation between nodes, and the operation record table is suitable for storing operation records of the node table and the relation table; a source data processing module, adapted to acquire source data from a data source, preprocess the source data and store it into the entity information table; and a graph database construction module, adapted to create the structure of the graph database according to the entity and the entity attribute, acquire entity information from the entity information table, and import the entity information into the graph database to complete the construction of the graph database.
According to yet another aspect of the present invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be adapted to be executed by the at least one processor, the program instructions comprising instructions for performing the graph database construction method as above.
According to still another aspect of the present invention, there is provided a readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the graph database construction method as above.
According to the graph database construction scheme of the present invention, all entity information is normalized, integrated and stored in the entity information table: the node table stores the node information and the relation table stores the relation information among the nodes, and the graph database is constructed by importing this data into the graph database. Because the graph database data is derived from the entity information table, a query on the graph database data can be completed by querying the entity information table, and when data is changed, the change data can be compared with the entity information table instead of directly with the graph database. This reduces direct operations on the graph database, reduces the occupation of graph database resources, and improves the efficiency of data query, change and other operations. In addition, if errors occur while importing into the graph database, data compensation can be performed directly from the node table and the relation table, and when a data operation on the graph database goes wrong, the data can be recovered from the operation records in the operation record table.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which set forth the various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to fall within the scope of the claimed subject matter. The above, as well as additional objects, features, and advantages of the present disclosure will become more apparent from the following detailed description when read in conjunction with the accompanying drawings. Like reference numerals generally refer to like parts or elements throughout the present disclosure.
FIG. 1 illustrates a block diagram of a computing device 100 according to one embodiment of the invention;
FIG. 2 illustrates a flow diagram of a graph database construction method 200 according to one embodiment of the invention;
FIG. 3 illustrates an example of a graph database constructed in accordance with a graph database construction method of one embodiment of the present invention;
FIG. 4 shows a flowchart of a graph database data update process 400, according to one embodiment of the invention;
FIG. 5 illustrates a flow diagram of a graph database data compensation process 500, according to one embodiment of the invention;
FIG. 6 illustrates a flowchart of a graph database data recovery process 600 according to one embodiment of the invention;
FIG. 7 illustrates a block diagram of a graph database construction apparatus 700 according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 shows a schematic diagram of a computing device 100 according to one embodiment of the invention; the graph database construction method of the present invention is adapted to be executed in such a computing device. In a basic configuration 102, computing device 100 typically includes a system memory 106 and one or more processors 104. The memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor including, but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of caches, such as a first level cache 110 and a second level cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations, the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. The application 122 is actually a plurality of program instructions for instructing the processor 104 to perform a corresponding operation. In some implementations, the application 122 may be arranged to cause the processor 104 to operate with the program data 124 on an operating system.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to basic configuration 102 via bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices such as a display or speakers via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 via one or more communication ports 164 over a network communication link.
The network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media in a modulated data signal, such as a carrier wave or other transport mechanism. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or special purpose network, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 100 may be implemented as a server, such as a database server, an application server, a WEB server, etc., or as a personal computer including desktop and notebook computer configurations. Of course, computing device 100 may also be implemented as part of a small-sized portable (or mobile) electronic device.
In an embodiment according to the invention, the computing device 100 may be implemented as a graph database construction apparatus 700 and configured to perform the graph database construction method 200 according to an embodiment of the invention. Wherein the application 122 of the computing device 100 contains instructions for executing the graph database construction method 200 according to an embodiment of the present invention, the instructions may instruct the processor 104 to execute the graph database construction method of the present invention.
FIG. 2 illustrates a flow diagram of a graph database construction method 200 according to one embodiment of the invention. The method 200 begins with step S210 by first determining a knowledge domain to which the constructed graph database belongs, and determining entities and entity attribute information within the domain according to the knowledge domain.
Taking the automotive field as an example, the entities extracted in this field include BMW and the BMW 3 Series, where the entity type of BMW is brand and the entity type of the BMW 3 Series is car series. When an entity is determined, its entity attributes are also acquired; the BMW 3 Series, for example, has attribute information such as the manufacturer, the maximum guide price and the minimum guide price.
After the entity information in the field is determined, step S220 is performed to initialize the entity information table according to the extracted entities and attribute information. This step mainly initializes the structure of the entity information table, which may be designed as a wide table. The entity information table comprises a node table, a relation table and an operation record table, and the fields contained in each table are defined in this step. The present invention does not limit the specific type of data table used for each table in the entity information table; it may be any of MyISAM, InnoDB, HEAP, BOB, ARCHIVE, CSV and the like, or any other table structure suitable for the present invention.
The node table is used for storing node information, and its fields comprise: entity type (label), entity unique identifier (entity_id), and entity attribute (property). For the automotive field, the entity type can be information such as brand or car series. The entity unique identifier is a user-defined identifier that uniquely identifies the corresponding entity and can be a numeric code. The entity attribute stores the attributes of the entity in the record; if the entity is a car series, the attribute information in the record includes whether it is a joint-venture vehicle, price information and the like.
The relation table is used for storing relation data among nodes, and its fields comprise: relationship type (label), relationship start entity index number (start_node_id), relationship end entity index number (end_node_id) and relationship attribute (relationship). The relationship type refers to a node-to-node relationship, such as "BMW 3 Series" to "BMW", where the BMW 3 Series is the relationship start entity and BMW is the relationship end entity; their index numbers in the node table are stored in start_node_id and end_node_id, respectively. The relationship attribute stores information about the relationship between the two nodes, for example that the BMW 3 Series is the most popular BMW vehicle.
The operation record table is used for recording operations on the node table and the relation table, and its fields include a table type (type), an update record index number (update_id), an update type (update_type), and the data information before update (property_before). The type field identifies whether the operation is on the node table or the relation table: 1 represents an operation on the node table and 2 represents an operation on the relation table. The update record index number identifies the position of the updated record in its table. There are three update types: 1 is an insert operation, 2 is an update operation, and 3 is a delete operation. property_before stores the data information before the update.
In addition, in the node table, the relationship table, and the operation record table, each record corresponds to one index number (index).
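The three table structures described above can be sketched as follows. This is only a minimal illustration under stated assumptions: SQLite is used for convenience (the text does not fix a storage engine), the table names node, relation and operation_record are hypothetical, and attributes are assumed to be stored as JSON text. The column names follow the field names given in the text.

```python
import json
import sqlite3

# Hypothetical schema: column names follow the fields named in the text;
# the table names, SQLite, and JSON-encoded attributes are assumptions.
SCHEMA = """
CREATE TABLE node (
    "index"   INTEGER PRIMARY KEY,      -- record index number
    label     TEXT NOT NULL,            -- entity type
    entity_id INTEGER NOT NULL UNIQUE,  -- entity unique identifier
    property  TEXT NOT NULL             -- entity attributes (JSON)
);
CREATE TABLE relation (
    "index"       INTEGER PRIMARY KEY,
    label         TEXT NOT NULL,        -- relationship type
    start_node_id INTEGER NOT NULL REFERENCES node("index"),
    end_node_id   INTEGER NOT NULL REFERENCES node("index"),
    relationship  TEXT                  -- relationship attributes (JSON)
);
CREATE TABLE operation_record (
    "index"         INTEGER PRIMARY KEY,
    type            INTEGER NOT NULL CHECK (type IN (1, 2)),            -- 1 node, 2 relation
    update_id       INTEGER NOT NULL,                                   -- index of changed record
    update_type     INTEGER NOT NULL CHECK (update_type IN (1, 2, 3)),  -- insert, update, delete
    property_before TEXT                                                -- data before the update
);
"""


def init_entity_tables(conn):
    """Create the node, relation and operation record tables."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
init_entity_tables(conn)
conn.execute(
    "INSERT INTO node (label, entity_id, property) VALUES (?, ?, ?)",
    ("car series", 66, json.dumps({"name": "BMW 3 Series"})),
)
row = conn.execute('SELECT "index", label, entity_id FROM node').fetchone()
```

Declaring the index column as INTEGER PRIMARY KEY lets SQLite assign index numbers automatically, which matches the statement that every record in each table corresponds to one index number.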
After the entity information table structure is created, the process proceeds to step S230, where source data is obtained from the data source, and the information is integrated and stored in the entity information table created in step S220.
The source data may come from generally unchanging information files, or may be obtained through web page pulling or external message pushing. Because this information is raw and has not been integrated, different source data may contain different attribute information of the same entity. For example, one source gives the minimum guide price of the BMW 3 Series as 290,000 and the maximum guide price as 400,000, while the information file indicates that the BMW 3 Series is a joint-venture vehicle and is currently on sale; the two pieces of information are integrated and stored in the node table as shown in Table 1. There, index 1 indicates that this is the first record in the table, the label "car series" indicates that the entity type of the entity in this record is car series, entity_id 66 indicates that the entity in this record is uniquely identified by 66, and the entity name and entity attributes are stored in property.
TABLE 1
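The integration of attributes of the same entity from different data sources can be illustrated with a short sketch. The function name integrate_attributes and the attribute keys below are hypothetical; the only thing taken from the text is that records for one entity are merged by their entity unique identifier before being stored in the node table.

```python
def integrate_attributes(records):
    """Merge attribute dicts that share an entity_id.

    Later sources add new keys and override keys already present.
    """
    merged = {}
    for rec in records:
        entry = merged.setdefault(
            rec["entity_id"], {"label": rec["label"], "property": {}}
        )
        entry["property"].update(rec["property"])
    return merged


# Two hypothetical sources describing the same entity (entity_id 66).
price_source = {
    "entity_id": 66,
    "label": "car series",
    "property": {"name": "BMW 3 Series", "min_price": 290000, "max_price": 400000},
}
file_source = {
    "entity_id": 66,
    "label": "car series",
    "property": {"name": "BMW 3 Series", "place": "joint venture", "on_sale": "on sale"},
}

node_rows = integrate_attributes([price_source, file_source])
```

Here node_rows holds a single entry for entity 66 whose property combines the price attributes of the first source with the place and sale status of the second.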
According to one embodiment of the present invention, in this step, the information that 4S store A sold 10 BMW 3 Series vehicles in 6 months is obtained by web page pulling, wherein the index number of the BMW 3 Series is 1; the obtained information about 4S store A is stored in the relation table as shown in Table 2.
TABLE 2
According to one embodiment of the present invention, the data change information "the maximum guide price of the BMW 3 Series changes to 600,000" is obtained from message pushing. The data in the node table (Table 1) needs to be changed according to this message, and the change record is stored in the operation record table, as shown by the record with index 1 in Table 3: the value 1 in the type field identifies an operation on the node table, the value 1 in the update_id field indicates that the record with index 1 in the node table is updated, and the value 2 in the update_type field indicates that the operation type is update.
TABLE 3
The 2 nd record in Table 3 shows that a piece of relation data from "4S store A" to "BMW 3 series" is inserted into the relation table; the 3 rd record indicates that the data record with index of 100 is deleted in the node table.
In the above change operation, the data in the node table is updated to the data as in table 4.
TABLE 4
The above tables are merely exemplary, and are not representative of the actual data formats in the data tables, and the present invention is not limited to the specific data storage formats in the data tables.
After all the source data has been processed, the process proceeds to step S240, where a graph database structure, also called a knowledge graph schema, is created according to the defined entities and attributes; the schema is a data model. This process is mainly a knowledge extraction process; taking the automotive field as an example, the car series, place of production, brand, engine and the like can be extracted.
After the entity information table and the graph database structure are created, the data in the entity information table is imported into the graph database in step S250. All data information is first acquired from the entity information table; because the entity information table stores all entity attribute information, attribute information that the graph database does not need must be deleted. Taking the data in Table 1 as an example, the BMW 3 Series contains a plurality of attributes; if price information is not needed in the graph database, the price information is deleted. If the representation of the sale status is not normalized, it cannot be used directly in the graph database and normalization is needed, for example by representing the status with marker values such as 0 and 1. Finally, the processed data is imported into the graph database. According to one embodiment of the invention, the graph database may be JanusGraph.
FIG. 3 shows an example of a graph database constructed according to the graph database construction method of one embodiment of the present invention.
FIG. 4 shows a flowchart of a graph database data update process 400, according to one embodiment of the invention. The process starts at step S410: when data update information is received, the change data is compared with the entity information table to obtain the data information that needs to be changed.
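The simplification and normalization performed before import can be sketched as below. All names here are hypothetical, including the attribute whitelist and the mapping of sale status to 0 and 1 (the text only states that marker values are used, not which value means what). The sketch stops short of writing to an actual graph database and only produces the prepared records together with an import log.

```python
# Hypothetical whitelist of attributes the graph database needs.
GRAPH_ATTRIBUTES = {"name", "place", "on_sale"}
# Hypothetical normalization of the sale status; the 0/1 assignment is an assumption.
ON_SALE_CODES = {"on sale": 0, "not on sale": 1}


def simplify(prop, keep=GRAPH_ATTRIBUTES):
    """Drop attributes the graph database does not need (e.g. prices)."""
    return {k: v for k, v in prop.items() if k in keep}


def normalize(prop):
    """Map free-text sale status onto marker values."""
    out = dict(prop)
    if "on_sale" in out:
        out["on_sale"] = ON_SALE_CODES[str(out["on_sale"]).strip().lower()]
    return out


def prepare_for_import(rows):
    """Return (records ready for import, import log entries)."""
    prepared, log = [], []
    for row in rows:
        try:
            prop = normalize(simplify(row["property"]))
        except KeyError as exc:  # unknown sale status: record the failure
            log.append({"table": "node", "index": row["index"],
                        "status": "error", "reason": str(exc)})
            continue
        prepared.append({"index": row["index"], "label": row["label"], "property": prop})
        log.append({"table": "node", "index": row["index"], "status": "ok"})
    return prepared, log


rows = [
    {"index": 1, "label": "car series",
     "property": {"name": "BMW 3 Series", "max_price": 400000, "on_sale": "On sale"}},
    {"index": 2, "label": "car series",
     "property": {"name": "Example series", "on_sale": "unknown"}},
]
prepared, import_log = prepare_for_import(rows)
```

Keeping one log entry per record, with the record's index number, is what later makes it possible to identify failed records and fetch them again from the entity information table.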
Continuing with the data in Table 1 as an example, the current data for BMW 3 is { index:1, label: "Car:" identity_id: 66, property: { "services_id": "service_name": "BMW 3", "services_place": "payroll", "services_max_price": "service_min_price": "290000", "is_sample": "is_not }, and the received change data is { index:1, label:" car: "auto_id: 66, property: {" services_id ":" 66, "services_name": "BMW 3", "services_max_price": "600000 }. Comparing each field of property shows that services_max_price has changed.
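The field-by-field comparison in step S410 amounts to a dictionary diff. The sketch below is illustrative only: `diff_properties` and the shortened field names are hypothetical, and it assumes that fields absent from the incoming data are unchanged rather than deleted. The price values 400000 and 600000 follow the example in the text.

```python
def diff_properties(current, incoming):
    """Return {field: (old, new)} for fields whose value differs in `incoming`."""
    changes = {}
    for key, new_value in incoming.items():
        old_value = current.get(key)
        if old_value != new_value:
            changes[key] = (old_value, new_value)
    return changes


current = {"series_id": 66, "series_name": "BMW 3 series",
           "max_price": 400000, "min_price": 290000}
incoming = {"series_id": 66, "series_name": "BMW 3 series", "max_price": 600000}
changes = diff_properties(current, incoming)
```

Only the maximum price appears in `changes`, so only that field needs to be recorded and propagated.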
Step S420 is then performed, and the record { index:1, type:1, update_id:1, update_type:1, property_before } "service_id": 66, "service_name": "BMW 3 series", "service_place": "joint good", "service_max_price": 400000, "service_min_price": 290000, "is_save": "} is added to the operation record table.
The data in the node table is then changed in step S430, and "services_max_price" is updated to 600000.
Finally, the changed data is synchronized to the graph database in step S440.
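Steps S420 to S440 can be sketched with in-memory stand-ins for the three stores. Everything here is an assumption made for illustration: `node_table`, `operation_log`, and `graph_db` are plain Python structures rather than real tables or a real graph database, and the numeric codes for the table type and update type are hypothetical.

```python
import copy

node_table = {1: {"label": "Car", "entity_id": 66,
                  "property": {"series_name": "BMW 3 series", "max_price": 400000}}}
operation_log = []                    # stands in for the operation record table
graph_db = copy.deepcopy(node_table)  # stands in for the real graph database

TABLE_NODE, UPDATE = 1, 1  # hypothetical codes for "node table" and "update"


def apply_update(index, new_props):
    row = node_table[index]
    # S420: save the pre-update properties so the change can be undone later.
    operation_log.append({"type": TABLE_NODE, "update_id": index,
                          "update_type": UPDATE,
                          "property_before": copy.deepcopy(row["property"])})
    # S430: change the node table.
    row["property"].update(new_props)
    # S440: synchronize the change to the graph database.
    graph_db[index]["property"].update(new_props)


apply_update(1, {"max_price": 600000})
```

Writing the operation record before touching the node table means the previous state is always available if the update later turns out to be wrong.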
The graph database construction method further comprises a data query process. Because all contents of the graph database are also stored in the entity information table, queries can be served directly from the entity information table instead of from the graph database.
When data is imported from the entity information table into the graph database, a graph database import log is generated, so that if the import fails, data compensation can be completed according to the log file. FIG. 5 shows a flow diagram of a graph database data compensation process 500 according to one embodiment of the invention.
The process starts at step S510 with reading the graph database import log.
The index numbers of the records that failed to import are then obtained from the log file in step S520. In one example according to an embodiment of the present invention, the records with index > 1000 in the node table and the records with index > 1500 in the relationship table are the records that failed to import.
In step S530, these records are retrieved from the entity information table and imported into the graph database again. In the above example, the records with index > 1000 in the node table and the records with index > 1500 in the relationship table are retrieved and re-imported into the graph database.
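The compensation steps S510 to S530 can be sketched as follows. The patent does not specify the log format, so this example assumes a hypothetical log with one JSON object per line containing `table`, `index`, and `status` fields; the function names and the `import_record` callback are likewise illustrative.

```python
import json


def failed_indexes(log_lines):
    """Collect, per table, the indexes of records whose import failed."""
    failed = {}
    for line in log_lines:
        entry = json.loads(line)
        if entry["status"] != "ok":
            failed.setdefault(entry["table"], []).append(entry["index"])
    return failed


def compensate(log_lines, entity_tables, import_record):
    """Re-import every failed record from the entity information table."""
    retried = 0
    for table, indexes in failed_indexes(log_lines).items():
        for index in indexes:
            import_record(table, entity_tables[table][index])
            retried += 1
    return retried


log = ['{"table": "node", "index": 1000, "status": "ok"}',
       '{"table": "node", "index": 1001, "status": "error"}',
       '{"table": "relation", "index": 1501, "status": "error"}']
tables = {"node": {1001: {"label": "Car"}},
          "relation": {1501: {"type": "BELONGS_TO"}}}
reimported = []
count = compensate(log, tables, lambda table, rec: reimported.append((table, rec)))
```

Because the failed records are read from the entity information table, the source data does not have to be pulled and preprocessed a second time.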
FIG. 6 illustrates a flow diagram of a graph database data recovery process 600 according to one embodiment of the invention. This process is typically needed when erroneous data returned while pulling the source data has contaminated the graph database. Since changes to the node table and the relationship table are recorded in the operation record table, data recovery can be performed according to the data in the operation record table.
The process begins at step S610, where the data as it was before contamination is obtained from the operation record table. Taking the data in Table 3 and Table 4 as an example, the data in Table 4 is what is stored in the graph database, and this data is erroneous; the original data can be obtained from the operation record table (the first data record in Table 3). The original data obtained from Table 3 should be { "services_id": 66, "services_name": "BMW 3 series", "services_place": "sum", "services_max_price": 400000, "services_min_price": 290000 }.
After this information is obtained, step S620 updates the data in the node table according to property_before, restoring services_max_price to 400000, and records the change in the operation record table.
Step S630 then restores the data in the graph database.
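A minimal sketch of recovery steps S610 to S630 is shown below, again using in-memory stand-ins. The structures `node_table`, `graph_db`, and `operation_log`, the numeric type codes, and the choice to roll back to the most recent operation record for the affected index are assumptions made for illustration.

```python
import copy

node_table = {1: {"property": {"series_name": "BMW 3 series", "max_price": 600000}}}
graph_db = copy.deepcopy(node_table)  # stands in for the real graph database
operation_log = [{"type": 1, "update_id": 1, "update_type": 1,
                  "property_before": {"series_name": "BMW 3 series",
                                      "max_price": 400000}}]


def recover(index):
    # S610: find the most recent record holding the pre-error data.
    record = next(r for r in reversed(operation_log) if r["update_id"] == index)
    original = copy.deepcopy(record["property_before"])
    # S620: log the rollback itself, then restore the node table.
    operation_log.append({"type": 1, "update_id": index, "update_type": 1,
                          "property_before": copy.deepcopy(node_table[index]["property"])})
    node_table[index]["property"] = original
    # S630: restore the graph database from the same original data.
    graph_db[index]["property"] = copy.deepcopy(original)
    return original


restored = recover(1)
```

Recording the rollback as an ordinary operation keeps the operation record table a complete history, so even a mistaken recovery can itself be undone.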
Fig. 7 shows a block diagram of a graph database construction apparatus 700 according to an embodiment of the invention. As shown in fig. 7, the graph database construction apparatus 700 includes an entity generation module 710, an entity information table construction module 720, a source data processing module 730, and a graph database construction module 740.
The entity generation module 710 determines entities and entity attributes based on the business requirements, and the graph database structure and the entity information table structure are constructed based on the determined entities and entity attributes.
The entity information table construction module 720 initializes the entity information table including a node table, a relationship table, and an operation record table according to the entity and the entity attribute.
The node table stores node information, and the included fields comprise entity types, entity attributes and entity unique identifiers; the relation table stores the relation among the nodes, and the included fields comprise a relation type, a relation start entity unique identifier, a relation end entity unique identifier and a relation attribute; the operation record table stores operation records of the node table and the relation table, and the fields comprise a table type, an update record index number, an update type and data information before update, wherein the table type field identifies whether the operation is performed on the node table or the relation table, and the update type comprises three types of insertion, update and deletion.
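The three table structures described above could be laid out as in the following sketch, which uses SQLite purely for illustration. The patent does not name a storage engine, and the table names, column names, and column types here are assumptions; attributes are stored as JSON text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    idx        INTEGER PRIMARY KEY,   -- record index number
    label      TEXT NOT NULL,         -- entity type
    entity_id  INTEGER NOT NULL,      -- entity unique identifier
    property   TEXT NOT NULL          -- entity attributes, stored as JSON text
);
CREATE TABLE relation (
    idx        INTEGER PRIMARY KEY,   -- record index number
    rel_type   TEXT NOT NULL,         -- relation type
    start_id   INTEGER NOT NULL,      -- relation start entity identifier
    end_id     INTEGER NOT NULL,      -- relation end entity identifier
    property   TEXT                   -- relation attributes
);
CREATE TABLE operation_record (
    idx             INTEGER PRIMARY KEY,
    table_type      INTEGER NOT NULL, -- node table or relation table
    update_id       INTEGER NOT NULL, -- index of the changed record
    update_type     INTEGER NOT NULL, -- insert, update, or delete
    property_before TEXT              -- data before the update
);
""")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

The column is named `idx` rather than `index` because `INDEX` is a reserved word in SQL.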
In addition, data updating, data reading, data compensation, and recovery of erroneous data are all completed in the entity information table construction module.
The source data processing module 730 obtains source data from data sources such as information files, web page pulling, message pushing, etc., integrates entity attributes of the same entity obtained from different data sources, and stores the entity attributes in an entity information table.
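Integrating the attributes of the same entity from several data sources can be sketched as a keyed merge. This is an illustration under assumptions: `merge_entity_attributes` is a hypothetical helper, the field names are shortened, and the rule that later sources overwrite earlier ones on conflict is a design choice the patent does not state.

```python
def merge_entity_attributes(records):
    """Combine attribute dicts that describe the same entity.

    `records` is an iterable of (entity_id, attributes) pairs ordered from
    lowest to highest priority; later sources overwrite earlier ones.
    """
    merged = {}
    for entity_id, attributes in records:
        merged.setdefault(entity_id, {}).update(attributes)
    return merged


from_file = (66, {"series_name": "BMW 3 series", "min_price": 290000})
from_web = (66, {"max_price": 400000})
merged = merge_entity_attributes([from_file, from_web])
```

The merged result holds a single attribute dictionary per entity identifier, ready to be stored as one row in the node table.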
The graph database construction module 740 is configured to create the structure of the graph database according to the entities and entity attributes, and to obtain entity information from the entity information table and import it into the graph database.
Entity information is first obtained from the entity information table. This information includes redundant attributes that the graph database does not need, so the redundant information is deleted and any non-standard data representations in the remaining information are normalized. Finally, the processed information is imported into the graph database, and a graph database import log is generated at the same time; the log file is used for data compensation when an import fails.
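Generating the import log alongside the import might look like the sketch below. The log layout (one JSON object per record with `index` and `status`), the function `import_with_log`, and the `fake_writer` stand-in for the graph database client are hypothetical; a real implementation would call the write API of the chosen graph database.

```python
import json


def import_with_log(records, write_to_graph):
    """Import records one by one and return JSON log lines describing each attempt."""
    log_lines = []
    for index, record in records:
        try:
            write_to_graph(record)
            status = "ok"
        except Exception as exc:  # keep going so later records are still attempted
            status = f"error: {exc}"
        log_lines.append(json.dumps({"index": index, "status": status}))
    return log_lines


def fake_writer(record):
    # Stand-in for a graph database write; rejects records without a label.
    if "label" not in record:
        raise ValueError("missing label")


log_lines = import_with_log([(1, {"label": "Car"}), (2, {})], fake_writer)
```

Logging every attempt, not only the failures, makes it easy to tell a record that failed from one that was never reached.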
According to the graph database construction scheme of the invention, an entity information table is created, and node information and the relationships between nodes are stored in the node table and the relation table, consistent with the information in the graph database. Queries for node information in the graph database can therefore be completed by querying the entity information table, and when data changes, the changed data is compared with the entity information table rather than directly with the graph database. This reduces the load on the graph database and improves query efficiency. In addition, if an error occurs while importing into the graph database, the node table and the relation table can be used directly for data compensation, and if graph database data is corrupted by an erroneous operation, it can be recovered from the operation records in the operation record table.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions of the methods and apparatus of the present invention, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, USB flash drives, floppy diskettes, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to execute the graph database construction method of the present invention in accordance with instructions in said program code stored in the memory.
By way of example, and not limitation, readable media comprise readable storage media and communication media. The readable storage medium stores information such as computer readable instructions, data structures, program modules, or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with examples of the invention. The required structure for a construction of such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into a plurality of sub-modules.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as methods or combinations of method elements that may be implemented by a processor of a computer system or by other means of performing the functions. Thus, a processor with the necessary instructions for implementing the described method or method element forms a means for implementing the method or method element. Furthermore, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is for carrying out the functions performed by the elements for carrying out the objects of the invention.
As used herein, unless otherwise specified the use of the ordinal terms "first," "second," "third," etc., to describe a general object merely denote different instances of like objects, and are not intended to imply that the objects so described must have a given order, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.
Claims (11)
1. A graph database construction method adapted to be executed in a computing device, the method comprising:
determining an entity and an entity attribute related to the knowledge domain according to the knowledge domain of the constructed graph database;
initializing an entity information table according to the entity and the entity attribute, wherein the entity information table comprises a node table, a relation table and an operation record table, the node table is suitable for storing node information, the relation table is suitable for storing the relation between nodes, the operation record table is suitable for storing operation records of the node table and the relation table, and the method comprises the following steps: creating a table structure of the operation record table, wherein the operation record table comprises fields including a table type, an update record index number, an update type and data information before update;
acquiring source data from a data source, preprocessing the source data, and storing the source data into the entity information table;
creating a structure of a graph database according to the entity and the entity attribute;
obtaining entity information from the entity information table, importing the entity information into a graph database, completing the construction of the graph database,
the method further comprises data query of a graph database, wherein the data query in the graph database is completed by querying an entity information table;
the method also includes recovery of the error data when the graph database data is updated in error, including: acquiring original data before the error data is updated from the operation record table; and recovering the data in the entity information table and the graph database data according to the original data.
2. The method of claim 1, wherein the data sources include information files, web page pulls, and message pushes, and wherein preprocessing the source data and storing the source data in the entity information table includes:
and acquiring entity attributes of the same entity from different data sources, integrating the entity attributes, and storing the entity attributes into the entity information table.
3. The method of claim 1, wherein said initializing an entity information table according to said entity and entity attributes comprises:
and creating a table structure of the node table, wherein the fields contained in the node table comprise entity types, entity attributes and entity unique identifiers.
4. The method of claim 1, wherein initializing an entity information table according to the entity and entity attributes further comprises:
and creating a table structure of the relation table, wherein the relation table comprises fields including relation type, relation start entity index number, relation end entity index number and relation attribute, and the entity index number is the index number of the entity in the node table.
5. The method of claim 1, wherein the table types include a node table and a relationship table; the update types include insert, update, and delete.
6. The method of claim 1, wherein the obtaining entity information from an entity information table, importing the entity information into a graph database comprises:
acquiring entity information from an entity information table, wherein the entity information comprises an entity and entity attribute information;
simplifying the acquired entity information, deleting unnecessary attribute information of the graph database, and obtaining simplified entity information;
normalizing the simplified entity information to obtain normalized entity information;
and storing the canonical entity information into a graph database, and generating a graph database import log.
7. The method of claim 1, wherein the method further comprises a data update of a graph database, comprising:
when the data change information is received, comparing the change data with the data in the entity information table to obtain the data information needing to be changed;
storing the data information to be changed into the operation record table;
and updating the change data into an entity information table and synchronizing the change data into a graph database.
8. The method of claim 1, wherein the method further comprises data compensation of a graph database, comprising:
according to the graph database import log, obtaining data of import errors in the import process;
and re-acquiring the data with the errors imported in the importing process from the entity information table, and re-importing the data into the graph database.
9. A graph database construction apparatus comprising:
the entity generation module is suitable for determining an entity and an entity attribute related to the knowledge domain according to the knowledge domain of the constructed graph database;
the entity information table construction module is suitable for initializing an entity information table according to the entity and the entity attribute, and comprises a node table, a relation table and an operation record table, wherein the node table is suitable for storing node information, the relation table is suitable for storing the relation between nodes, the operation record table is suitable for storing operation records of the node table and the relation table, and a table structure of the operation record table is created, and fields contained in the operation record table comprise table types, update record index numbers, update types and data information before update;
the source data processing module is suitable for acquiring source data from a data source, preprocessing the source data and storing the source data into the entity information table;
the graph database construction module is suitable for creating the structure of the graph database according to the entity and the entity attribute, acquiring entity information from the entity information table, importing the entity information into the graph database to finish the construction of the graph database,
the method also comprises the data query of the graph database, wherein the data query in the graph database is completed by querying the entity information table, and comprises the recovery of error data when the data of the graph database is updated in error, and the method comprises the following steps: acquiring original data before the error data is updated from the operation record table; and recovering the data in the entity information table and the graph database data according to the original data.
10. A terminal device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-8.
11. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a terminal device, cause the terminal device to perform any of the methods of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010669156.4A CN111930958B (en) | 2020-07-13 | 2020-07-13 | Graph database construction method, computing device and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111930958A CN111930958A (en) | 2020-11-13 |
CN111930958B true CN111930958B (en) | 2023-12-01 |
Family
ID=73312424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010669156.4A Active CN111930958B (en) | 2020-07-13 | 2020-07-13 | Graph database construction method, computing device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111930958B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||