
CN111931485B - A Multimodal Heterogeneous Associated Entity Recognition Method Based on Cross-Network Representation Learning - Google Patents

Info

Publication number
CN111931485B
Authority
CN
China
Prior art keywords
entities
entity
heterogeneous
multimodal
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010806775.3A
Other languages
Chinese (zh)
Other versions
CN111931485A (en
Inventor
周小平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture
Priority to CN202010806775.3A
Publication of CN111931485A
Application granted
Publication of CN111931485B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multimodal heterogeneous associated entity recognition method based on cross-network representation learning. The method includes: given two multimodal heterogeneous information networks $G_A = (E_A, R_A, T_A, C_A)$ and $G_B = (E_B, R_B, T_B, C_B)$, where $E_A$ and $E_B$ are entity sets, $R_A$ and $R_B$ are entity relation sets, $T_A$ and $T_B$ are entity type sets, and $C_A$ and $C_B$ are entity relation type sets, let $E_{Ai} \in E_A$ and $E_{Bj} \in E_B$ be two entities. Based on the set of random walk paths between $E_{Ai}$ and $E_{Bj}$, the multimodal relation transition probability $M_{ij}$ between $E_{Ai}$ and $E_{Bj}$ is established by an iterative method, and the multimodal heterogeneous feature vectors of $E_{Ai}$ and $E_{Bj}$ are learned from $M_{ij}$ using an objective function. When $E_{Ai}$ and $E_{Bj}$ are judged to have multimodal heterogeneous consistency, attribute consistency, and environmental consistency at the same time, $E_{Ai}$ and $E_{Bj}$ are determined to be associated entities. The invention fully analyzes the multimodal heterogeneous characteristics of multimodal heterogeneous information networks, and forms a formal description method for such networks together with a multimodal heterogeneous associated entity recognition model and method based on cross-network representation learning.

Figure 202010806775

Description

Multi-mode heterogeneous associated entity identification method based on cross-network representation learning
Technical Field
The invention relates to the technical field of identification of multimode heterogeneous information network associated entities, in particular to a multimode heterogeneous associated entity identification method based on cross-network representation learning.
Background
The multimode heterogeneous information network (Building Information Model/Modeling) is a digital representation of the physical and functional characteristics of building facilities. It aims to provide a reliable shared knowledge resource for decision-making and collaboration among the different participants throughout a building's life cycle, and it has become an important part of modernizing China's building industry and constructing smart cities.
Associated entity identification in multimode heterogeneous information networks aims to find data entities in different multimode heterogeneous information networks that refer to the same real-world object. Accurate and comprehensive identification of these associated entities organically integrates dispersed and isolated multimode heterogeneous information networks. It is the key to whole-process integrated application of multimode heterogeneous information networks and to data sharing across the whole life cycle of engineering construction projects; it addresses the "information gap" and "information silo" problems in the digitization of current construction projects; and it provides reliable and complete infrastructure big data support for engineering construction projects and for the full-cycle management of smart cities. At present, most methods for identifying associated entities in multimode heterogeneous information networks are based on manual labeling, geometric attribute matching, or text attribute modeling. A few studies consider the entity relationships of multimode heterogeneous information networks but ignore the multimode characteristics of those relationships.
Associated entity identification in multimode heterogeneous information networks is a cross-domain, interdisciplinary task. It is key to whole-process integrated application and whole-life-cycle data fusion and sharing of multimode heterogeneous information networks, and it is an important component of domain-oriented theories and methods for building big data value and knowledge discovery. The method enriches and improves theories and methods in data mining such as associated entity recognition and network representation learning, promotes the application and innovation of frontier computer science theories and methods in building science, opens a new line of basic research on multimode heterogeneous information networks, and opens a new research direction at the intersection of computing, architecture, and civil engineering; it therefore has important theoretical value. Its research results can support the major national need to modernize, transform, and upgrade the building industry, and can serve the construction and "full-period management" of smart cities, smart infrastructure, smart people, and the like, with great economic and social benefits.
The importance of identifying associated entities in multimode heterogeneous information networks has attracted wide attention from scholars at home and abroad. Cross-network representation learning, which embeds different networks as continuous low-dimensional vectors in the same space, has been a research hotspot in machine learning in recent years. Many universities and research institutions at home and abroad study associated entity identification in multimode heterogeneous information networks and cross-network representation learning, and their results appear in top journals and conferences in computer science and its interdisciplinary applications. Associated entity identification in multimode heterogeneous information networks is thus at the frontier of current interdisciplinary research spanning computing, architecture, and civil engineering.
An associated entity of multimode heterogeneous information networks is an entity that refers to the same real-world object in different multimode heterogeneous information networks. In general, a multimode heterogeneous information network $G$ can be expressed as $G = (E, R, T, C)$, where $E$ and $R$ are the heterogeneous entity set and the inter-entity multimode relationship set, and $T$ and $C$ are the type sets of $E$ and $R$, respectively. Given two multimode heterogeneous information networks $G_A = (E_A, R_A, T_A, C_A)$ and $G_B = (E_B, R_B, T_B, C_B)$, if $E_{Ai} \in E_A$ and $E_{Bj} \in E_B$ refer to the same object in the real world, then $E_{Ai}$ and $E_{Bj}$ are called associated entities, denoted $E_{Ai} = E_{Bj}$; otherwise $E_{Ai} \neq E_{Bj}$. FIG. 1 is a schematic diagram of associated entity identification in multimode heterogeneous information networks. Identification uses the data features in $G_A$ and $G_B$ to determine whether $E_{Ai} \in E_A$ and $E_{Bj} \in E_B$ are associated entities, i.e.:
Figure BDA0002629425610000027
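The four-part description of a network (entities E, relationships R, entity types T, relationship types C) can be illustrated with a small data structure. The following is only a sketch under stated assumptions: the class name `HeteroNetwork`, its methods, the entity identifiers, and the type and relationship labels are all hypothetical choices made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class HeteroNetwork:
    """Illustrative container for a network described by (E, R, T, C)."""

    # Maps each entity identifier to its entity type.
    entity_type: Dict[str, str] = field(default_factory=dict)
    # Each relationship is (source entity, target entity, relationship type).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, entity_id: str, type_name: str) -> None:
        self.entity_type[entity_id] = type_name

    def add_relation(self, source: str, target: str, relation_type: str) -> None:
        # Both endpoints must already be entities of this network.
        for endpoint in (source, target):
            if endpoint not in self.entity_type:
                raise KeyError(f"unknown entity: {endpoint}")
        self.relations.append((source, target, relation_type))

    @property
    def E(self) -> Set[str]:
        return set(self.entity_type)

    @property
    def T(self) -> Set[str]:
        return set(self.entity_type.values())

    @property
    def C(self) -> Set[str]:
        return {relation_type for _, _, relation_type in self.relations}


def build_example() -> HeteroNetwork:
    """Build a tiny network with hypothetical entities and relationship labels."""
    g = HeteroNetwork()
    g.add_entity("roof-1", "IfcRoof")
    g.add_entity("wall-1", "IfcWall")
    g.add_entity("window-1", "IfcWindow")
    g.add_relation("wall-1", "window-1", "containment")
    # Two relationships of different modes between the same pair of entities.
    g.add_relation("roof-1", "wall-1", "connection")
    g.add_relation("wall-1", "roof-1", "reference")
    return g
```

In this sketch the list `relations` plays the role of R, and the pair `roof-1`/`wall-1` shows how one entity pair can carry several relationship modes at once.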
IFC (Industry Foundation Classes) is the currently recognized international standard for multimode heterogeneous information networks and is widely used by enterprises across the construction industry. At present, almost all multimode heterogeneous information network software supports the IFC format, and most research on multimode heterogeneous information networks, such as research on building construction, is based on the IFC standard. Under the IFC standard, multimode heterogeneous information networks exhibit multimode heterogeneous characteristics and massive entity characteristics.
① Multimode heterogeneous characteristics
The heterogeneous characteristic means that multimode heterogeneous information network entities are of many types, and that different types of entities have different attributes. IFC currently defines 653 different entities, and the number keeps growing with practical demand and with each iteration of the IFC version. The attributes of a multimode heterogeneous information network entity can be divided into semi-structured text attributes, which describe the basic information of the entity, and unstructured geometric attributes, which describe its three-dimensional shape. In the IFC standard, only entities that inherit from the IFCProduct class can have geometric attributes. The roof objects in FIG. 1 all inherit from the IFCProduct class and have both geometric and text attributes. The text attributes of entities in different multimode heterogeneous information networks suffer from non-uniform fields, missing values, redundancy, inaccuracy, inconsistency, and similar problems. As a result, text-attribute-based methods for identifying associated entities achieve poor identification quality (low recall and low accuracy) and cannot meet the needs of whole-process integrated application of multimode heterogeneous information networks.
The multimode characteristic means that multiple relationships of potentially different modes can exist between any two multimode heterogeneous information network entities. IFC currently defines 19 types of relationships in 5 major classes, including reference, containment, decomposition, connection, and inheritance. The differing forms in which multimode relationships are described pose challenges for the formal description and mathematical expression of multimode heterogeneous information networks. The multimode characteristic also means that entities depend on one another in different forms and show strong dependence. Introducing entity relationships is an effective way to address the poor identification quality of text-attribute-based methods; however, existing methods ignore the multimode characteristics of the relationships in multimode heterogeneous information networks.
The multimode heterogeneous characteristics are an important manifestation of the complexity of multimode heterogeneous information networks. Current research mostly starts from the attributes of multimode heterogeneous information network entities and seldom explores the multimode characteristics of the networks. Exploring these characteristics in depth and establishing a formal description method for multimode heterogeneous information networks from the perspective of complex networks would promote the innovative application of graph theory, network science, graph learning, big data, and related theories and methods to multimode heterogeneous information networks, open new ideas for basic and applied research on them, and lay a model foundation for associated entity identification, parallel computing, and the like.
② Massive entity characteristics
The IFC is a multi-mode heterogeneous information network description file with highly compressed information, and a million IFC file contains millions or even tens of millions of multi-mode heterogeneous information network entities. Generally, a multi-mode heterogeneous information network of an actual engineering project is composed of a plurality of IFC files of different specialties. According to statistics, the multimode heterogeneous information network of a three-layer building in the design stage can reach 50G. Thus, the multi-mode heterogeneous information network contains a vast number of multi-mode heterogeneous information network entities.
In the prior art, most research methods for multimode heterogeneous information networks target only networks of small volume. Some scholars have noted the massive entities and big data characteristics of multimode heterogeneous information networks and have studied distributed big data storage and management frameworks for lightweight visualization and domain-oriented applications. Parallel computing, which distributes computing tasks across multiple processing units, is an effective way to improve the processing capacity and efficiency of multimode heterogeneous information networks. A few studies have made initial explorations of parallel computing methods for multimode heterogeneous information networks; however, these methods ignore the imbalance of entity attributes, are difficult to apply to arbitrary multimode heterogeneous information networks, and cannot meet the need for parallel processing across the full life cycle. The strong dependence among entities also makes it difficult to apply existing parallel computing frameworks directly. Because the topic is interdisciplinary, there is little research on parallel computing methods for multimode heterogeneous information networks, which limits the ability of associated entity identification methods to process large-volume networks quickly.
(1) Research status of associated entity identification in multimode heterogeneous information networks
Identifying associated entities in multimode heterogeneous information networks by UUID (Universally Unique Identifier) is the simplest and most accurate method; however, different multimode heterogeneous information network tools maintain different UUIDs, and even different versions of the same tool produce different UUIDs. At present, most methods for identifying associated entities in multimode heterogeneous information networks are based on manual labeling, geometric attribute matching, or text attribute modeling.
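The limitation of identifier-based matching can be seen in a few lines. This is a minimal sketch, assuming each model is available as a mapping from identifier to entity name; the function `match_by_uuid` and all identifiers and names below are hypothetical.

```python
from typing import Dict, List


def match_by_uuid(model_a: Dict[str, str], model_b: Dict[str, str]) -> List[str]:
    """Return the identifiers that occur in both models, sorted for stable output."""
    return sorted(set(model_a) & set(model_b))


# Hypothetical exports: identifier -> entity name.
export_v1 = {"id-001": "Roof", "id-002": "Window W1", "id-003": "Window W2"}
same_tool_copy = {"id-001": "Roof", "id-002": "Window W1"}
# The same building exported by another tool that assigns its own identifiers.
other_tool_export = {"x-9a": "Roof", "x-9b": "Window W1", "x-9c": "Window W2"}

print(match_by_uuid(export_v1, same_tool_copy))     # ['id-001', 'id-002']
print(match_by_uuid(export_v1, other_tool_export))  # []
```

When identifiers are preserved the match is exact, but once a tool regenerates them no entity can be matched, even though the entities describe the same objects.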
Identification based on manual labeling depends on the quality of the change relation model and on the accuracy of the manual change labels; the manual workload is heavy and error-prone. Methods based on geometric attribute matching can detect three-dimensional similarities and differences between two models; however, they identify only differences in geometric shape, are difficult to apply to associated entities with complex relationships such as reference and inheritance, and cannot identify entities that have no geometric shape. To address the problems of manual labeling, some studies propose associated entity identification models based on the text attributes of multimode heterogeneous information network entities; however, entities of the same type typically have similar text attributes. For example, in FIG. 1, the text attributes of the several window entities of the same type are mostly identical or similar. This similarity among same-type entities limits the application range of such methods. A few studies convert the reference relationships between entities into an RDF (Resource Description Framework) graph and a reference hierarchy to improve the quality of text-attribute-based identification. These methods also ignore the complex relationships and geometric attribute characteristics of multimode heterogeneous information networks.
Making comprehensive use of entity attributes and multimode heterogeneous characteristics is an effective way to improve the quality of associated entity identification in multimode heterogeneous information networks; however, the prior art contains little research in this direction. On the one hand, the field lacks formal description methods oriented to multimode heterogeneous characteristics, so existing identification is limited to attribute features such as text. On the other hand, existing data mining theories and methods struggle to extract the multimode heterogeneous characteristics of massive entities from different networks into the same feature space.
(2) Research status of cross-network representation learning for associated entity identification
Network representation learning, also known as network embedding or graph embedding, has been one of the research hotspots and frontiers of machine learning in recent years. Because network representation learning supports representation and inference in vector space, more and more scholars have extended it from a single network to multiple networks, exploring cross-network representation learning models and their application to associated user identification in social networks, knowledge graph alignment, and similar tasks. Most studies of associated user identification in social networks build a homogeneous single-mode network with users as nodes and user relationships as edges, and then build cross-network representation learning models and methods using graph neural networks, deep active learning, and the like. Some scholars have noted the heterogeneous entities in social networks and build heterogeneous networks with heterogeneous entities as nodes and their relationships as edges. Wang et al. extract user interests from user content, build a heterogeneous network with users and interests as nodes, and then propose a cross-network representation learning model for user features. Zhou et al. build a heterogeneous network with entities such as users, locations, posts, and pictures in a social network as nodes and the relationships between them as edges, build a cross-network representation learning model by designing meta-paths, and complete the identification of associated users. Ye et al. use a graph convolutional network to build a cross-network representation learning model for edge and node features given prior associated entities.
Scalability indicates whether cross-network representation learning can handle large amounts of data. Existing cross-network representation learning methods that have been experimentally verified on data sets at the million scale and above draw on the distributed learning capability of the word vector (Word2Vec) model. Heterogeneous network representation learning based on the word vector model often requires carefully designed meta-paths [30], but meta-path design relies on domain knowledge, and its complexity increases sharply as the number of network entity types and modal relationships grows. As a result, there is little research on cross-network distributed representation learning oriented to multimode heterogeneous characteristics. A domain-independent distributed representation learning model across multimode heterogeneous networks would remove the dependence of existing heterogeneous network distributed representation learning on meta-path design, and would also apply to single-mode and/or homogeneous networks in any field, making it general.
(3) Research status of multimode heterogeneous associated entity identification
Data mining oriented to multimode heterogeneous characteristics has become a research frontier; however, most research focuses on data mining tasks within a single data set, such as network embedding and personalized recommendation. Some studies have made preliminary explorations of associated entity identification in multimode or heterogeneous networks without prior knowledge. In the field of social networks, Zhang et al. propose an unsupervised heterogeneous network associated entity identification method for two types of heterogeneous entities, users and locations. In the traffic field, Nassar et al. propose an ISORank-based associated entity identification method for multimode homogeneous networks. In bioinformatics, Gu et al. extend homogeneous network associated entity identification methods to heterogeneous networks using graph coloring. In electronic commerce, Zhu et al. use graph summarization and other methods to identify heterogeneous entities such as manufacturers and commodities. In the knowledge base field, Shen et al. regard the multimode heterogeneous information network as a domain knowledge base and explore entity linking between unstructured domain text and that knowledge base. Multimode heterogeneous associated entities have attracted researchers in many fields; however, most existing research is still designed for networks that are either multimode or heterogeneous. There is little research on identifying multimode heterogeneous associated entities without priors, and many studies ignore the massive entity characteristics of multimode heterogeneous networks.
Multimode heterogeneous associated entity identification is also similar or related to research on language translation in natural language processing, entity alignment in knowledge bases, database record linkage, entity matching, named entity recognition in information retrieval, associated user identification in social networks, bipartite graph matching, and homogeneous network alignment in bioinformatics. However, these methods have certain limitations when applied to associated entity identification in multimode heterogeneous information networks, specifically:
① Modeling of multimode heterogeneous characteristics is absent. Most existing methods design associated entity identification models and methods for single-mode or homogeneous scenarios in specific fields. Multimode and/or heterogeneous characteristics are not integrated into these methods, so their identification quality cannot meet the requirements of associated entity identification in multimode heterogeneous information networks for whole-process integrated application.
② Computing capacity for massive entities is insufficient. Parallel and distributed algorithms in a big data environment remain a common problem for associated entity identification in every field. Many associated entity identification methods cannot process massive data, so they cannot be applied directly to multimode heterogeneous information networks with massive entities.
③ Dependence on prior associated entities is strong. Most methods rely on prior associated entities to build supervised or semi-supervised associated entity identification models and methods, and their identification quality depends on the quality and quantity of those prior associated entities. Moreover, prior associated entities are difficult to label and require heavy manual work. This also limits the applicability of such methods to associated entity identification in multimode heterogeneous information networks.
In summary, prior-art research on associated entity identification focuses mainly on single-mode homogeneous settings; many methods require prior associated entities, and only a few studies have noted the multimode or heterogeneous characteristics of data and begun preliminary exploration. Associated entity identification oriented to multimode heterogeneous characteristics is an important trend in current research. In theory, its results generalize to existing single-mode and/or homogeneous settings, so the approach is more general; in application, its results can be used for multimode heterogeneous information networks and also for data in other fields such as social networks, traffic networks, biological information, electronic commerce systems, and knowledge graphs.
At present, no multimode heterogeneous information network associated entity identification method oriented to multimode heterogeneous characteristics exists in the prior art.
Disclosure of Invention
The embodiment of the invention provides a multimode heterogeneous information network associated entity identification method oriented to multimode heterogeneous characteristics, which aims to overcome the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
A multi-mode heterogeneous associated entity identification method based on cross-network representation learning comprises the following steps:
given two multimode heterogeneous information networks $G_A = (E_A, R_A, T_A, C_A)$ and $G_B = (E_B, R_B, T_B, C_B)$, where $E_A$ and $E_B$ are entity sets, $R_A$ and $R_B$ are entity relationship sets, $T_A$ and $T_B$ are entity type sets, and $C_A$ and $C_B$ are entity relationship type sets, letting $E_{Ai} \in E_A$ and $E_{Bj} \in E_B$ be two entities;
based on the set of random walk paths between entities $E_{Ai}$ and $E_{Bj}$, establishing the multimode relationship transition probability $M_{ij}$ between $E_{Ai}$ and $E_{Bj}$ by an iterative method, and learning the multimode heterogeneous feature vectors of $E_{Ai}$ and $E_{Bj}$ from the multimode relationship transition probability $M_{ij}$ using an objective function;
judging, according to the multimode heterogeneous feature vectors of $E_{Ai}$ and $E_{Bj}$, whether the two entities have multimode heterogeneous consistency, and also judging whether they have attribute consistency and environment consistency; when $E_{Ai}$ and $E_{Bj}$ have multimode heterogeneous consistency, attribute consistency, and environment consistency at the same time, determining that $E_{Ai}$ and $E_{Bj}$ are associated entities.
Preferably, establishing the multi-mode relation transition probability $M_{ij}$ between entities $E_{Ai}$ and $E_{Bj}$ by an iterative method based on the set of random walk paths between them comprises:
assuming there are $|C|$ relations of different modes in the multimode heterogeneous information network, the multi-mode relation transfer matrix is represented by a $|C| \times |C|$ matrix $M$, where $M_{ij}$ is the transition probability from relation type $C_i$ to relation type $C_j$ in the multi-mode heterogeneous network;
in a random walk, if the walk moved from the previous node $E_i$ to the current node $E_j$ through relation $C_x$, the probability $p(E_k \mid E_i, E_j, C_x, M)$ of moving to the next node $E_k$ is calculated as:
Figure BDA0002629425610000071
wherein $W_{ij}$ is the weight between entities $E_i$ and $E_j$, $C_{ij}$ is the type of relation $(E_i, E_j)$, $N_i$ is the set of neighbor nodes of entity $E_i$, and $W_{ij} = |N_i \cap N_j| / |N_i \cup N_j|$; if $d_{ij}$ is the distance between entities $E_i$ and $E_j$, then:
Figure BDA0002629425610000072
acquiring, by random walks according to formula (2), a set of random walk paths $P = \{P_1, P_2, P_3, \ldots\}$ and the corresponding multi-mode transition paths $T = \{T_1, T_2, T_3, \ldots\}$, wherein
Figure 1
Figure BDA0002629425610000074
using a vector $e_i$ of dimension $|P|$ to represent the features of relation type $C_i$ in the random walk set $P$, where $e_{ij}$ is the number of occurrences of $C_i$ in $P_j$;
calculating the similarity of relation types $C_i$ and $C_j$ according to the Pearson correlation coefficient, i.e.
Figure BDA0002629425610000075
Updating multimode relation transition probability by adopting Sigmoid function
Figure BDA0002629425610000076
Initially, the matrix $M$ is set to an all-ones matrix or a random matrix; a random walk path set $P$ is acquired from $M$ using formula (2), and $M$ is updated according to formula (5); this process is iterated until $M$ converges, which completes the construction of the multi-mode relation transfer matrix $M$.
Preferably, learning with an objective function through the multi-mode relation transfer matrix $M_{ij}$ to obtain the multi-mode heterogeneous feature vectors of entities $E_{Ai}$ and $E_{Bj}$ comprises:
taking entities $E_{Ai}$ and $E_{Bj}$ each as a node, establishing a cross-network distributed representation learning model and algorithm based on the Skip-Gram model in Word2Vec, and setting the target optimization function of the Skip-Gram model in cross-network distributed representation learning oriented to multi-mode heterogeneous characteristics as:
Figure BDA0002629425610000077
where $\theta$ is the parameter to be solved and $N_t(v)$ is the set of context nodes of type $t$ among the neighbors of node $v$; if $V_t$ is the set of nodes of type $t$ in the two networks, then:
Figure BDA0002629425610000081
wherein $X_v$ is the multi-mode heterogeneous feature vector of node $v$;
obtaining the multi-mode heterogeneous feature vectors $X_{Ai}$ and $X_{Bj}$ of entities $E_{Ai}$ and $E_{Bj}$ by solving equation (10).
Preferably, judging whether the two entities $E_{Ai}$ and $E_{Bj}$ have multi-mode heterogeneous consistency according to their multi-mode heterogeneous feature vectors comprises:
judging, according to the multi-mode heterogeneous feature vectors of entities $E_{Ai}$ and $E_{Bj}$, whether the type $T_{Ai}$ of entity $E_{Ai}$ and the type $T_{Bj}$ of entity $E_{Bj}$ are identical; if so, the type relation identification degree $H_{ij}$ between the two entities equals 1; otherwise, it equals 0:
Figure BDA0002629425610000082
when the two entities $E_{Ai}$ and $E_{Bj}$ are of the same type, the multi-mode heterogeneous similarity $R_{ij}$ between them is calculated as:
Figure BDA0002629425610000083
where $X_{Ai}$ and $X_{Bj}$ are the solved multi-mode heterogeneous feature vectors of the two entities $E_{Ai}$ and $E_{Bj}$, and the values $R_{ij}$ form the multi-mode heterogeneous feature similarity matrix $R$ between entity sets $E_A$ and $E_B$.
Preferably, judging whether the two entities $E_{Ai}$ and $E_{Bj}$ have attribute consistency comprises:
the attributes of entities $E_{Ai}$ and $E_{Bj}$ comprise text attributes and geometric attributes, the text attributes being short texts; a semantic feature vector model of the entity attributes is analyzed and established by a short-text word vector method, and the text attribute feature similarity between $E_{Ai}$ and $E_{Bj}$ is calculated by cosine similarity or Euclidean distance;
fusing the text attribute feature similarity and the geometric attribute feature similarity between $E_{Ai}$ and $E_{Bj}$ to form the attribute consistency feature similarity $P_{ij}$ between them, all $P_{ij}$ forming the attribute consistency feature similarity matrix $P$ between entity sets $E_A$ and $E_B$.
Preferably, judging whether the two entities $E_{Ai}$ and $E_{Bj}$ have environment consistency comprises:
if $Z$ is the set of associated entities in the two networks
Figure BDA0002629425610000091
and
Figure BDA0002629425610000092
the environment consistency feature similarity $Y_{ij}$ between entities $E_{Ai}$ and $E_{Bj}$ is calculated as:
Figure BDA0002629425610000093
wherein $I_{Ai} = N_{Ai} \cap Z$ and $I_{Bj} = N_{Bj} \cap Z$; in the initial stage,
Figure BDA0002629425610000094
as the iteration proceeds, $Z$ contains more and more associated entities; all $Y_{ij}$ form the environment consistency feature similarity matrix $Y$ between entity sets $E_A$ and $E_B$.
Preferably, determining that $E_{Ai}$ and $E_{Bj}$ are associated entities when the two entities have multi-mode heterogeneous consistency, attribute consistency and environment consistency comprises:
combining the type relation identification degree $H_{ij}$, the multi-mode heterogeneous similarity $R_{ij}$, the environment consistency feature similarity $Y_{ij}$ and the attribute consistency feature similarity $P_{ij}$ between entities $E_{Ai}$ and $E_{Bj}$ to obtain the similarity value $S_{ij}$ between them:
$S_{ij} = \mathrm{sim}(E_{Ai}, E_{Bj}) = H_{ij} \cdot R_{ij} \cdot Y_{ij} \cdot P_{ij}$
the similarity values between all entities in $E_A$ and $E_B$ form the similarity matrix $S$ between $E_A$ and $E_B$; the not-yet-associated entity pair $E_{Ai}$ and $E_{Bj}$ with the largest similarity value in $S$ is selected as a pair of associated entities, subject to $S_{ij} > \tau$, where $\tau$ is a set similarity threshold;
when new associated entities $\Delta Z$ are identified, the associated entity set is updated as $Z = Z \cup \Delta Z$, $Y$ and $S$ are updated, and new associated entities are identified again; the iteration ends when no associated entity satisfying the requirement can be identified, and the identified associated entity set $Z$ is output.
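The iterative selection described above can be sketched as follows. This is a minimal illustration and not the patented algorithm itself: the function name `identify_associated`, the callback `env_fn` (which stands in for recomputing the matrix Y from the current set Z) and the assumption that each entity is associated at most once are all hypothetical choices made for the example.

```python
def identify_associated(H, R, P, env_fn, tau):
    """Greedy iterative association; env_fn(Z) returns the current Y matrix.

    H, R, P are |E_A| x |E_B| nested lists. Assumption: every entity takes
    part in at most one associated pair.
    """
    n_a, n_b = len(H), len(H[0])
    Z = []                        # identified pairs (i, j), in order of discovery
    used_a, used_b = set(), set()
    while True:
        Y = env_fn(Z)             # Y depends on the associations found so far
        best, best_pair = tau, None
        for i in range(n_a):
            if i in used_a:
                continue
            for j in range(n_b):
                if j in used_b:
                    continue
                s = H[i][j] * R[i][j] * Y[i][j] * P[i][j]   # S_ij
                if s > best:      # must exceed the threshold tau
                    best, best_pair = s, (i, j)
        if best_pair is None:     # nothing above tau is left: stop
            return Z
        Z.append(best_pair)
        used_a.add(best_pair[0])
        used_b.add(best_pair[1])
```

Because the four factors are multiplied, a zero in any of them excludes the pair; how Y is initialised while Z is still empty is given by a formula that is not reproduced in this text, so the sketch leaves that to `env_fn`.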
As can be seen from the technical solutions provided above, the embodiments of the present invention start from the important requirements of full-process integrated application and full-life-cycle data sharing of multimode heterogeneous information networks, and take the identification of associated entities of multimode heterogeneous information networks with massive entities as the research target. On the basis of a full analysis of the multimode heterogeneous characteristics of such networks, the embodiments mainly study a formal description method for complex multimode heterogeneous information networks, a domain-independent distributed representation learning model and method across multimode heterogeneous networks, a parallel computing method for multimode heterogeneous information networks, and an associated entity identification model and algorithm that combine attribute characteristics with multimode heterogeneous characteristics, and verify them experimentally on massive data.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram illustrating identification of an associated entity of a multimode heterogeneous information network in the prior art;
Fig. 2 is a structure diagram of the overall implementation framework of a multimode heterogeneous information network associated entity identification method oriented to multimode heterogeneous characteristics according to an embodiment of the present invention;
Fig. 3 is a framework diagram of a cross-network-node multi-mode relationship feature representation learning method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a random walk that considers multi-mode relationships according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a geometric property similarity calculation process according to an embodiment of the present invention;
Fig. 6 is a flowchart of iterative associated entity identification according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
The embodiment of the invention addresses the construction industry's urgent need for full-process integrated application of multimode heterogeneous information networks and for full-life-cycle data sharing in construction projects. Drawing on the frontier theory of network representation learning (Network Representation Learning), and coupling key technologies from computer science with those from architecture and civil engineering, it establishes a model and method for identifying associated entities of multimode heterogeneous information networks based on cross-network representation learning.
The invention jointly considers the text and geometric attribute characteristics, the multimode heterogeneous characteristics and the massive number of entities of a multimode heterogeneous information network, and applies the theory and methods of network representation learning to study an associated entity identification model and method based on cross-network representation learning. The overall implementation framework of the multimode heterogeneous information network associated entity identification method oriented to multimode heterogeneous characteristics is shown in Fig. 2.
The method first studies a formal description method for complex multimode heterogeneous information networks, converting the multimode heterogeneous information network into a multimode heterogeneous network from the perspective of complex networks; this provides the model basis for associated entity identification, parallel computing and related methods. To handle the multimode heterogeneous characteristics and the massive number of entities, a multi-mode relation transfer model, a cross-network random walk model and a word-vector-based cross-network distributed representation learning model are established, so that the multimode heterogeneous characteristics of nodes from different networks are embedded into low-dimensional continuous vectors in the same space, laying the foundation for multimode heterogeneous consistency calculation. A multi-mode heterogeneous consistency model, an environment consistency model and an attribute consistency model are then established; by jointly considering the attribute characteristics and the multimode heterogeneous characteristics of the multimode heterogeneous information network, they improve the identification quality of associated entities and ensure the applicability of the identification model. Finally, extensive experimental verification is carried out with actual engineering data, to ensure that the research results can serve the full-process integrated application and full-life-cycle data sharing of multimode heterogeneous information networks.
(1) Multi-mode heterogeneous feature analysis and formalization description method of multi-mode heterogeneous information network based on IFC
The invention intends to combine the IFC data standard and analyze the multi-mode heterogeneous characteristics of the multimode heterogeneous information network from two aspects: entity attribute characteristics and relationship characteristics. For attribute characteristics, the invention intends to use literature research and inductive summarization to summarize the common attribute characteristics and the distinguishing features of each entity, laying a foundation for the subsequent extraction of attribute feature vectors. For relationship characteristics, the invention intends to build entity relationship graphs under different modes after summarizing the types and characteristics of existing relationship modes; then, using data analysis on a large number of multimode heterogeneous information networks from actual engineering projects, it analyzes the structural characteristics and similarities of each mode's relationship graph, including density, degree distribution and radius, providing the support needed for theoretical analysis and improvement of the subsequent algorithms.
Then, based on these results, and drawing on complex network theory and on the formal description of social networks, a formal description method for the multimode heterogeneous information network is studied. In general, a multi-mode heterogeneous information network
Figure BDA0002629425610000111
may be composed of entities and entity relationships:
Figure 2
where, in the network
Figure BDA0002629425610000113
$E$ is the entity set, $R$ is the entity relationship set, $T$ is the entity type set, and $C$ is the entity relationship type set. Any entity $E_i$ in the multi-mode heterogeneous information network includes the attribute features of that entity; the specific attribute features follow the IFC data standard. Any two entities $E_i$ and $E_j$ may be linked by relations of several different modes; the invention uses $R_{ij}$ to denote the set of all relations between $E_i$ and $E_j$. Any entity relation $R_{ijk} \in R_{ij}$ can be described as $R_{ijk} = \{E_i, E_j, C_k\}$, meaning that $E_i$ depends on $E_j$ through relation $C_k \in C$. Thus, one can use
Figure BDA0002629425610000121
to describe all entities that $E_i$ depends on through relation $C_k$. The type of entity $E_i$ can be denoted $T_i$ or $T(E_i)$.
After formal description, the invention converts the multimode heterogeneous information network model into a multimode heterogeneous network. The entities are then also called nodes, and the relationships are also called edges. The invention uses $|\cdot|$ to denote the number of elements in a set. When $|R_{ij}| \le 1$, the multi-mode heterogeneous information network degenerates to a heterogeneous information network; when $|T| = 1$, it degenerates to a homogeneous network. The research content of the invention is therefore more general than homogeneous and/or single-mode information networks. On this basis, the formal description method of the multi-mode heterogeneous information network is further refined, so as to provide a basic mathematical model for the subsequent associated entity recognition model, the parallel computing algorithm and so on, and to lay a model basis for other research on multimode heterogeneous information networks.
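The formal description above maps naturally onto a small data structure. The sketch below is only an illustration under stated assumptions: the class name `MultiModeNetwork`, its method names and the relation labels `"fills"` and `"connects"` are hypothetical, and the single-mode condition is read as "at most one relation per entity pair".

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MultiModeNetwork:
    """Entities with types plus typed relations (E_i, E_j, C_k)."""
    entity_type: dict = field(default_factory=dict)   # E_i -> T_i
    relations: set = field(default_factory=set)       # {(E_i, E_j, C_k)}

    def add_entity(self, name, etype):
        self.entity_type[name] = etype

    def add_relation(self, src, dst, ctype):
        # (E_i, E_j, C_k): E_i depends on E_j through relation type C_k
        self.relations.add((src, dst, ctype))

    def relation_set(self, src, dst):
        """R_ij: all relation types linking E_i to E_j."""
        return {c for (s, d, c) in self.relations if s == src and d == dst}

    def dependents(self, src, ctype):
        """All entities that E_i depends on through relation type C_k."""
        return {d for (s, d, c) in self.relations if s == src and c == ctype}

    def is_single_mode(self):
        """True when every entity pair has at most one relation."""
        counts = defaultdict(int)
        for s, d, _ in self.relations:
            counts[(s, d)] += 1
        return all(n <= 1 for n in counts.values())

    def has_single_type(self):
        """True when |T| = 1, i.e. all entities share one type."""
        return len(set(self.entity_type.values())) == 1


net = MultiModeNetwork()
net.add_entity("wall_1", "IfcWall")
net.add_entity("door_1", "IfcDoor")
net.add_relation("door_1", "wall_1", "fills")       # hypothetical relation label
net.add_relation("door_1", "wall_1", "connects")    # second mode, same pair
```

Here `door_1` and `wall_1` are linked by two relations of different modes, so the example network is multi-mode and heterogeneous at the same time.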
(2) Domain-independent cross-multimode heterogeneous network distributed representation learning method
Fig. 3 is a framework diagram of a cross-network-node multi-mode relationship feature representation learning method according to an embodiment of the present invention. Cross-network representation learning aims to embed the network features of nodes from different networks into the same low-dimensional continuous space, and is one of the effective methods for calculating the similarity of node network structures across networks. Introducing meta paths (Meta Path) to extend homogeneous network distributed representation learning methods (such as DeepWalk, LINE and node2vec) to heterogeneous networks, as in metapath2vec, is the mainstream approach to heterogeneous network distributed representation learning. However, meta-path-based methods require sufficient domain knowledge to design reasonable meta paths, so they lack generality; they also consider only heterogeneous nodes and do not fully consider multi-mode relationships. Furthermore, as the number of node types and mode types in the network increases, designing meta paths becomes extremely complex.
Some studies explore cross-network distributed representation learning given certain known associated nodes; however, such methods usually require a certain number of associated nodes and are not applicable to multi-mode heterogeneous networks. Considering the massive number of entities in multi-mode heterogeneous information networks, the invention intends to study a domain-independent distributed representation learning method across multi-mode heterogeneous networks based on the word vector model, laying a foundation for the associated entity identification model under multi-mode heterogeneous networks. As shown in Fig. 4, the cross-network representation learning model contains three parts: a multi-mode relation transfer model, a cross-network random walk model, and a word-vector-based cross-network distributed representation learning model.
① Multi-mode relation transfer model
The multi-mode relation transfer model aims to establish the multi-mode relation transition probabilities in a multi-mode heterogeneous network, thereby avoiding the dependence on domain knowledge and the lack of generality of conventional meta-path-based methods. Given $|C|$ relations of different modes in a multi-mode heterogeneous information network, the multi-mode relation transfer matrix can be represented by a $|C| \times |C|$ matrix $M$, where $M_{ij}$ is the transition probability from relation type $C_i$ to relation type $C_j$ in the multi-mode heterogeneous network.
Fig. 4 is a schematic diagram of a random walk considering multi-mode relationships according to an embodiment of the present invention. In a random walk, if the walk moved from the previous node $E_i$ to the current node $E_j$ through relation $C_x$ (as shown in Fig. 4), the probability of moving to the next node $E_k$ is:
Figure BDA0002629425610000131
wherein $W_{ij}$ is the weight between entities $E_i$ and $E_j$, $C_{ij}$ is the type of relation $(E_i, E_j)$, and $N_i$ is the set of neighbor nodes of entity $E_i$. $W_{ij}$ can be set according to actual conditions; the invention calculates it as $W_{ij} = |N_i \cap N_j| / |N_i \cup N_j|$. If $d_{ij}$ is the distance between entities $E_i$ and $E_j$, then:
Figure BDA0002629425610000132
Formula (2) considers not only the weight relationship between nodes but also the transition probabilities between multi-mode relationships, which helps embed the multi-mode relationship features into low-dimensional continuous vectors.
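As a rough illustration of how node weights and relation transition probabilities can be combined in one walk step, the sketch below computes the neighbor-overlap weight and a next-node distribution. Formula (2) itself is not reproduced in this text, so the scoring rule used here (weight times transfer-matrix entry, normalised over the neighbors) is an assumed simplification, and the names `jaccard_weight`, `next_node_distribution`, `neighbors` and `rel_type` are hypothetical.

```python
def jaccard_weight(neighbors, i, j):
    """W_ij = |N_i ∩ N_j| / |N_i ∪ N_j| for neighbor sets N_i and N_j."""
    union = neighbors[i] | neighbors[j]
    if not union:
        return 0.0
    return len(neighbors[i] & neighbors[j]) / len(union)


def next_node_distribution(neighbors, rel_type, M, current, prev_rel):
    """Assumed form: score(k) = W_jk * M[prev_rel][C_jk], normalised over N_j.

    rel_type[(j, k)] is the index of the relation type of edge (E_j, E_k);
    M is the |C| x |C| relation transfer matrix as nested lists.
    """
    scores = {}
    for k in neighbors[current]:
        w = jaccard_weight(neighbors, current, k)
        scores[k] = w * M[prev_rel][rel_type[(current, k)]]
    total = sum(scores.values())
    if total == 0:
        # all scores vanish: fall back to a uniform choice over the neighbors
        return {k: 1.0 / len(scores) for k in scores}
    return {k: s / total for k, s in scores.items()}
```

With this rule a neighbor is favoured when it shares many neighbors with the current node and when its relation type is a likely successor of the relation just traversed.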
Given the matrix $M$, random walks according to formula (2) yield a set of random walk paths $P = \{P_1, P_2, P_3, \ldots\}$ and the corresponding multi-mode transition paths $T = \{T_1, T_2, T_3, \ldots\}$, where
Figure BDA0002629425610000133
Figure BDA0002629425610000134
A vector $e_i$ of dimension $|P|$ can then be used to represent the features of relation type $C_i$ in the random walk set $P$, where $e_{ij}$ is the number of occurrences of $C_i$ in $P_j$.
On this basis, the invention intends to calculate the similarity of relation types $C_i$ and $C_j$ according to the Pearson correlation coefficient, i.e.
Figure BDA0002629425610000135
Then the multi-mode relation transfer matrix is updated with the Sigmoid function:
Figure BDA0002629425610000136
Initially, the matrix $M$ may be set to an all-ones matrix or a random matrix. Given $M$, a random walk path set $P$ is acquired using formula (2), and $M$ is then updated according to formula (5). This process is iterated until $M$ converges, which completes the construction of the multi-mode relation transfer matrix.
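One update of the transfer matrix from a set of sampled paths can be sketched as below. The exact form of formulas (4) and (5) is not reproduced in this text; the sketch assumes that each entry is the Sigmoid of the Pearson correlation between the occurrence-count vectors of two relation types. The function names and the representation of a transition path as a list of relation-type indices are hypothetical, and the walk sampling that produces the paths is omitted.

```python
import math


def relation_counts(paths, num_types):
    """e[i][p] = number of times relation type i occurs in transition path p."""
    return [[path.count(i) for path in paths] for i in range(num_types)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0   # undefined for a constant vector; treated as 0 here
    return cov / (sx * sy)


def update_transfer_matrix(paths, num_types):
    """Assumed update: M_ij = sigmoid(pearson(e_i, e_j))."""
    e = relation_counts(paths, num_types)
    return [[1.0 / (1.0 + math.exp(-pearson(e[i], e[j])))
             for j in range(num_types)]
            for i in range(num_types)]
```

In a full implementation this update would alternate with re-sampling the walks from the new matrix until the entries stop changing by more than a chosen tolerance.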
On this basis, the invention theoretically demonstrates the convergence of the iteration of $M$ and forms a corresponding algorithm.
② Cross-network random walk model
The multi-mode relation transfer model solves the problem of random walks within a single network model while accounting for multi-mode relationships. The cross-network random walk model strings the nodes and relations of different networks together on one path, which is the key to mapping the node relation characteristics of different networks into the same low-dimensional continuous space.
Given two multimode heterogeneous information networks
Figure BDA0002629425610000141
and
Figure BDA0002629425610000142
and two entities $E_{Ai} \in E_A$ and $E_{Bj} \in E_B$ in them, the invention defines their structural similarity as follows:
Figure BDA0002629425610000143
If $|N_{Ai}|$ denotes the number of neighbors of $E_{Ai}$, and $|E_A|$ and $|R_A|$ denote the number of entities and the number of relationships, respectively, in the model
Figure BDA0002629425610000144
then
Figure BDA0002629425610000145
In the initial state, the cross-network random walk model randomly selects a node $E_{Ai}$ in one multimode heterogeneous network as the initial node of the random walk. A cross-network random walk path is then formed by the following rules:
a. draw a random probability; if it is smaller than a specified threshold $\epsilon$, keep walking in the current multimode heterogeneous network; otherwise, walk to the other multimode heterogeneous network model;
b. when the walk stays in the current multi-mode heterogeneous network, select the next node with the probability given by formula (2);
c. when the walk switches to the other multimode heterogeneous network, if the current node has a known associated node, the next node of the random walk is that known associated node; otherwise, the probability of walking from $E_{Ai}$ to the next node $E_{Bj}$ is:
Figure BDA0002629425610000146
wherein $h(E_{Ai}, E_{Bj})$ is the attribute similarity of $E_{Ai}$ and $E_{Bj}$, calculated as shown in formula (16).
By the above method, a set of sample paths $S$ can be formed for distributed representation learning across network nodes.
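Rules a to c above can be sketched as a single step function. This is an illustration only: `cross_network_step`, `step_within` (standing in for the within-network choice of rule b) and `jump_scores` (standing in for the jump probability of rule c, whose formula is not reproduced in this text) are hypothetical names and callbacks.

```python
import random


def cross_network_step(rng, current, side, eps, step_within, known_links, jump_scores):
    """Return (next_node, next_side) for one step of the cross-network walk.

    step_within(node, side) -> next node in the same network (rule b)
    known_links: dict mapping a node to its known associated node, if any
    jump_scores(node) -> {candidate in the other network: non-negative score}
    """
    other = "B" if side == "A" else "A"
    if rng.random() < eps:                  # rule a: stay in the current network
        return step_within(current, side), side
    if current in known_links:              # rule c: follow a known association
        return known_links[current], other
    scores = jump_scores(current)           # rule c: otherwise sample a jump target
    nodes = list(scores)
    weights = [scores[n] for n in nodes]
    # random.choices raises ValueError if every weight is zero
    return rng.choices(nodes, weights=weights, k=1)[0], other
```

Passing a seeded `random.Random` instance as `rng` keeps the sampled paths reproducible, which is convenient when comparing runs of the iteration.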
③ Distributed representation learning model and algorithm for node multi-mode relation characteristics
The word vector model (Word2Vec) learns from text and represents the semantic information of words as word vectors, so that semantically similar words lie close together in the embedding space. Considering the massive number of entities in multi-mode heterogeneous information networks, the invention intends to build on the Skip-Gram model in Word2Vec to establish a cross-network distributed representation learning model and algorithm. In a single homogeneous network (all nodes are of the same type and all relations are of the same type, i.e. $|T| = 1$ and $|C| = 1$), the target optimization function of the Skip-Gram model is:
Figure BDA0002629425610000147
where $\theta$ is the parameter to be solved.
Considering the multimode heterogeneous characteristics of the network, the formula (9) can be extended to the learning of the cross-network distributed representation oriented to the multimode heterogeneous characteristics, and the objective optimization function can be converted into:
Figure BDA0002629425610000151
wherein $N_t(v)$ is the set of context nodes of type $t$ among the neighbors of node $v$. If $V_t$ is the set of nodes of type $t$ in both networks, then
Figure BDA0002629425610000152
wherein $X_v$ is the multi-mode heterogeneous feature vector of node $v$.
Solving equation (10) yields the multi-mode heterogeneous feature vectors $X_{Ai}$ and $X_{Bj}$ of entities $E_{Ai}$ and $E_{Bj}$, which are used for the subsequent calculation of multi-mode heterogeneous consistency. Formula (10) accounts for the multi-mode characteristics of the network through the multi-mode relation transfer matrix $M$ and for its heterogeneous characteristics through $T$. The feature vectors learned by equation (10) therefore embed the multi-mode heterogeneous features of the network.
Because the network contains a large number of nodes, solving formula (10) is computationally expensive; negative sampling is adopted to reduce the training complexity, and the objective function becomes:
Figure BDA0002629425610000153
wherein $\sigma(\cdot)$ is the sigmoid function and NEG is the number of negatively sampled edges. $X$ is then trained by stochastic gradient descent to obtain the multi-mode heterogeneous feature vector of each node. Many studies have verified that the negative-sampling-based Skip-Gram model is applicable to node feature representation learning on networks with tens of millions of nodes or more; the method can therefore be used to extract the multimode heterogeneous characteristics of the massive entities of a multimode heterogeneous information network.
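For orientation, the standard negative-sampling form of the Skip-Gram objective for one (node, context) pair, together with one gradient update, can be sketched as follows. The patent's own objective is given by a formula that is not reproduced in this text, so this is the textbook form rather than a transcription; the function names, the use of separate context vectors and the learning rate are assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def neg_sampling_objective(x_v, x_ctx, x_negs):
    """log sigma(x_ctx . x_v) + sum over negatives of log sigma(-x_neg . x_v)."""
    value = math.log(sigmoid(dot(x_ctx, x_v)))
    for x_n in x_negs:
        value += math.log(sigmoid(-dot(x_n, x_v)))
    return value


def sgd_step(x_v, x_ctx, x_negs, lr=0.1):
    """One gradient-ascent update of x_v on the objective above.

    Ascent on the objective equals descent on its negative; the context and
    negative vectors are held fixed in this sketch.
    """
    g = 1.0 - sigmoid(dot(x_ctx, x_v))
    grad = [g * c for c in x_ctx]                  # pull towards the true context
    for x_n in x_negs:
        s = sigmoid(dot(x_n, x_v))
        grad = [gi - s * ni for gi, ni in zip(grad, x_n)]   # push from negatives
    return [xi + lr * gi for xi, gi in zip(x_v, grad)]
```

The cost of one update grows with the number of negatives rather than with the number of nodes, which is why negative sampling makes training on very large networks practical.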
The invention aims to design a cross-network distributed representation learning algorithm according to the model, and theoretically discuss the complexity of the algorithm, the influence of the hyper-parameter on the model and the like.
(4) Associated entity recognition model and method integrating attribute characteristics and relationship characteristics
In order to improve the quality of associated entity identification without a priori associated entities, the invention considers that an entity depends on its surrounding "environment" and can be identified from that "environment". The basic idea of the associated entity identification of the invention is therefore: if $E_{Ai}$ and $E_{Bj}$ are associated entities, i.e. $E_{Ai} = E_{Bj}$, then $E_{Ai}$ and $E_{Bj}$ should satisfy the following conditions:
a. Multi-mode heterogeneous consistency. $E_{Ai}$ and $E_{Bj}$ are entities of the same type or of the same inherited type, and $E_{Ai}$ and $E_{Bj}$ have similar multimode heterogeneous characteristics;
b. Attribute consistency. $E_{Ai}$ and $E_{Bj}$ should have similar text and geometric attribute features;
c. Environment consistency. $E_{Ai}$ and $E_{Bj}$ have a similar "environment"; that is, most of the entities in $N_{Ai}$ and $N_{Bj}$ are also associated entities.
Let the $|E_A| \times |E_B|$ matrix $S$ denote the similarity matrix of the entities of $M_A$ and $M_B$. When the types of two entities $E_{Ai}$ and $E_{Bj}$ differ, their similarity is set directly to 0, i.e. $S_{ij} = 0$, and there is no need to compute the multi-mode heterogeneous consistency, environment consistency or attribute consistency of $E_{Ai}$ and $E_{Bj}$. If the $|E_A| \times |E_B|$ matrix $H$ denotes the type relation matrix of $M_A$ and $M_B$, then
Figure BDA0002629425610000161
① Multi-mode heterogeneous consistency model
After the multi-mode heterogeneous features of nodes from two different multi-mode heterogeneous networks are embedded into low-dimensional continuous vectors in the same space, cosine similarity can be used to calculate the similarity of the feature vectors $X_{Ai}$ and $X_{Bj}$ of two nodes $E_{Ai}$ and $E_{Bj}$, which forms the multi-mode heterogeneous consistency model. That is,
Figure BDA0002629425610000162
where, for the entities of the two networks
Figure BDA0002629425610000163
and
Figure BDA0002629425610000164
the $|E_A| \times |E_B|$ matrix $R$ is the multi-mode heterogeneous feature similarity matrix. $X_{Ai}$ and $X_{Bj}$ are the multi-mode heterogeneous feature vectors of the two entities $E_{Ai}$ and $E_{Bj}$ obtained by the solution described above, and the values $R_{ij}$ form the multi-mode heterogeneous feature similarity matrix $R$ between entity sets $E_A$ and $E_B$.
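The type relation matrix and the cosine-based similarity matrix described above can be sketched in a few lines. The function names are hypothetical, and treating a zero vector as having similarity 0 is an assumption made so that the example never divides by zero.

```python
import math


def type_matrix(types_a, types_b):
    """H_ij = 1 if entity i of network A and entity j of network B share a type, else 0."""
    return [[1 if ta == tb else 0 for tb in types_b] for ta in types_a]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def similarity_matrix(X_a, X_b):
    """R_ij = cos(X_Ai, X_Bj) for the feature vectors of the two entity sets."""
    return [[cosine(xa, xb) for xb in X_b] for xa in X_a]
```

Since pairs with different types are given similarity 0 anyway, an implementation can skip the cosine computation wherever the type matrix holds 0.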
② Environment consistency model
If $Z$ is the set of associated entities in the two multimode heterogeneous networks, the environment consistency of two nodes $E_{Ai}$ and $E_{Bj}$ can be calculated using the Jaccard similarity. Namely:
Figure BDA0002629425610000165
wherein $I_{Ai} = N_{Ai} \cap Z$ and $I_{Bj} = N_{Bj} \cap Z$. Without a priori associated entities, initially
Figure BDA0002629425610000166
The invention designs an iterative algorithm to mine the associated entities; thus, as the iteration proceeds, $Z$ contains more and more associated entities. All $Y_{ij}$ form the environment consistency feature similarity matrix $Y$ between entity sets $E_A$ and $E_B$.
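A Jaccard-style environment consistency over already-associated neighbors can be sketched as follows. The formula itself is not reproduced in this text, so the way the intersection is counted here (a known pair whose two members are neighbors of the two entities) is an assumption, as are the function name, the representation of Z as a set of pairs and the premise that Z is one-to-one.

```python
def environment_consistency(n_ai, n_bj, Z):
    """Jaccard-style Y_ij over already-associated neighbors.

    n_ai, n_bj: neighbor sets of E_Ai and E_Bj.
    Z: set of associated pairs (a, b), assumed one-to-one.
    """
    i_ai = {a for (a, b) in Z if a in n_ai}      # associated neighbors of E_Ai
    i_bj = {b for (a, b) in Z if b in n_bj}      # associated neighbors of E_Bj
    # a known pair counts as shared when both of its members are neighbors
    shared = sum(1 for (a, b) in Z if a in n_ai and b in n_bj)
    union = len(i_ai) + len(i_bj) - shared
    return shared / union if union else 0.0
```

With an empty Z the function returns 0.0, which illustrates why the treatment of the initial stage matters for the iteration.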
Third, attribute consistency model
The attributes of a multimodal heterogeneous information network take two forms: text attributes and geometric attributes. The method establishes a separate similarity model for each.
a. Text attribute feature model. The text attributes of multimodal heterogeneous information network entities are mostly short texts. The method uses a short-text word vector approach to analyze and build an entity attribute semantic feature vector model; cosine similarity or Euclidean distance is then used to compute the text attribute similarity of entities $E_{Ai}$ and $E_{Bj}$, forming an $n_A \times n_B$ attribute feature similarity matrix $P_P$.
b. Geometric attribute feature model. IFC supports a number of different geometric model types. Specifically, IFC uses Curve2D, GeometricSet, and GeometricCurveSet to describe models composed of basic primitives such as points, lines, and surfaces, uses SurfaceModel to describe surface models, and uses SolidModel to describe solid models; SolidModel can be further subdivided into types such as SweptSolid, Brep, CSG, Clipping, and AdvancedSweptSolid. The variety of IFC geometric descriptions and their complex cross-references make computing geometric attribute similarity in multimodal heterogeneous information networks very challenging.
Fig. 5 is a schematic diagram of the geometric attribute similarity calculation process provided in an embodiment of the present invention. The process is as follows: first, making full use of earlier results on lightweight visualization of multimodal heterogeneous information networks, each geometric model type is converted into Brep; then, Brep is converted into a Delaunay triangulation, on which the similarity calculation is performed. The similarity of two triangulations is computed as shape distribution similarity. On this basis, the similarity matrix $P_G$ of the geometric attributes of all entities in the two multimodal heterogeneous information networks is finally formed.
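The last step of this process, shape distribution similarity on a triangulated surface, can be illustrated with a short sketch. The source does not say which shape function is used; the sketch assumes the common D2 variant (a histogram of distances between random surface point pairs), normalizes distances by their mean for scale invariance, and compares histograms with an L1 distance. All function names, the bin count, and the sample size are hypothetical, and the conversion from Brep to a triangulation is assumed to have been done already.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[Point, Point, Point]


def triangle_area(t: Triangle) -> float:
    """Area of a 3D triangle via the cross product of two edge vectors."""
    a, b, c = t
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_point(t: Triangle, rng: random.Random) -> Point:
    """Uniform random point inside a triangle (square-root barycentric sampling)."""
    s, r = math.sqrt(rng.random()), rng.random()
    w = (1.0 - s, s * (1.0 - r), s * r)
    a, b, c = t
    return tuple(w[0] * a[k] + w[1] * b[k] + w[2] * c[k] for k in range(3))


def d2_histogram(mesh: Sequence[Triangle], n_pairs: int = 2000,
                 bins: int = 16, seed: int = 0) -> List[float]:
    """Normalized histogram of distances between random surface point pairs."""
    rng = random.Random(seed)
    areas = [triangle_area(t) for t in mesh]
    dists = []
    for _ in range(n_pairs):
        # Pick triangles with probability proportional to area.
        t1, t2 = rng.choices(mesh, weights=areas, k=2)
        dists.append(math.dist(sample_point(t1, rng), sample_point(t2, rng)))
    mean = sum(dists) / len(dists)
    hist = [0.0] * bins
    for d in dists:
        # Distances are scaled by the mean; anything above 3x the mean goes in the last bin.
        idx = min(int(d / (3.0 * mean) * bins), bins - 1)
        hist[idx] += 1.0 / n_pairs
    return hist


def shape_similarity(mesh_a: Sequence[Triangle], mesh_b: Sequence[Triangle],
                     n_pairs: int = 2000, bins: int = 16, seed: int = 0) -> float:
    """1 minus half the L1 distance between the two D2 histograms."""
    ha = d2_histogram(mesh_a, n_pairs, bins, seed)
    hb = d2_histogram(mesh_b, n_pairs, bins, seed)
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))


# Toy meshes (hypothetical): a unit square, the same square scaled by 2, and a long strip.
square = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)),
          ((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0))]
big_square = [tuple(tuple(2.0 * x for x in p) for p in t) for t in square]
strip = [((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 1.0, 0.0)),
         ((0.0, 0.0, 0.0), (10.0, 1.0, 0.0), (0.0, 1.0, 0.0))]

sim_scaled = shape_similarity(square, big_square)
sim_strip = shape_similarity(square, strip)
```

Normalizing by the mean distance makes the descriptor insensitive to uniform scaling, which is why the scaled square scores close to 1 while the elongated strip scores lower. Whether scale invariance is desirable for matching building components is a design decision the source does not address.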
In the identification of associated entities in multimodal heterogeneous information networks, two entities need not have high similarity on all attributes in order to be associated; when the similarity of the text attribute and/or the geometric attribute is large, the two entities are associated entities with a certain probability. Therefore, the invention adopts a Logit regression model to fuse the text attribute and geometric attribute similarities into the entity attribute similarity matrix $P$ of the multimodal heterogeneous information network,
Figure BDA0002629425610000171
All $P_{ij}$ together form the attribute consistency feature similarity matrix $P$ between entity sets $E_A$ and $E_B$.
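The exact fusion formula appears in the source only as an equation image, so it cannot be reproduced here. As a rough sketch of the idea, and assuming a plain logistic (logit) form over the two similarities, the fusion of text and geometric similarity might be written as below. The weights and bias are hypothetical placeholders; in practice they would be fitted on labelled entity pairs.

```python
import math
from typing import List, Sequence


def fuse_attribute_similarity(p_text: float, p_geom: float,
                              w_text: float = 4.0, w_geom: float = 4.0,
                              bias: float = -4.0) -> float:
    """Logistic fusion of text and geometric similarity (weights are assumptions)."""
    z = bias + w_text * p_text + w_geom * p_geom
    return 1.0 / (1.0 + math.exp(-z))


def fuse_matrices(P_text: Sequence[Sequence[float]],
                  P_geom: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply the fusion element-wise to two similarity matrices of equal shape."""
    return [[fuse_attribute_similarity(pt, pg) for pt, pg in zip(row_t, row_g)]
            for row_t, row_g in zip(P_text, P_geom)]


# Toy matrices (hypothetical): 2 entities in A, 2 in B.
P_text = [[1.0, 0.0], [0.2, 0.9]]
P_geom = [[1.0, 0.0], [0.1, 0.0]]
P = fuse_matrices(P_text, P_geom)
```

With these placeholder weights a pair that is highly similar on only one attribute still receives a moderate score, which reflects the statement that a large text or geometric similarity alone makes an association plausible.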
Associated entity identification method
To improve the accuracy of associated entity identification, the multimodal heterogeneous characteristics of the multimodal heterogeneous information network are modeled and integrated, multimodal heterogeneous consistency, attribute consistency, and environment consistency are considered together, and an iterative associated entity identification algorithm is designed. Fig. 6 is a flowchart of iterative associated entity identification. First, the $H$, $R$, $Y$, and $P$ matrices are computed based on research contents (2) and (3) and the multimodal heterogeneous consistency, attribute consistency, and environment consistency models. Then, for two entities $E_{Ai}$ and $E_{Bj}$, the similarity is computed as:
$$S_{ij} = \mathrm{sim}(E_{Ai}, E_{Bj}) = H_{ij} \cdot R_{ij} \cdot Y_{ij} \cdot P_{ij} \tag{18}$$
The algorithm selects the unassociated entity pair $E_{Ai}$ and $E_{Bj}$ with the largest similarity value in $S$ as associated entities, subject to $S_{ij} > \tau$, where $\tau$ is a set similarity threshold. When a new associated entity $\Delta Z$ is identified, the associated entity set is updated as $Z = Z \cup \Delta Z$. Then $Y$ and $S$ are updated and new associated entities are identified again. When no associated entity meeting the requirement can be identified, the iteration ends and the identified associated entity set $Z$ is output. In each iteration, $\Delta Z$ affects only part of $Y$; therefore not all values of $Y$ and $S$ need to be updated in each iteration, which keeps the associated entity identification method efficient.
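The iteration described above can be sketched as a greedy loop. The sketch assumes that H, R, and P are given as nested lists indexed by entity position, that neighbour sets are given as index sets, and that Z is kept as a mapping from indices in A to indices in B. The parameter `empty_value` (the environment consistency of a pair with no associated neighbours) is an assumption, because the source gives the initial condition only as an equation image. For readability the sketch recomputes every candidate score in each round instead of updating only the entries affected by the newly added pair.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple


def identify_associated_entities(
    H: Sequence[Sequence[int]],
    R: Sequence[Sequence[float]],
    P: Sequence[Sequence[float]],
    neighbors_a: Sequence[Set[int]],
    neighbors_b: Sequence[Set[int]],
    tau: float,
    empty_value: float = 1.0,  # assumption, see lead-in
) -> Dict[int, int]:
    """Greedily add the best unassociated pair while its similarity exceeds tau."""
    n_a = len(H)
    n_b = len(H[0]) if n_a else 0
    Z: Dict[int, int] = {}      # index in A -> index in B
    Z_inv: Dict[int, int] = {}  # index in B -> index in A

    def env(i: int, j: int) -> float:
        # Jaccard similarity over already associated neighbours, compared as pairs.
        I_a = {(a, Z[a]) for a in neighbors_a[i] if a in Z}
        I_b = {(Z_inv[b], b) for b in neighbors_b[j] if b in Z_inv}
        union = I_a | I_b
        return empty_value if not union else len(I_a & I_b) / len(union)

    while True:
        best_score = tau
        best_pair: Optional[Tuple[int, int]] = None
        for i in range(n_a):
            if i in Z:
                continue
            for j in range(n_b):
                if j in Z_inv or H[i][j] == 0:
                    continue  # already associated, or types differ so S_ij = 0
                s = H[i][j] * R[i][j] * env(i, j) * P[i][j]
                if s > best_score:
                    best_score, best_pair = s, (i, j)
        if best_pair is None:
            return Z  # no remaining pair exceeds the threshold
        i, j = best_pair
        Z[i], Z_inv[j] = j, i


# Toy example (hypothetical): two three-node chains with identical structure.
H = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
R = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.2], [0.1, 0.2, 0.4]]
P = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
chain = [{1}, {0, 2}, {1}]

Z_low = identify_associated_entities(H, R, P, chain, chain, tau=0.3)
Z_high = identify_associated_entities(H, R, P, chain, chain, tau=0.5)
```

With `tau=0.3` all three pairs are associated in the order (0, 0), (1, 1), (2, 2); with `tau=0.5` the last pair, whose similarity is 0.4, is rejected and only the first two are returned.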
On this basis, the invention designs the corresponding algorithm and theoretically discusses the algorithm complexity and the influence of hyperparameters on the associated entity identification model.
In summary, the embodiments of the present invention start from the key requirements of full-process integrated application and full-life-cycle data sharing of multimodal heterogeneous information networks, and take the identification of associated entities in multimodal heterogeneous information networks with massive numbers of entities as the research target. On the basis of a thorough analysis of the multimodal heterogeneous characteristics of such networks, the work focuses on a formal description method for complex multimodal heterogeneous information networks, a domain-independent model and method for distributed representation learning across multimodal heterogeneous networks, and an associated entity identification model and algorithm that combines attribute features with multimodal heterogeneous features, with experimental verification on massive data.
The invention provides a formal description method for multimodal heterogeneous information networks and a multimodal heterogeneous associated entity identification model and method based on cross-network representation learning. It enriches and improves the theories and methods of network representation learning and associated entity identification in the field of data mining and of multimodal heterogeneous information networks in the field of building informatization, promotes the cross-fusion of computer science with the architecture and civil engineering disciplines, and has important theoretical value. The research results promote the full-process integrated application and full-life-cycle data sharing of multimodal heterogeneous information networks, improve the big data application capability and management decision-making level of the construction industry and its enterprises, serve the major national need for modernized transformation and upgrading of the construction industry, support big data construction and 'full-cycle management' for smart cities, smart infrastructures, smart people and the like, and have great economic and social benefits.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, refer to the corresponding parts of the description of the method embodiments. The above-described apparatus and system embodiments are merely illustrative: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A multimodal heterogeneous associated entity recognition method based on cross-network representation learning, characterized by comprising:
given two multimodal heterogeneous information networks
Figure FDA0002932037570000011
and
Figure FDA0002932037570000012
where $E_A$ and $E_B$ are entity sets, $R_A$ and $R_B$ are entity relation sets, $T_A$ and $T_B$ are entity type sets, and $C_A$ and $C_B$ are entity relation type sets, let $E_{Ai} \in E_A$ and $E_{Bj} \in E_B$ be two entities;
establishing, by an iterative method based on the set of random walk paths between entities $E_{Ai}$ and $E_{Bj}$, the multimodal relation transition probability $M_{ij}$ between $E_{Ai}$ and $E_{Bj}$, and learning the multimodal heterogeneous feature vectors of the entities $E_{Ai}$ and $E_{Bj}$ from the multimodal relation transition probability $M_{ij}$ using an objective function;
determining, according to the multimodal heterogeneous feature vectors of the entities $E_{Ai}$ and $E_{Bj}$, whether the two entities have multimodal heterogeneous consistency, and also determining whether the two entities have attribute consistency and environment consistency; when the two entities $E_{Ai}$ and $E_{Bj}$ simultaneously have multimodal heterogeneous consistency, attribute consistency, and environment consistency, determining $E_{Ai}$ and $E_{Bj}$ to be associated entities;
wherein the attributes of the entities $E_{Ai}$ and $E_{Bj}$ include text attributes and geometric attributes; the text attributes are short texts; a short-text word vector method is used to analyze and build an entity attribute semantic feature vector model, and the text attribute feature similarity between entities $E_{Ai}$ and $E_{Bj}$ is computed by cosine similarity or Euclidean distance;
the text attribute feature similarity and the geometric attribute feature similarity between entities $E_{Ai}$ and $E_{Bj}$ are fused to form the attribute consistency feature similarity $P_{ij}$ between $E_{Ai}$ and $E_{Bj}$, and all $P_{ij}$ together form the attribute consistency feature similarity matrix $P$ between entity sets $E_A$ and $E_B$.
2. The method according to claim 1, characterized in that establishing, by an iterative method based on the set of random walk paths between entities $E_{Ai}$ and $E_{Bj}$, the multimodal relation transition probability $M_{ij}$ between the entities $E_{Ai}$ and $E_{Bj}$ comprises:
assuming that there are $|C|$ relations of different modalities in the multimodal heterogeneous information network, using a $|C| \times |C|$ matrix $M$ to represent the multimodal relation transition matrix, where $M_{ij}$ is the transition probability from relation type $C_i$ to $C_j$ in the multimodal heterogeneous network;
in one random walk, if the previous node $E_i$ moved to the current node $E_j$ through relation $C_x$, the probability $p(E_k \mid E_i, E_j, C_x, M)$ of moving to the next node $E_k$ is computed as:
Figure FDA0002932037570000013
where $W_{ij}$ is the weight of entities $E_i$ and $E_j$, $C_{ij}$ is the type of the relation $(E_i, E_j)$, $N_i$ is the set of neighbor nodes of entity $E_i$, and $W_{ij} = |N_i \cap N_j| / |N_i \cup N_j|$; if $d_{ij}$ is the distance between entities $E_i$ and $E_j$, then:
Figure FDA0002932037570000021
according to formula (2), random walks are used to obtain a set of random walk paths $P = \{P_1, P_2, P_3, \ldots\}$ and the corresponding multimodal relation transition paths $T = \{T_1, T_2, T_3, \ldots\}$, where
Figure FDA0002932037570000022
Figure FDA0002932037570000023
a $|P|$-dimensional vector $e_i$ represents the feature of relation type $C_i$ in the random walk set $P$, where $e_{ij}$ is the number of times $C_i$ appears in $P_i$;
the similarity of relation types $C_i$ and $C_j$ is computed according to the Pearson correlation coefficient, namely
Figure FDA0002932037570000024
the multimodal relation transition probability is updated using the Sigmoid function:
Figure FDA0002932037570000025
initially, the matrix $M_{ij}$ is set to an all-ones matrix or a random matrix; the random walk path set $P$ is obtained from $M_{ij}$ using formula (2), $M_{ij}$ is updated according to formula (5), and this process is iterated until $M_{ij}$ converges, completing the construction of the multimodal relation transition matrix $M_{ij}$.
3. The method according to claim 2, characterized in that learning the multimodal heterogeneous feature vectors of the entities $E_{Ai}$ and $E_{Bj}$ from the multimodal relation transition matrix $M_{ij}$ using an objective function comprises:
taking the entities $E_{Ai}$ and $E_{Bj}$ each as a node, using the Skip-Gram model of Word2Vec to establish a cross-network distributed representation learning model and algorithm, and setting the objective optimization function of the Skip-Gram model in cross-network distributed representation learning oriented to multimodal heterogeneous features as:
Figure FDA0002932037570000026
where $\theta$ is the parameter to be solved and $N_t(v)$ is the set of context nodes of type $t$ among the adjacent nodes of node $v$; if $V_t$ is the set of nodes of type $t$ in the two networks, then:
Figure FDA0002932037570000027
where $X_v$ is the multimodal heterogeneous feature vector of node $v$;
the multimodal heterogeneous feature vectors $X_{Ai}$ and $X_{Bj}$ of the entities $E_{Ai}$ and $E_{Bj}$ are obtained by solving formula (10).
4. The method according to claim 3, characterized in that determining, according to the multimodal heterogeneous feature vectors of the entities $E_{Ai}$ and $E_{Bj}$, whether the two entities $E_{Ai}$ and $E_{Bj}$ have multimodal heterogeneous consistency comprises:
determining, according to the multimodal heterogeneous feature vectors of entities $E_{Ai}$ and $E_{Bj}$, whether the type $T_{Ai}$ of entity $E_{Ai}$ is the same as the type $T_{Bj}$ of $E_{Bj}$; if they are the same, the type relation similarity $H_{ij}$ between the two entities $E_{Ai}$ and $E_{Bj}$ equals 1; otherwise, it equals 0:
$$H_{ij} = \begin{cases} 1, & T_{Ai} = T_{Bj} \\ 0, & \text{otherwise} \end{cases}$$
when the two entities $E_{Ai}$ and $E_{Bj}$ are of the same type, the multimodal heterogeneous similarity $R_{ij}$ between them is computed as:
$$R_{ij} = \cos(X_{Ai}, X_{Bj}) = \frac{X_{Ai} \cdot X_{Bj}}{\|X_{Ai}\| \, \|X_{Bj}\|}$$
where $X_{Ai}$ and $X_{Bj}$ are the multimodal heterogeneous feature vectors of the two entities $E_{Ai}$ and $E_{Bj}$ obtained from the solution, and all $R_{ij}$ together form the multimodal heterogeneous feature similarity matrix $R$ between entity sets $E_A$ and $E_B$.
5. The method according to claim 3, characterized in that determining whether the two entities $E_{Ai}$ and $E_{Bj}$ have environment consistency comprises:
if $Z$ is the set of already associated entities in the two multimodal heterogeneous information networks, the environment consistency feature similarity $Y_{ij}$ between entities $E_{Ai}$ and $E_{Bj}$ is computed as:
$$Y_{ij} = \frac{|I_{Ai} \cap I_{Bj}|}{|I_{Ai} \cup I_{Bj}|}$$
where $I_{Ai} = N_{Ai} \cap Z$ and $I_{Bj} = N_{Bj} \cap Z$; initially,
Figure FDA0002932037570000038
as the iteration proceeds, $Z$ contains more and more associated entities, and all $Y_{ij}$ together form the environment consistency feature similarity matrix $Y$ between entity sets $E_A$ and $E_B$.
6. The method according to claim 5, characterized in that determining $E_{Ai}$ and $E_{Bj}$ to be associated entities when the two entities $E_{Ai}$ and $E_{Bj}$ simultaneously have multimodal heterogeneous consistency, attribute consistency, and environment consistency comprises:
combining the type relation similarity $H_{ij}$, the multimodal heterogeneous similarity $R_{ij}$, the environment consistency feature similarity $Y_{ij}$, and the attribute consistency feature similarity $P_{ij}$ between entities $E_{Ai}$ and $E_{Bj}$ to obtain the similarity value $S_{ij}$ between the entities $E_{Ai}$ and $E_{Bj}$:
$$S_{ij} = \mathrm{sim}(E_{Ai}, E_{Bj}) = H_{ij} \cdot R_{ij} \cdot Y_{ij} \cdot P_{ij}$$
forming the similarity matrix $S$ between $E_A$ and $E_B$ from the similarity values between all entities in $E_A$ and $E_B$, and selecting the unassociated entity pair $E_{Ai}$ and $E_{Bj}$ with the largest similarity value in $S$ as associated entities, subject to $S_{ij} > \tau$, where $\tau$ is a set similarity threshold;
after a new associated entity $\Delta Z$ is identified, updating the associated entity set as $Z = Z \cup \Delta Z$, updating $Y$ and $S$, and identifying new associated entities again; when no associated entity meeting the requirement can be identified, the iteration ends and the identified associated entity set $Z$ is output.
CN202010806775.3A 2020-08-12 2020-08-12 A Multimodal Heterogeneous Associated Entity Recognition Method Based on Cross-Network Representation Learning Active CN111931485B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010806775.3A CN111931485B (en) 2020-08-12 2020-08-12 A Multimodal Heterogeneous Associated Entity Recognition Method Based on Cross-Network Representation Learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010806775.3A CN111931485B (en) 2020-08-12 2020-08-12 A Multimodal Heterogeneous Associated Entity Recognition Method Based on Cross-Network Representation Learning

Publications (2)

Publication Number Publication Date
CN111931485A CN111931485A (en) 2020-11-13
CN111931485B true CN111931485B (en) 2021-03-23

Family

ID=73310734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010806775.3A Active CN111931485B (en) 2020-08-12 2020-08-12 A Multimodal Heterogeneous Associated Entity Recognition Method Based on Cross-Network Representation Learning

Country Status (1)

Country Link
CN (1) CN111931485B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507707B (en) * 2020-12-04 2024-12-13 国网江苏省电力有限公司南京供电分公司 Method for analyzing and judging the correlation degree of innovative technologies in different fields of power Internet of Things
CN112836063B (en) * 2021-01-27 2023-06-06 四川新网银行股份有限公司 Method for realizing feature tracing
CN113704566B (en) * 2021-10-29 2022-01-18 贝壳技术有限公司 Identification number body identification method, storage medium and electronic equipment
CN116306936A (en) * 2022-11-24 2023-06-23 北京建筑大学 Knowledge graph embedding method and model based on hierarchical relationship rotation and entity rotation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111083767A (en) * 2019-12-23 2020-04-28 哈尔滨工业大学 A Heterogeneous Network Selection Method Based on Deep Reinforcement Learning

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9576048B2 (en) * 2014-06-26 2017-02-21 International Business Machines Corporation Complex service network ranking and clustering
CN105825430A (en) * 2016-01-08 2016-08-03 南通弘数信息科技有限公司 Heterogeneous social network-based detection method
CN109902203B (en) * 2019-01-25 2021-06-01 北京邮电大学 Network representation learning method and device based on edge random walk
CN110188148B (en) * 2019-05-23 2021-02-02 北京建筑大学 Entity identification method and device facing multimode heterogeneous characteristics
CN110717098B (en) * 2019-09-20 2022-06-24 中国科学院自动化研究所 Meta-path-based context-aware user modeling method and sequence recommendation method
CN110929046B (en) * 2019-12-10 2022-09-30 华中师范大学 Knowledge entity recommendation method and system based on heterogeneous network embedding
CN111291243B (en) * 2019-12-30 2022-07-12 浙江大学 Visual reasoning method for uncertainty of spatiotemporal information of character event
CN111277433B (en) * 2020-01-15 2021-02-12 同济大学 Method and device for network service anomaly detection based on attribute network representation learning
CN111381902B (en) * 2020-03-10 2021-04-13 中南大学 APP startup acceleration method based on heterogeneous network embedding with attributes

Also Published As

Publication number Publication date
CN111931485A (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN111931485B (en) A Multimodal Heterogeneous Associated Entity Recognition Method Based on Cross-Network Representation Learning
WO2022267976A1 (en) Entity alignment method and apparatus for multi-modal knowledge graphs, and storage medium
Hsu Content-based text mining technique for retrieval of CAD documents
Soibelman et al. Management and analysis of unstructured construction data types
Moosavi et al. Community detection in social networks using user frequent pattern mining
CN108681557B (en) Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint
CN112463980A (en) Intelligent plan recommendation method based on knowledge graph
CN111325326A (en) A Link Prediction Method Based on Heterogeneous Network Representation Learning
WO2024031933A1 (en) Social relation analysis method and system based on multi-modal data, and storage medium
CN110188148B (en) Entity identification method and device facing multimode heterogeneous characteristics
Yang et al. Co-embedding network nodes and hierarchical labels with taxonomy based generative adversarial networks
CN110851664B (en) A topic-oriented social network node importance evaluation method
CN112765490A (en) Information recommendation method and system based on knowledge graph and graph convolution network
Huang et al. Learning social image embedding with deep multimodal attention networks
CN115587626A (en) Heterogeneous Graph Neural Network Attribute Completion Method
Yin et al. Two-stage Text-to-BIMQL semantic parsing for building information model extraction using graph neural networks
Yin et al. A deep natural language processing‐based method for ontology learning of project‐specific properties from building information models
Lu et al. Organizational graph generation for structured architectural floor plan dataset
Zhang et al. Embedding heterogeneous information network in hyperbolic spaces
Zhou et al. Rank2vec: learning node embeddings with local structure and global ranking
Sun et al. Graph force learning
Sachan et al. Probabilistic model for discovering topic based communities in social networks
Yan et al. Ontology modeling for contract: Using OWL to express semantic relations
CN110765276A (en) Entity alignment method and device in knowledge graph
CN109739991A (en) A Unified Semantic Topic Modeling Method for Modal Heterogeneous Power Data Based on Shared Feature Space

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant