CN113095948A - Multi-source heterogeneous network user alignment method based on graph neural network - Google Patents
Multi-source heterogeneous network user alignment method based on graph neural network Download PDFInfo
- Publication number
- CN113095948A CN113095948A CN202110315041.XA CN202110315041A CN113095948A CN 113095948 A CN113095948 A CN 113095948A CN 202110315041 A CN202110315041 A CN 202110315041A CN 113095948 A CN113095948 A CN 113095948A
- Authority
- CN
- China
- Prior art keywords
- user
- node
- neural network
- network
- social
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000013528 artificial neural network Methods 0.000 title claims abstract description 49
- 238000000034 method Methods 0.000 title claims abstract description 46
- 238000005295 random walk Methods 0.000 claims abstract description 22
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 17
- 239000013598 vector Substances 0.000 claims abstract description 17
- 238000003062 neural network model Methods 0.000 claims abstract description 10
- 238000004364 calculation method Methods 0.000 claims abstract description 8
- 230000009467 reduction Effects 0.000 claims abstract description 8
- 238000012545 processing Methods 0.000 claims abstract description 7
- 238000005070 sampling Methods 0.000 claims description 26
- 230000006870 function Effects 0.000 claims description 17
- 238000012549 training Methods 0.000 claims description 17
- 230000002776 aggregation Effects 0.000 claims description 12
- 238000004220 aggregation Methods 0.000 claims description 12
- 230000004913 activation Effects 0.000 claims description 8
- 238000007781 pre-processing Methods 0.000 claims description 6
- 238000012847 principal component analysis method Methods 0.000 claims description 6
- 230000004931 aggregating effect Effects 0.000 claims description 4
- 238000012935 Averaging Methods 0.000 claims description 3
- 235000008694 Humulus lupulus Nutrition 0.000 claims description 3
- 238000000605 extraction Methods 0.000 claims description 3
- 238000011478 gradient descent method Methods 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 238000003012 network analysis Methods 0.000 abstract 1
- 238000007418 data mining Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000013508 migration Methods 0.000 description 2
- 230000005012 migration Effects 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- ORILYTVJVMAKLC-UHFFFAOYSA-N Adamantane Natural products C1C(C2)CC3CC1CC2C3 ORILYTVJVMAKLC-UHFFFAOYSA-N 0.000 description 1
- JFLRKDZMHNBDQS-UCQUSYKYSA-N CC[C@H]1CCC[C@@H]([C@H](C(=O)C2=C[C@H]3[C@@H]4C[C@@H](C[C@H]4C(=C[C@H]3[C@@H]2CC(=O)O1)C)O[C@H]5[C@@H]([C@@H]([C@H]([C@@H](O5)C)OC)OC)OC)C)O[C@H]6CC[C@@H]([C@H](O6)C)N(C)C.CC[C@H]1CCC[C@@H]([C@H](C(=O)C2=C[C@H]3[C@@H]4C[C@@H](C[C@H]4C=C[C@H]3C2CC(=O)O1)O[C@H]5[C@@H]([C@@H]([C@H]([C@@H](O5)C)OC)OC)OC)C)O[C@H]6CC[C@@H]([C@H](O6)C)N(C)C Chemical group CC[C@H]1CCC[C@@H]([C@H](C(=O)C2=C[C@H]3[C@@H]4C[C@@H](C[C@H]4C(=C[C@H]3[C@@H]2CC(=O)O1)C)O[C@H]5[C@@H]([C@@H]([C@H]([C@@H](O5)C)OC)OC)OC)C)O[C@H]6CC[C@@H]([C@H](O6)C)N(C)C.CC[C@H]1CCC[C@@H]([C@H](C(=O)C2=C[C@H]3[C@@H]4C[C@@H](C[C@H]4C=C[C@H]3C2CC(=O)O1)O[C@H]5[C@@H]([C@@H]([C@H]([C@@H](O5)C)OC)OC)OC)C)O[C@H]6CC[C@@H]([C@H](O6)C)N(C)C JFLRKDZMHNBDQS-UCQUSYKYSA-N 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000000116 mitigating effect Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Business, Economics & Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Economics (AREA)
- Primary Health Care (AREA)
- Tourism & Hospitality (AREA)
- Marketing (AREA)
- Human Resources & Organizations (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multi-source heterogeneous network user alignment method based on a graph neural network, which comprises the steps of extracting attribute features of users through a text processing algorithm, and reducing the dimension of user attribute feature vectors by using a dimension reduction algorithm; obtaining a network topological structure by using a random walk algorithm, taking the attribute characteristics after dimensionality reduction and the network structure as the input of a graph neural network, and learning to obtain identity characteristics containing user attribute and structure information; calculating the similarity of the user names and the social roles among the cross-networks, and finding out candidate user pairs; and calculating and aligning the identities of the users of the multi-source heterogeneous network by using a neural network model. The method can be used for aligning the user identity of the social network, has important application in the fields of social network analysis, human and object image completion and the like, has low calculation complexity of the algorithm and high expandability, can align the user identity in the complex network, and has strong applicability to real data.
Description
Technical Field
The invention belongs to the field of social media data mining, and particularly relates to a multi-source heterogeneous network user alignment method based on a graph neural network.
Background
In recent years, users have registered to use many online social networks, such as Twitter, Instagram, linkedln, etc., and due to the diversity of functions, different online social network platforms attract users for various purposes, such as information lookup/sharing and maintaining social relationships. For example, a user may use Twitter to post opinions about political events while sharing their leisure activities using Instagram. To better utilize the services provided by each social network, users tend to join multiple online social networks. It has become increasingly popular for users to have accounts (also referred to as user identities) on multiple social networks. According to a report of a social media study, 93% of Instagram users are participating in Facebook at the same time, while 53% of Twitter users also use Instagram.
The method has important significance in aligning the user identities of the multi-source heterogeneous network. As the act of users registering accounts on multiple social media sites has become increasingly popular, new opportunities and challenges have been created for various data mining and learning tasks. First, a user who has accounts on multiple social media websites has the potential to fully understand the interests of the user and provide better suggestions or services. Since users use different online social networks for different purposes, analyzing user identities on a single social media may not be fully aware of their personalities and interests. However, if we can link a person's user identity to multiple data sets, collecting and analyzing their data on these social media websites together, we may have a more comprehensive understanding of the user and provide better services. Second, users with accounts on multiple social media websites can enable us to integrate patterns between online social websites and solve some problems that cannot be solved by data from only one website. Data sparsity issues, for example, in many predictive tasks; for example, a newly established social media service may not have enough historical data to recommend to the user. If we can identify these users on other sophisticated social media websites, knowledge can be transferred from mature social media to new social media, mitigating issues of data sparsity, etc. Finally, users who have accounts on multiple social media websites may also help analyze user migration patterns and guide Web development. Users migrating from one social network to another typically reflect the user experience of Web development. User identity alignment across different social media sites provides a great opportunity to study usage migration behavior.
Although some platforms provide a function of unifying user IDs, many people do not fill in correct information for security. How to align the personal identity of a user in various social networks is a research focus.
The graph neural network is a method for processing graph domain information based on deep learning, and can capture relevant information such as the relationship between vertexes in a graph, a network topological structure and the like. Because of their better performance and interpretability, graphical neural networks have recently become a widely used method of graph analysis.
Disclosure of Invention
Aiming at solving the defects in the prior art, aiming at the problems of user anonymity, content heterogeneity, network diversity and the like in a social network, the invention aims to provide a multisource heterogeneous network user alignment method based on a graph neural network.
The invention is realized by the following technical scheme.
A multi-source heterogeneous network user alignment method based on a graph neural network comprises the following steps:
and 6, splicing the obtained identity characteristics of the user pairs crossing the social networks to serve as input of a neural network, training a multilayer neural network model according to the users with known aligned identities, and aligning the identities of the users in the candidate user pairs.
With respect to the above technical solutions, the present invention has a further preferable solution:
preferably, representing users of the social network as nodes, representing friends or concern relations among the users as edges, and constructing an undirected and unauthorized topological graph; the undirected topology graph is represented as G ═ V, E, where V is the set of nodes in the social network and E is the set of edges in the topology graph G.
The specific method of the step 2 comprises the following steps:
step 2.1, preprocessing the user attribute information, removing fixed labels or punctuation marks, and separating each word through a blank space; removing stop words in the stop phrases provided by the NLTK to finally obtain a word bag document;
step 2.2, calculating to obtain an attribute feature vector of each user by using a TF-IDF algorithm;
and 2.3, reducing the dimension of the attribute characteristics of the user by using a principal component analysis method.
The specific method of the step 3 comprises the following steps:
step 3.1, performing multiple random walks with fixed length on each node in the social network topological graph G, and recording a traversal node sequence;
and 3.2, sampling the first-order neighbors and the second-order neighbors of each node in the social network topological graph G in a fixed quantity, if the number of the actual neighbors of a certain node is less than the number of required samples, adopting a sampling method with a return, and otherwise, adopting a sampling mode without the return.
In the step 3.1, a random walk algorithm is executed on each node in the social network topological graph G, one node is selected as a next hop from neighbors of a current node at a medium probability each time, the step length is t, the random walk times are γ, and finally the mutual relation between the nodes is obtained. The preset length t of the random walk step is 5, and the preset length γ of the random walk times for each node is 50.
In the step 3.2, according to the feature of aggregating neighbors in two hops, the sampling number of the first-order neighbors is k1, and the sampling number of the second-order neighbors is k 2. The preset length k1 of the first order neighbor sample number is 25, and the preset length k2 of the second order neighbor sample number is 15.
The specific method of the step 4 comprises the following steps:
step 4.1, inputting the attribute characteristics after dimension reduction into a two-layer graph neural network graph, carrying out k-th aggregation on the target node by using an average aggregation mode, averaging each dimension of the sampling neighbor feature vector of the target node, splicing the obtained result with the k-1 layer vector of the target vertex, and carrying out nonlinear transformation, wherein the activation function is RELU;
step 4.2, constructing a loss function by using an unsupervised training mode;
and 4.3, after training is finished, obtaining the user identity characteristic of each node, and for the newly added node in the social network, using the attribute information and the neighbor information of the node as input, and obtaining the identity characteristic through fast calculation of a graph neural network.
The specific method of the step 5 comprises the following steps:
step 5.1, calculating the similarity of the user names by using the editing distance, namely the minimum editing times for changing one character string into another character string;
step 5.2, calculating the score of each node in the network by using a PageRank algorithm, sorting according to the score, and expressing the social status of each user by percentage;
and 5.3, selecting the nodes with the user name editing distance within 5 and the social ranking within 25% as candidate user pairs.
The specific method of the step 6 comprises the following steps:
step 6.1, constructing a multilayer neural network model, splicing the identity characteristics of user pairs across the social network, using a two-dimensional vector as input of the neural network model, using [1,0] to represent that two users across the social network are the same person, and using softmax as an activation function to construct a loss function L, wherein [0,1] represents that the two users are not the same person;
step 6.2, taking the user pairs with the aligned identities as positive samples with the aligned identities, and constructing negative samples with the unaligned identities by a random extraction method, wherein the proportion of the positive samples to the negative samples is 1: 1, fitting a model after multiple rounds of training by using a random gradient descent method;
and 6.3, inputting the identity characteristics of the candidate user pairs into the trained neural network, judging whether the candidate user pairs are the same user, and aligning the two social networks.
Due to the adoption of the technical scheme, the invention has the following beneficial effects:
(1) according to the method and the device, the user identity characteristics are extracted through the user name, the attribute information and the social network structure information of the user, so that the user identity between the cross-networks is aligned, the required data are all public data and are easy to obtain, and the method and the device are suitable for various online social networks.
(2) The invention acquires the structural information of the network by random walk, trains the neural network of the graph by an unsupervised method, simultaneously learns the node attribute information and the structural information end to end, acquires the identity characteristics of the user by aggregating the identity characteristics of the multi-layer friends of the user, and fully considers the actual characteristics of the social network.
(3) The graph neural network used by the invention is a direct push type learning mode, a new user is added in the social network, and the graph neural network can be directly obtained by the identity feature aggregation of friends, so that the calculation cost is saved, and the network does not need to be trained repeatedly.
(4) The alignment model is a multilayer neural network, can learn to obtain a special relationship between two networks, and is suitable for any social network.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention:
FIG. 1 is a schematic flow chart of a multi-source heterogeneous network user alignment method based on a graph neural network according to the present invention;
FIG. 2 is a schematic diagram of a node neighbor sampling manner in the neural network;
fig. 3 is an aggregation manner of nodes in the neural network of the figure.
Detailed Description
The present invention will now be described in detail with reference to the drawings and specific embodiments, wherein the exemplary embodiments and descriptions of the present invention are provided to explain the present invention without limiting the invention thereto.
As shown in fig. 1, the multi-source heterogeneous network user alignment method based on the graph neural network provided by the embodiment of the present invention includes the following steps:
step S1: multi-source heterogeneous social network data pre-processing
The online social network data set in the embodiment is derived from https:// www.aminer.cn/cosnet and comprises two parts, namely a social network data set and an academic data set, wherein user alignment data in the social network is collected by Perito et al through a Google resume service system. Selecting two social networks of Twitter and MySpace in a social network data set for experiment, wherein the social network Twitter data set comprises 40,171,624 users and 1,468,365,182 edges and contains 28,199 parts of detailed user data; the social network MySpace dataset includes 854,498 users and 6,489,736 edges, with 9,993 shares of the detailed user profile.
Preprocessing an online social network, defining friends or concern relations between users of the social network, and constructing an undirected and unauthorized social network topological graph. Representing users of the social network as nodes, representing friends or attention relations among the users as edges, and constructing a topological graph G of the undirected weightless graph as (V, E), wherein V is a set of nodes in the social network, and E is a set of edges in the topological graph G.
Step S2: and extracting attribute features of the user through a text processing algorithm, and reducing the dimension through a principal component analysis method for subsequent input. The method specifically comprises the following steps:
step S201: the user attribute information is preprocessed, fixed labels or punctuation marks are removed, each word is separated by a space, for example, the user name of the user is Black-Hawk, the processing result is 1.Black 2.Hawk, the address filled by the user is "Georgia Tech, Atlanta, Georgia", and the processing result is 1.Georgia 2.Tech 3.Atlanta 4. Georgia. In addition, the stop words in the stop phrases provided by the NLTK are removed, and finally the bag-of-words document is obtained.
Step S202: and calculating to obtain the attribute feature vector of each user by using a TF-IDF algorithm.
Step S203: and reducing the dimension of the attribute features of the users by using a principal component analysis method, wherein the attribute features of the users occupy 64-dimensional features.
Step S3: all nodes of the social network topological graph G are subjected to random walk to obtain the relevant relations of the nodes among the social networks, and the relevant neighbors of each node are sampled, and the method specifically comprises the following steps:
step S301: and executing a random walk algorithm on each node in the social network topological graph G, selecting one node from the neighbors of the current node at a medium probability each time as a next hop, recording a traversal node sequence, and finally obtaining the mutual relation between the nodes, wherein the step length is 5, and the random walk times are 50.
Step S302: the used graph neural network mainly considers the characteristic of aggregating neighbors in two hops, a fixed number of sampling is carried out on a first-order neighbor (a directly connected node) and a second-order neighbor (a neighbor) of each node in a social network topological graph G, the sampling number of the first-order neighbor is 25, the sampling number of the second-order neighbor is 15, if the actual neighbor number of a certain node is less than the required sample number, a sampling method with a release is adopted, otherwise, a sampling mode without the release is adopted, for example, as shown in FIG. 2, the sampling number of the first-order neighbor is 3, the sampling number of the second-order neighbor is 4, the sampling nodes of the first-order neighbor of the node 1 are 2,3 and 4, and the sampling nodes of the second-order neighbor are.
Step S4: taking the node attribute features after dimensionality reduction as input, training a graph neural network according to social network structure information obtained by random walk, and extracting identity features containing user attributes and structure information for each node in a topological graph through the graph neural network, wherein the method specifically comprises the following steps:
step S401: inputting the attribute features after dimensionality reduction into a two-layer graph neural network GraphSAGE, and using an average aggregation mode to perform k-th aggregation on a target node, wherein the calculation step of performing k-th aggregation on the target node comprises the steps of averaging each dimensionality of a sampling neighbor feature vector of the target node, splicing the obtained result with the k-1-th layer vector of a target vertex, performing nonlinear transformation, obtaining an activation function RELU, and performing aggregation mode of node identity features as shown in FIG. 3, wherein the features of the nodes in the graph are all k-1-th aggregation results, and the k-th aggregation step of the node 1 is to firstly average the features of neighbors 2,3 and 4 of the node 1, namely [ (2+3+1)/3, (4+ 3)/3, (2+6+3)/3]T=[2,3,3.67]TAnd then the characteristics of the node 1 are spliced with the characteristics to obtain [4,2,5,2,3,3.67 ]]TThen multiplied by the matrix W to be trainedkAnd after the function is activated, the identity characteristic of the node 1 can be obtained.
Step S402: using an unsupervised training approach, a loss function is constructed:
wherein Z isuAn embedded vector is generated for the node u through a model, the node v is a neighbor that the node u reaches through random walk, and ZvFor the embedded vector of node v, σ stands for sigmoid activation function, PnThe probability distribution of negative sampling is shown, Q is the number of negative samples, an Adam optimizer is used for training, and fitting is achieved through a multi-round training graph neural network.
Step S403: and after training is finished, obtaining the user identity characteristic of each node, regarding the newly added node in the social network, taking the attribute information of the node and the neighbor attribute information as input, and obtaining the identity characteristic through calculation of a graph neural network.
Step S5: calculating the similarity of user names and social roles among the cross-social networks, and finding out candidate user pairs according to experience values, wherein the method specifically comprises the following steps:
step S501: the user name similarity is calculated using the edit distance, i.e. the minimum number of edits required to change one string to another. The user name edit distance calculation process is shown in table 1.
TABLE 1
Step S502: and calculating the score of each node in the network by using a PageRank algorithm, sorting according to the scores, and expressing the social status of each user by percentage.
Step S503: and selecting the nodes with the user name editing distance within 5 and the social ranking within 25% as candidate user pairs.
Step S6: splicing the obtained identity characteristics of the user pairs crossing the social network as input of a neural network, training a multilayer neural network model according to the users with known identity alignment, and aligning the identity of the user in the candidate user pairs, specifically comprising the following steps:
step S601: constructing a multi-layer neural network model, splicing the identity characteristics of user pairs between networks to serve as the input of the neural network model, using a two-dimensional vector as the output, [1,0]]Representing two users across the network as the same person, [0,1]Then it means that the two users are not the same person, the activation function uses softmax, constructs the loss function L:where y represents the true result of the input user pair,representing the predicted outcome.
Step S602: the user alignment problem is a two-classification problem, and comprises two types of positive samples (identity aligned samples) and negative samples (identity unaligned samples), wherein the identity aligned user pair is used as the positive sample, the negative sample is constructed by a random extraction method, and the proportion of the positive sample to the negative sample is 1: 1, using a random gradient descent method, wherein the learning rate lambda is 0.001, and the model is fitted after multiple rounds of training.
Step S603: and inputting the identity characteristics of the candidate user pairs into the trained neural network, judging whether the candidate user pairs are the same user, and aligning the two networks.
In order to verify the effect of the user alignment method based on the graph neural network proposed in the present embodiment, the test set in the present embodiment is tested, and the accuracy and the recall rate are used as the evaluation indexes of the model.
The experimental results of this example are as follows:
the accuracy values of the test set stabilized at 0.7796 and the recall stabilized at 0.7248.
The experimental result shows that the user identity across networks can be aligned effectively and accurately by the user alignment method based on the graph neural network. The method can be used for aligning the user identities among a plurality of social networks, has important application in a plurality of fields of social network data analysis, figure portrait missing information completion, social platform commodity recommendation and the like, has low calculation complexity and high expandability of an algorithm, can align the user identities in the complex networks, and is suitable for the plurality of social networks.
Although illustrative embodiments of the present invention have been described above to facilitate the understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, and various changes may be made apparent to those skilled in the art as long as they are within the spirit and scope of the present invention as defined and defined by the appended claims, and all matters of the invention which utilize the inventive concepts are protected.
Claims (10)
1. A multi-source heterogeneous network user alignment method based on a graph neural network is characterized by comprising the following steps:
step 1, preprocessing an online social network, defining friends or concern relations between users of the social network, and constructing an undirected and unauthorized social network topological graph;
step 2, extracting attribute characteristics of the user through a text processing algorithm, and reducing dimensions through a principal component analysis method;
step 3, carrying out random walk sampling on all nodes of the social network topological graph to obtain the correlation of the nodes among the social networks;
step 4, taking the node attribute features after dimensionality reduction as input, training a graph neural network according to social network structure information obtained by random walk, and extracting identity features containing user attributes and structure information for each node in the topological graph through the graph neural network;
step 5, calculating the similarity of the user name and the social role among the cross-social networks, and finding out a candidate user pair according to an empirical value;
and 6, splicing the obtained identity characteristics of the user pairs crossing the social networks to serve as input of a neural network, training a multilayer neural network model according to the users with known aligned identities, and aligning the identities of the users in the candidate user pairs.
2. The multi-source heterogeneous network user alignment method based on the graph neural network is characterized in that the users of the social network are represented as nodes, friends or concern relations among the users are represented as edges, and a undirected and unweighted topological graph is constructed; the undirected topology graph is represented as G ═ V, E, where V is the set of nodes in the social network and E is the set of edges in the topology graph G.
3. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 1, wherein the specific method in the step 2 is as follows:
step 2.1, preprocessing the user attribute information, removing fixed labels or punctuation marks, and separating each word through a blank space; removing stop words in the stop phrases provided by the NLTK to finally obtain a word bag document;
step 2.2, calculating to obtain an attribute feature vector of each user by using a TF-IDF algorithm;
and 2.3, reducing the dimension of the attribute characteristics of the user by using a principal component analysis method.
4. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 1, wherein the specific method in the step 3 is as follows:
step 3.1, performing multiple random walks with fixed length on each node in the social network topological graph G, and recording a traversal node sequence;
and 3.2, sampling the first-order neighbors and the second-order neighbors of each node in the social network topological graph G in a fixed quantity, if the number of the actual neighbors of a certain node is less than the number of required samples, adopting a sampling method with a return, and otherwise, adopting a sampling mode without the return.
5. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 3, characterized in that in step 3.1, a random walk algorithm is executed on each node in the social network topological graph G, one node is selected as a next hop from neighbors of a current node at a medium probability each time, the step length is t, the random walk times are γ, and finally the mutual relationship between the nodes is obtained;
in the step 3.2, according to the feature of aggregating neighbors in two hops, the sampling number of the first-order neighbors is k1, and the sampling number of the second-order neighbors is k 2.
6. The multi-source heterogeneous network user alignment method based on the graph neural network as claimed in claim 5, wherein the preset length t of the random walk step is 5, and the preset length γ of the random walk times for each node is 50.
7. The multi-source heterogeneous network user alignment method based on the graph neural network as claimed in claim 5, wherein the preset length k 1-25 for the first-order neighbor sample number and the preset length k 2-15 for the second-order neighbor sample number.
8. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 1, wherein the specific method in the step 4 is as follows:
step 4.1, inputting the attribute characteristics after dimension reduction into a two-layer graph neural network graph, carrying out k-th aggregation on the target node by using an average aggregation mode, averaging each dimension of the sampling neighbor feature vector of the target node, splicing the obtained result with the k-1 layer vector of the target vertex, and carrying out nonlinear transformation, wherein the activation function is RELU;
step 4.2, constructing a loss function by using an unsupervised training mode:
wherein Z isuAn embedded vector is generated for the node u through a model, the node v is a neighbor that the node u reaches through random walk, and ZvFor the embedded vector of node v, σ stands for sigmoid activation function, PnIs the probability distribution of negative samples, Q is the number of negative samplesMesh;
and 4.3, after training is finished, obtaining the user identity characteristic of each node, and for the newly added node in the social network, using the attribute information and the neighbor information of the node as input, and obtaining the identity characteristic through fast calculation of a graph neural network.
9. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 1, wherein the specific method in the step 5 is as follows:
step 5.1, calculating the similarity of the user names by using the editing distance, namely the minimum editing times for changing one character string into another character string;
step 5.2, calculating the score of each node in the network by using a PageRank algorithm, sorting according to the score, and expressing the social status of each user by percentage;
and 5.3, selecting the nodes with the user name editing distance within 5 and the social ranking within 25% as candidate user pairs.
10. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 1, wherein the specific method in the step 6 is as follows:
step 6.1, constructing a multilayer neural network model, splicing the identity characteristics of user pairs across social networks, using a two-dimensional vector as input of the neural network model, using [1,0] to represent that two users across the social networks are the same person, using [0,1] to represent that the two users are not the same person, using softmax as an activation function, and constructing a loss function L:
Step 6.2, taking the user pairs with the aligned identities as positive samples with the aligned identities, and constructing negative samples with the unaligned identities by a random extraction method, wherein the proportion of the positive samples to the negative samples is 1: 1, fitting a model after multiple rounds of training by using a random gradient descent method;
and 6.3, inputting the identity characteristics of the candidate user pairs into the trained neural network, judging whether the candidate user pairs are the same user, and aligning the two social networks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110315041.XA CN113095948B (en) | 2021-03-24 | 2021-03-24 | Multi-source heterogeneous network user alignment method based on graph neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110315041.XA CN113095948B (en) | 2021-03-24 | 2021-03-24 | Multi-source heterogeneous network user alignment method based on graph neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113095948A true CN113095948A (en) | 2021-07-09 |
CN113095948B CN113095948B (en) | 2023-06-06 |
Family
ID=76669894
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110315041.XA Active CN113095948B (en) | 2021-03-24 | 2021-03-24 | Multi-source heterogeneous network user alignment method based on graph neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113095948B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113762372A (en) * | 2021-08-30 | 2021-12-07 | 中国电子科技集团公司第十五研究所 | Method and device for identifying organization members in instant messaging information |
CN113779520A (en) * | 2021-09-07 | 2021-12-10 | 中国船舶重工集团公司第七0九研究所 | Cross-space target virtual identity correlation method based on multilayer attribute analysis |
CN114187457A (en) * | 2021-12-16 | 2022-03-15 | 北京大学 | Iterative graph alignment method |
CN114663245A (en) * | 2022-03-16 | 2022-06-24 | 南京信息工程大学 | Cross-social network identity matching method |
CN114817757A (en) * | 2022-04-02 | 2022-07-29 | 广州大学 | Cross-social network virtual identity association method based on graph convolution network |
CN115269845A (en) * | 2022-08-01 | 2022-11-01 | 安徽大学 | Network alignment method and system based on social network user personality |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10305845B1 (en) * | 2013-04-05 | 2019-05-28 | Hrl Laboratories, Llc | Accurate user alignment across online social media platforms |
CN110852755A (en) * | 2019-11-06 | 2020-02-28 | 支付宝(杭州)信息技术有限公司 | User identity identification method and device for transaction scene |
WO2020114022A1 (en) * | 2018-12-04 | 2020-06-11 | 平安科技(深圳)有限公司 | Knowledge base alignment method and apparatus, computer device and storage medium |
CN111931903A (en) * | 2020-07-09 | 2020-11-13 | 北京邮电大学 | Network alignment method based on double-layer graph attention neural network |
CN112084373A (en) * | 2020-08-05 | 2020-12-15 | 国家计算机网络与信息安全管理中心 | Multi-source heterogeneous network user alignment method based on graph embedding |
CN112507247A (en) * | 2020-12-15 | 2021-03-16 | 重庆邮电大学 | Cross-social network user alignment method fusing user state information |
-
2021
- 2021-03-24 CN CN202110315041.XA patent/CN113095948B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10305845B1 (en) * | 2013-04-05 | 2019-05-28 | Hrl Laboratories, Llc | Accurate user alignment across online social media platforms |
WO2020114022A1 (en) * | 2018-12-04 | 2020-06-11 | 平安科技(深圳)有限公司 | Knowledge base alignment method and apparatus, computer device and storage medium |
CN110852755A (en) * | 2019-11-06 | 2020-02-28 | 支付宝(杭州)信息技术有限公司 | User identity identification method and device for transaction scene |
CN111931903A (en) * | 2020-07-09 | 2020-11-13 | 北京邮电大学 | Network alignment method based on double-layer graph attention neural network |
CN112084373A (en) * | 2020-08-05 | 2020-12-15 | 国家计算机网络与信息安全管理中心 | Multi-source heterogeneous network user alignment method based on graph embedding |
CN112507247A (en) * | 2020-12-15 | 2021-03-16 | 重庆邮电大学 | Cross-social network user alignment method fusing user state information |
Non-Patent Citations (3)
Title |
---|
JING ZHANG ET AL: "MEgo2Vec: Embedding Matched Ego Networks for User Alignment Across Social Networks", 《CIKM\'18》 * |
YIXIN CAO ET AL: "Multi-Channel Graph Neural Network for Entity Alignment", 《ARXIV》 * |
滕磊: "基于异构网络联合嵌入的用户对齐技术研究", 《中国优秀硕士学位论文全文数据库》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113762372A (en) * | 2021-08-30 | 2021-12-07 | 中国电子科技集团公司第十五研究所 | Method and device for identifying organization members in instant messaging information |
CN113779520A (en) * | 2021-09-07 | 2021-12-10 | 中国船舶重工集团公司第七0九研究所 | Cross-space target virtual identity correlation method based on multilayer attribute analysis |
CN114187457A (en) * | 2021-12-16 | 2022-03-15 | 北京大学 | Iterative graph alignment method |
CN114663245A (en) * | 2022-03-16 | 2022-06-24 | 南京信息工程大学 | Cross-social network identity matching method |
CN114817757A (en) * | 2022-04-02 | 2022-07-29 | 广州大学 | Cross-social network virtual identity association method based on graph convolution network |
CN114817757B (en) * | 2022-04-02 | 2023-07-21 | 广州大学 | Cross-social network virtual identity association method based on graph rolling network |
CN115269845A (en) * | 2022-08-01 | 2022-11-01 | 安徽大学 | Network alignment method and system based on social network user personality |
CN115269845B (en) * | 2022-08-01 | 2023-06-23 | 安徽大学 | Network alignment method and system based on social network user personality |
Also Published As
Publication number | Publication date |
---|---|
CN113095948B (en) | 2023-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11687728B2 (en) | Text sentiment analysis method based on multi-level graph pooling | |
CN113095948B (en) | Multi-source heterogeneous network user alignment method based on graph neural network | |
CN111061856B (en) | Knowledge perception-based news recommendation method | |
CN112199608B (en) | Social media rumor detection method based on network information propagation graph modeling | |
Fire et al. | Link prediction in social networks using computationally efficient topological features | |
CN111368074A (en) | Link prediction method based on network structure and text information | |
CN112966091B (en) | Knowledge map recommendation system fusing entity information and heat | |
CN112084373B (en) | Graph embedding-based multi-source heterogeneous network user alignment method | |
Wan et al. | A hybrid ensemble learning method for tourist route recommendations based on geo-tagged social networks | |
CN112036445B (en) | User identity recognition method across social networks based on neural tensor network | |
Wang et al. | Multi-modal knowledge graphs representation learning via multi-headed self-attention | |
CN114265986B (en) | Information pushing method and system fusing knowledge graph structure and path semantics | |
CN112925977A (en) | Recommendation method based on self-supervision graph representation learning | |
CN108647800B (en) | Online social network user missing attribute prediction method based on node embedding | |
CN113806630A (en) | Attention-based multi-view feature fusion cross-domain recommendation method and device | |
CN111666496A (en) | Group recommendation method based on comment text | |
CN115270007B (en) | POI recommendation method and system based on mixed graph neural network | |
CN115310005A (en) | Neural network recommendation method and system based on meta-path fusion and heterogeneous network | |
Saraswathi et al. | Deep Learning Enabled Social Media Recommendation Based on User Comments. | |
Liu et al. | Understanding information diffusion with psychological field dynamic | |
Ashraf et al. | Author profiling on bi-lingual tweets | |
CN115687760A (en) | User learning interest label prediction method based on graph neural network | |
Sun et al. | Mapping users across social media platforms by integrating text and structure information | |
CN115600642B (en) | Stream media-oriented decentralization federation learning method based on neighbor trust aggregation | |
CN117272222A (en) | Graph convolution content recommendation method fusing non-genetic recipient social relationship |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |