
CN109635989B - Social network link prediction method based on multi-source heterogeneous data fusion - Google Patents


Info

Publication number
CN109635989B
Authority
CN
China
Prior art keywords
user
social network
vector
link prediction
heterogeneous data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810999492.8A
Other languages
Chinese (zh)
Other versions
CN109635989A (en)
Inventor
周帆
钟婷
吴帮莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201810999492.8A
Publication of CN109635989A
Application granted
Publication of CN109635989B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

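The invention discloses a social network link prediction method based on multi-source heterogeneous data fusion, which performs link prediction on a location-based social network containing two heterogeneous data sources: a user relationship topology graph and user check-in records. The invention proposes a hybrid framework in which a model, AL, fully captures the association between these two heterogeneous data sources. This overcomes the inaccurate prediction results obtained when link prediction uses only a single data source of the location-based social network, and effectively improves link prediction performance. In addition, locality-sensitive hashing is applied to speed up deep learning training and reduce storage overhead.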


Description

Social network link prediction method based on multi-source heterogeneous data fusion
Technical Field
The invention belongs to the field of neural networks in machine learning and relates to a deep-learning-based method. In particular, it uses deep learning to fuse two heterogeneous data sources of a location-based social network (LBSN), namely the user relationship topology graph and the user check-in records, to perform social network link prediction, and uses locality-sensitive hashing (LSH) to speed up deep learning training and reduce storage cost.
Background
Link prediction (LP) aims to find edges that are missing from, or will appear in the future in, the user relationship topology graph formed by friend relationships. With the rapid growth of social network services (SNS) and other network applications, network data is ubiquitous. Friend relationships obtained from apps such as Facebook and QQ can be used to construct a user relationship topology graph, which in turn can be used for social network link prediction. Meanwhile, with the development of positioning technology, the GPS function of mobile devices can collect users' location information, which, combined with the time of positioning, forms user check-in records. Many studies have shown that user check-in records also contribute to social network link prediction.
Link prediction plays an important role in information recommendation systems and is mainly used in social network analysis. It can identify likely friendships with higher confidence and recommend people the user may know, which can significantly improve users' social experience and loyalty and bring considerable economic benefit to enterprises. Beyond predicting associations between users in a user relationship topology graph, link prediction methods and ideas can also be used to predict the types of unlabeled nodes in a network where some node types are known, which is valuable for network reorganization and structural-function optimization.
Conventional link prediction methods usually measure the similarity between two user nodes with the Jaccard coefficient, Euclidean distance, or cosine similarity to decide whether a link exists. These methods are inflexible: if the data set is replaced, or data is added to or deleted from the original data set, all similarities must be recomputed, consuming large amounts of computing and storage resources. Deep learning, by contrast, handles massive data flexibly: a link prediction model built with deep learning optimizes its parameters from large amounts of training data, and the trained model is then used for prediction.
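For illustration, the conventional similarity scores mentioned above can be sketched as follows. This is a minimal sketch, not part of the invention: the helper names (`build_neighbors`, `jaccard`, `cosine`) are hypothetical, the Jaccard score is computed on neighbor sets, and any decision threshold would be an additional assumption. Note that every score depends directly on the current neighbor sets, which is why such scores must be recomputed whenever the data changes.

```python
import math


def build_neighbors(edges):
    """Build an undirected adjacency map {node: set of neighbors} from (u, v) pairs."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def jaccard(neighbors, u, v):
    """Jaccard coefficient of the neighbor sets of u and v (0.0 if both are empty)."""
    nu, nv = neighbors.get(u, set()), neighbors.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


# Toy graph (hypothetical): a-b, a-c, b-c, c-d
nbrs = build_neighbors([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
score = jaccard(nbrs, "a", "d")  # shared neighbor {c} over union {b, c}
```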
Disclosure of Invention
The LBSN data set used by the invention contains two data sources with different structures: a user relationship topology graph and user check-in records. The user relationship topology graph is formed by the relationships among users; a relationship between two users is called a link (i.e., a node pair), and each link connects two user nodes. Each user check-in record consists of the check-in user node, the longitude and latitude of the check-in point, the check-in time, and the point of interest (POI).
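The two data sources described above can be modeled with simple record types. The sketch below is illustrative only: the class and field names are assumptions rather than a schema from the source, the values are toy data, and treating a link as an unordered pair assumes that friendships are undirected.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CheckIn:
    """One user check-in record with the five fields listed above.

    Field names are illustrative assumptions, not a schema from the source.
    """
    user: str         # check-in user node
    longitude: float  # longitude of the check-in point
    latitude: float   # latitude of the check-in point
    time: datetime    # check-in time
    poi: str          # point of interest (POI) identifier


def make_link(u: str, v: str) -> tuple:
    """Return a link (node pair) in canonical order, assuming undirected friendships."""
    return (u, v) if u <= v else (v, u)


# Toy values (hypothetical, not taken from any data set).
record = CheckIn("u1", 139.70, 35.69, datetime(2012, 4, 3, 18, 0), "poi_42")
link = make_link("u7", "u1")
```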
The invention aims to solve the problem of inaccurate prediction results when link prediction uses only a single data source of the LBSN. Its basic idea is a hybrid framework that fuses the two heterogeneous data sources of the LBSN, the user relationship topology graph and the user check-in records, to perform link prediction and improve on existing link prediction methods. LSH is also used to improve computation and storage performance.
Based on this idea, the invention provides a social network link prediction method based on multi-source heterogeneous data fusion, comprising the following steps:
S1, Data_process(G) → Tra, Tes: extract a training set Tra and a test set Tes from the user relationship topology graph G = (V, E), where V is the set of user nodes and E is the set of edges in the topology graph. If two users u_i and u_j in G have a social relationship, there is an edge between them, denoted e_ij = (u_i, u_j);
S2: use a network representation learning method to learn a social network user vector for each user in V from the positive sample G′ of Tra, where d is the dimension of these vectors;
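The embodiment below names node2vec as one usable network representation learning method. As a sketch under that assumption, the following generates node2vec-style biased random walks on an unweighted graph (p is the return parameter, q the in-out parameter). Only the walk step is shown; in node2vec the walks are then fed to a skip-gram model to obtain the d-dimensional vectors, which is omitted here. The function name and the toy graph are hypothetical.

```python
import random


def node2vec_walk(neighbors, start, length, p=1.0, q=1.0, rng=None):
    """Generate one node2vec-style biased random walk on an unweighted graph.

    neighbors maps each node to the set of its adjacent nodes.
    p is the return parameter and q the in-out parameter of node2vec.
    """
    if rng is None:
        rng = random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(neighbors.get(cur, ()))
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif x in neighbors.get(prev, ()):
                weights.append(1.0)      # stay within distance 1 of prev
            else:
                weights.append(1.0 / q)  # move outward, away from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Toy graph standing in for G' (the positive samples of Tra).
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
walks = [node2vec_walk(graph, u, 8, p=1.0, q=0.5, rng=random.Random(i))
         for i, u in enumerate(sorted(graph))]
```

Setting q below 1 biases walks outward (more exploratory), while p below 1 biases them toward backtracking; the values used here are arbitrary.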
S3: from the user check-in records S = (U, L), construct the N × M user-location check-in frequency matrix H, where U and L are the sets of users and check-in points in S, N is the number of users in U, and M is the number of check-in points in L. Then apply Poisson matrix factorization to obtain user access preference vectors in a low-dimensional vector space, where D is the dimension of these vectors;
S4: to capture the association between these two types of data sources in the LBSN, an improved deep learning model, called AL, is designed in a manner similar to anchor links. The user access preference vectors serve as samples and the corresponding social network user vectors serve as labels; both are input into AL for multiple rounds of training. The final trained AL generates a new user access preference vector u_i′^v that incorporates the topology information in G;
S5: fuse the social network user vectors with u_i′^v again and input the result into a convolutional neural network (CNN) for training. Finally, input Tes into the trained CNN for link prediction to obtain the prediction result.
In step S1, Tra and Tes are obtained as follows. Link prediction can be viewed as a binary classification problem: links present in G are positive samples and absent links are negative samples. The positive samples of Tra form a user relationship topology graph G′ ⊆ G from which some links have been removed, and the removed links serve as the positive samples of Tes. Specifically:
S11, clean the data so that the users in G and in S of the LBSN are consistent;
S12, select some links from G as positive samples of Tes, while ensuring that the graph G′ ⊆ G obtained by removing them from G remains connected; take G′ as the positive samples of Tra;
S13, randomly select some non-existent links from G as negative samples and allocate them to Tra and Tes according to a predefined ratio.
In step S3, the user access preference vectors are obtained as follows:
S31, construct H from S: the rows of H represent users, the columns represent POIs, and each entry of H is the number of times the user visited the corresponding POI;
S32, apply Poisson matrix factorization to H to obtain a matrix reflecting users' access preferences and a POI feature matrix, where the POI feature matrix reflects how each POI is visited by users. The rows of the user preference matrix are taken as the user access preference vectors.
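Steps S31 and S32 can be sketched as follows. The source does not specify which Poisson factorization algorithm is used (it may well be a Bayesian, hierarchical variant). As a plainly swapped-in stand-in, this sketch uses the Lee-Seung multiplicative updates for KL-divergence non-negative matrix factorization, whose objective equals the Poisson negative log-likelihood up to a constant. All function names are hypothetical; the rows of W play the role of the D-dimensional user access preference vectors.

```python
import math
import random


def build_checkin_matrix(checkins, users, pois):
    """S31: H[i][j] counts how often users[i] checked in at pois[j]."""
    u_idx = {u: i for i, u in enumerate(users)}
    p_idx = {p: j for j, p in enumerate(pois)}
    H = [[0] * len(pois) for _ in users]
    for user, poi in checkins:
        H[u_idx[user]][p_idx[poi]] += 1
    return H


def poisson_factorize(H, D, iters=200, seed=0, eps=1e-10):
    """Approximate H by W @ P^T (W: N x D users, P: M x D POIs).

    Uses KL-divergence multiplicative updates, which maximize a Poisson likelihood.
    """
    rng = random.Random(seed)
    N, M = len(H), len(H[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(D)] for _ in range(N)]
    P = [[rng.uniform(0.1, 1.0) for _ in range(D)] for _ in range(M)]

    def rates():
        # Current Poisson rate for every (user, POI) cell.
        return [[sum(W[i][k] * P[j][k] for k in range(D)) + eps for j in range(M)]
                for i in range(N)]

    for _ in range(iters):
        R = rates()
        W = [[W[i][k] * sum(P[j][k] * H[i][j] / R[i][j] for j in range(M))
              / (sum(P[j][k] for j in range(M)) + eps) for k in range(D)]
             for i in range(N)]
        R = rates()
        P = [[P[j][k] * sum(W[i][k] * H[i][j] / R[i][j] for i in range(N))
              / (sum(W[i][k] for i in range(N)) + eps) for k in range(D)]
             for j in range(M)]
    return W, P


def generalized_kl(H, W, P):
    """Poisson negative log-likelihood up to a constant (generalized KL divergence)."""
    total = 0.0
    for i, row in enumerate(H):
        for j, h in enumerate(row):
            r = sum(a * b for a, b in zip(W[i], P[j])) + 1e-12
            total += (h * math.log(h / r) if h > 0 else 0.0) - h + r
    return total


# Toy check-ins as (user, POI) pairs (hypothetical).
checkins = [("u1", "p1"), ("u1", "p1"), ("u1", "p2"), ("u2", "p2"), ("u3", "p3"), ("u3", "p3")]
H = build_checkin_matrix(checkins, ["u1", "u2", "u3"], ["p1", "p2", "p3"])
W, P = poisson_factorize(H, D=2, iters=100)  # rows of W: user preference vectors
```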
In step S4, the association between G and S in the LBSN is captured to achieve fusion. To capture this association, training the model AL comprises the following sub-steps:
S41, represent the user nodes in Tra by their vectors and compute the mean cosine similarity cos_ori of the user pairs in Tra;
S42, capture the association between G and S through the one-to-one correspondence of users in V and U: divide the samples (user access preference vectors) and their labels (social network user vectors) into batches and feed them cyclically into a multilayer perceptron (MLP) for training;
S43, tune the parameters of AL through multiple rounds of training. Once AL is trained, input the user access preference vectors into AL, which outputs u_i′^v.
The implementation of AL involves two calculation functions in step S42. The first is a mapping function that captures the one-to-one correspondence between users in V and U. Its loss function is given in the original as a formula in which x denotes a sample, y the true value, a the output of the model, and n the number of samples. Stochastic gradient descent is used to optimize the global weight parameter W and the global bias parameter b; the update formulas are expressed in terms of the activation function σ and the neuron input z.
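The mapping function can be illustrated with a much simpler stand-in. The source trains an MLP whose exact loss and update formulas are given only as images, so this sketch substitutes a single linear layer y ≈ Wx + b trained by mini-batch stochastic gradient descent on squared error. It maps D-dimensional samples (user access preference vectors) to d-dimensional labels (social network user vectors). The function names, learning rate, epoch count, and batch size are all assumptions, and the toy data is synthetic.

```python
import random


def train_mapping(samples, labels, lr=0.05, epochs=1000, batch_size=2, seed=0):
    """Fit y ~ W x + b by mini-batch SGD on squared error (linear stand-in for the MLP)."""
    rng = random.Random(seed)
    D, d = len(samples[0]), len(labels[0])
    W = [[0.0] * D for _ in range(d)]
    b = [0.0] * d
    idx = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            gW = [[0.0] * D for _ in range(d)]
            gb = [0.0] * d
            for n in batch:
                x, y = samples[n], labels[n]
                for r in range(d):
                    a = sum(W[r][c] * x[c] for c in range(D)) + b[r]
                    err = a - y[r]  # gradient of 0.5 * (a - y)^2 w.r.t. a
                    gb[r] += err
                    for c in range(D):
                        gW[r][c] += err * x[c]
            m = len(batch)
            for r in range(d):
                b[r] -= lr * gb[r] / m
                for c in range(D):
                    W[r][c] -= lr * gW[r][c] / m
    return W, b


def apply_mapping(W, b, x):
    """Map one sample vector x to the label space."""
    return [sum(w * xi for w, xi in zip(row, x)) + br for row, br in zip(W, b)]


# Synthetic data generated by a known linear map (hypothetical).
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
Y = [[2 * x0, x0 + x1] for x0, x1 in X]
W, b = train_mapping(X, Y, epochs=2000)
```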
The second calculation function ensures that the generated u_i′^v does not drift, i.e., the mean cosine similarity of the user pairs in Tra computed with u_i′^v is not less than cos_ori. A cosine-mean constraint is therefore introduced; in it, the two vectors are those of users u_m and u_n, where e_mn is present in G, and N(U) denotes the number of users in U. The constraint has a corresponding loss function, and the tuning of the global weight parameter W and the global bias parameter b under this constraint is likewise given as formulas in the original.
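The idea of the cosine-mean constraint can be sketched as follows. This is an assumption-laden illustration: the exact formula, including its normalization by N(U), appears only as an image in the source, so this sketch averages over the linked pairs and expresses the constraint as a hinge penalty that is zero whenever the mean cosine under the new vectors is at least cos_ori. The hinge form and all function names are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def mean_pair_cosine(vectors, pairs):
    """Mean cosine similarity over linked user pairs (u_m, u_n)."""
    sims = [cosine(vectors[m], vectors[n]) for m, n in pairs]
    return sum(sims) / len(sims) if sims else 0.0


def cosine_constraint_penalty(new_vectors, pairs, cos_ori):
    """Hinge penalty: zero when the mean cosine under the new vectors is >= cos_ori."""
    return max(0.0, cos_ori - mean_pair_cosine(new_vectors, pairs))


# Toy vectors (hypothetical); pairs stand in for links in Tra.
pairs = [("a", "b"), ("b", "c")]
original = {"a": [1, 0], "b": [1, 1], "c": [0, 1]}
cos_ori = mean_pair_cosine(original, pairs)  # S41
```

A penalty of this shape could be added to the mapping loss so that training pushes the new vectors back whenever they drift below the original mean similarity.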
In step S5, G and S are fused again to achieve link prediction with low storage consumption and high computation speed. Specifically:
S51, concatenate the social network user vector and u_i′^v into a single vector;
S52, apply LSH to project the concatenated vector onto a binary vector m_i ∈ {0,1}^m, which represents user u_i;
S53, for any edge e_ij = (u_i, u_j) in G, obtain m_j in the same way as the representation of user u_j;
S54, concatenate m_i and m_j to obtain the binary vector representation of edge e_ij, m_ij = [m_i; m_j] ∈ {0,1}^(2m);
S55, fill the elements of the length-2m vector m_ij, in row-major order, into an n × n square matrix (a process called reshaping), then input the square matrix into a convolutional neural network (CNN) for training;
S56, input Tes into the trained CNN for link prediction.
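Steps S51 to S55 can be sketched as below. The source does not name the LSH family; this sketch assumes sign random projection (random-hyperplane LSH), which yields one bit per hyperplane. It also assumes that 2m is a perfect square so that the reshape to n × n is exact, since the source does not say how a mismatch is handled. The CNN itself (S55 training, S56 prediction) is omitted. Function names and toy vectors are hypothetical.

```python
import math
import random


def make_hyperplanes(dim, m, seed=0):
    """m random Gaussian hyperplanes in R^dim (sign-random-projection LSH, assumed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def lsh_bits(vec, planes):
    """S52: one bit per hyperplane, 1 if vec lies on its non-negative side."""
    return [1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes]


def edge_matrix(m_i, m_j):
    """S54-S55: concatenate two m-bit codes and reshape row-major into n x n."""
    m_ij = m_i + m_j  # length 2m
    n = math.isqrt(len(m_ij))
    if n * n != len(m_ij):
        raise ValueError("2m must be a perfect square for an exact n x n reshape")
    return [m_ij[r * n:(r + 1) * n] for r in range(n)]


# S51: concatenate a (hypothetical) social network user vector with u_i'^v.
social_vec = [0.2, -0.4, 0.1]
new_pref_vec = [0.5, 0.3, -0.2]
x_i = social_vec + new_pref_vec
planes = make_hyperplanes(dim=len(x_i), m=8, seed=1)
m_i = lsh_bits(x_i, planes)
m_j = lsh_bits([0.1, 0.1, 0.3, -0.5, 0.2, 0.4], planes)  # S53 for user u_j
square = edge_matrix(m_i, m_j)  # 16 bits -> 4 x 4 input for the CNN
```

Each user is stored as m bits instead of 2d floats, which is where the storage savings described in the text would come from under this choice of hash.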
Compared with the prior art, the invention has the following beneficial effects:
1. The method uses the anchor-link-like model AL to generate new user access preference vectors that incorporate the user relationship topology graph, fully capturing the association between social network user vectors and user access preference vectors. AL aligns the social network user vector and the user access preference vector of the same user, and its cosine-mean constraint ensures that the new user access preference vectors generated by fusing the two data sources do not drift.
2. The method concatenates the social network user vector and the user access preference vector obtained from the two LBSN data sources, applies LSH to project the concatenated vector onto a binary vector, and uses that binary vector as the final representation of the user node in the LBSN. For each link in the user relationship topology graph, the final vectors of its two user nodes are concatenated to represent the node pair. Finally, the concatenated vector is reshaped into a square matrix and input into the CNN for link prediction. Applying LSH speeds up deep learning training and reduces storage overhead.
Drawings
FIG. 1 shows the anchor-link-like model AL, based on a multilayer perceptron (MLP), which captures the association between social network user vectors and user access preference vectors.
FIG. 2 shows the overall model architecture of the social network link prediction method based on multi-source heterogeneous data fusion. The new user access preference vector generated in FIG. 1, which incorporates the user relationship topology graph, is concatenated with the social network user vector; LSH projects the concatenated vector onto a binary vector, which serves as the final representation of the user node in the LBSN. For each link in the user relationship topology graph, the final vectors of its two user nodes are concatenated to represent the node pair. Finally, the concatenated vector is reshaped into a square matrix and input into the CNN.
FIG. 3 compares the performance of the method without and with LSH: (a) memory consumption, (b) CPU consumption, (c) GPU consumption, and (d) time consumption; d denotes the dimension of the vector input to the CNN.
Interpretation of terms
LBSN is an abbreviation for Location-Based Social Network. In addition to the connections between people found in traditional social networks, an LBSN records information such as users' check-in times and geographic locations.
POI is an abbreviation for Point of Interest. In an LBSN, a POI is a place where a user checks in.
LSH is an abbreviation for Locality-Sensitive Hashing, a fast approximate nearest-neighbor search technique for massive high-dimensional data.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Examples
The social network link prediction method based on multi-source heterogeneous data fusion can be applied to data sets that contain two data sources: a user relationship topology graph and user check-in records. The real-world LBSN data sets listed in Table 1 are used as examples, e.g., Foursquare (available from http://snap.stanford.edu).
Table 1: Statistics of the data sets used for social link prediction with multi-source heterogeneous data fusion
Dataset #check_ins #POIs #edges #users
Foursquare@NYC 22,563 1,992 5,810 588
Foursquare@TKY 38,742 2,212 9,624 1,055
Gowalla@DC 13,594 4,795 5,826 880
Gowalla@CHI 10,314 3,269 2,542 627
Brightkite 75,522 4,038 33,008 1,502
FIG. 1 shows the anchor-link-like model AL, based on a multilayer perceptron (MLP), which captures the association between social network user vectors and user access preference vectors.
As shown in FIG. 1, the user relationship topology graph G of Foursquare is first substituted into step S1, Data_process(G) → Tra, Tes, to obtain the training set Tra and the test set Tes. The positive sample G′ of Tra is then substituted into step S2, where the social network user vectors can be obtained with a common network representation learning method such as node2vec.
Next, the user check-in records S of Foursquare are substituted into step S3 to obtain the user access preference vectors.
The outputs of steps S2 and S3, the social network user vectors and the user access preference vectors, are input into the model AL in step S4 to obtain u_i′^v.
Table 2: Social link prediction performance on three real data sets
[Table 2 is provided as an image in the original and is not reproduced here.]
FIG. 2 shows the overall model architecture of the social network link prediction method based on multi-source heterogeneous data fusion.
As shown in FIG. 2, the outputs of steps S2 and S4, the social network user vectors and u_i′^v, are input into the prediction model CNN in step S5 to obtain the final prediction result. The link prediction performance of the hybrid model is shown in Table 2.
#check_ins: the number of user check-in records;
#POIs: the number of distinct POIs in the user check-in records;
#edges: the number of links in the user relationship topology graph;
#users: the number of users in the user relationship topology graph (equal to that in the user check-in records);
Foursquare@NYC: data in the Foursquare data set for New York City;
Foursquare@TKY: data in the Foursquare data set for Tokyo;
Gowalla@DC: data in the Gowalla data set for Washington, D.C.;
Gowalla@CHI: data in the Gowalla data set for Chicago;
vec2link−: the proposed method without LSH;
vec2link+: the proposed method with LSH; compared with vec2link−, it demonstrates the lower storage usage and higher computation speed obtained with LSH;
Average, Hadamard, Weighted-L1, Weighted-L2: baselines for vec2link+ that use only the user relationship topology graph of the LBSN; for their implementation see [Grover, Aditya, and Jure Leskovec. "node2vec: Scalable Feature Learning for Networks." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016];
Jaccard: a baseline for vec2link+ based on the Jaccard coefficient, which measures the similarity between finite sets;
walk2friends: a baseline for vec2link+ that uses only the user check-in records of the LBSN; for its implementation see [Backes, Michael, et al. "walk2friends: Inferring Social Links from Mobility Profiles." Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017].
The test results in Table 2 show that the proposed method consistently outperforms link prediction methods that use only a single data source.
The method therefore effectively overcomes the inaccurate predictions obtained when link prediction uses only a single data source of the LBSN, and improves link prediction performance. The invention uses deep learning to fuse two heterogeneous data sources of the LBSN, the user relationship topology graph and the user check-in records, to perform link prediction. It also uses LSH to represent user nodes with discrete binary vectors, which speeds up the model and saves storage. The invention thus achieves fast computation, low storage consumption, and better link prediction performance than single-source prediction.
Those of ordinary skill in the art will appreciate that the embodiments described herein are intended to help the reader understand the principles of the invention, and the invention is not limited to these specifically recited embodiments and examples. Those skilled in the art can make various other specific modifications and combinations based on these teachings without departing from the spirit of the invention, and such modifications and combinations remain within the scope of the invention.

Claims (5)

1. A social network link prediction method based on multi-source heterogeneous data fusion, characterized by comprising the following steps:
S1, Data_process(G) → Tra, Tes: extract a training set Tra and a test set Tes from the user relationship topology graph G = (V, E), where V is the set of user nodes and E is the set of edges in the topology graph; if two users u_i and u_j in G have a social relationship, there is an edge between them, denoted e_ij = (u_i, u_j);
S2: using a network representation learning method, learn the social network user vectors of V from the positive sample G′ of Tra, where d is the dimension of these vectors;
S3: according to the user check-in records S = (U, L), construct the user-location check-in frequency matrix H, where U and L are the sets of users and check-in points in S, N is the number of users in U, and M is the number of check-in points in L; then use Poisson matrix factorization to obtain user access preference vectors in a low-dimensional vector space, where D is the dimension of these vectors;
S4: to capture the association between the two data sources G and S in the LBSN, design an improved deep learning model, called AL, in the manner of anchor links; the user access preference vectors serve as samples and the social network user vectors serve as the corresponding labels, and both are input into AL for multiple rounds of training; the final trained AL generates new user access preference vectors u_i′^v that incorporate the topology information in G;
S5: fuse the social network user vectors and u_i′^v again and input the result into a convolutional neural network (CNN) for training; finally, input Tes into the trained CNN for link prediction and obtain the prediction result.
2. The method according to claim 1, characterized in that step S1 comprises the following sub-steps:
S11, clean the data so that the users in G and S of the LBSN are consistent;
S12, select some links from G as positive samples of Tes, while ensuring that the graph G′ obtained by removing these positive samples from G is connected; take G′ as the positive samples of Tra;
S13, randomly select some non-existent links from G as negative samples and allocate them to Tra and Tes according to a predefined ratio.
3. The method according to claim 1, characterized in that step S3 comprises the following sub-steps:
S31, construct H from S, where the rows of H represent users, the columns represent POIs, and each entry of H is the number of times the user visited the corresponding POI;
S32, perform Poisson matrix factorization on H to obtain a matrix reflecting users' access preferences and a POI feature matrix, where the POI feature matrix reflects how each POI is visited by users; the rows of the user preference matrix are taken as the user access preference vectors.
4. The method for social network link prediction based on multi-source heterogeneous data fusion according to claim 1, characterized in that step S4 comprises the following sub-steps:
S41, use [formula image 019] to represent the user nodes in Tra, and compute cos_ori, the mean cosine similarity over the user pairs in Tra;
S42, use the one-to-one correspondence between the users in V and U to capture the association between G and S; divide the samples [formula image 009] and their corresponding labels [formula image 011] into multiple batches and feed them cyclically into a multilayer perceptron (MLP) for training;
S43, tune the parameters of the model AL over multiple rounds of training; once AL is trained, input [formula image 009] into AL to output [formula image 014].
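The quantity cos_ori in S41 and the batch-and-repeat input loop in S42 can be sketched as follows. The sketch assumes two things:

- `embedding` maps each user to the vector used to represent user nodes in S41.
- Tra is given as a list of user pairs.

The claim does not say whether cos_ori averages over positive pairs only or over all sampled pairs. The MLP of S42 and the model AL of S43 are not reproduced, because the claim gives neither their architecture nor their loss. All names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pair_cosine(embedding, pairs):
    """cos_ori: mean cosine similarity over the user pairs (u, v) in Tra."""
    if not pairs:
        raise ValueError("Tra contains no user pairs")
    return sum(cosine(embedding[u], embedding[v]) for u, v in pairs) / len(pairs)


def cyclic_batches(samples, labels, batch_size, epochs):
    """Yield (epoch, sample_batch, label_batch), cycling over the data `epochs` times."""
    if len(samples) != len(labels):
        raise ValueError("every sample needs exactly one label")
    for epoch in range(epochs):
        for start in range(0, len(samples), batch_size):
            yield epoch, samples[start:start + batch_size], labels[start:start + batch_size]
```

The batches are yielded in a fixed order for clarity. A real training loop would usually reshuffle the samples at the start of each epoch.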
5. The method for social network link prediction based on multi-source heterogeneous data fusion according to claim 1, characterized in that step S5 comprises the following sub-steps:
S51, concatenate [formula image 020] and [formula image 014] into a single vector [formula image 021];
S52, apply LSH to project [formula image 022] onto a binary vector [formula image 023]; user u_i is represented by m_i;
S53, for any edge e_ij = (u_i, u_j) in G, obtain m_j in the same way as the representation of user u_j;
S54, concatenate m_i and m_j to obtain the binary vector representation of edge e_ij, [formula image 024];
S55, fill the elements of the length-2m vector m_ij into an n×n square matrix in row-major order (a process called reshaping), then input this square matrix into a convolutional neural network (CNN) for training;
S56, input Tes into the trained CNN for link prediction.
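Sub-steps S51 to S55 amount to building one binary matrix per edge, as sketched below. The sketch rests on two assumptions:

- The claim says only "LSH", so the sketch uses sign-random-projection LSH: random Gaussian hyperplanes, one bit per hyperplane. This is one common LSH family for producing binary codes.
- The edge code length 2m equals n × n exactly, because the claim does not say how a mismatch is handled.

All function names and parameters are hypothetical, and the CNN of S55 and S56 is not shown.

```python
import random


def make_hyperplanes(dim, m, seed=0):
    """m random Gaussian hyperplanes through the origin (sign-random-projection LSH)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def lsh_code(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0 for plane in planes]


def user_code(vec_a, vec_b, planes):
    """S51 + S52: concatenate the two user vectors, then hash the result to m bits."""
    fused = list(vec_a) + list(vec_b)
    if len(fused) != len(planes[0]):
        raise ValueError("hyperplane dimension must match the fused vector")
    return lsh_code(fused, planes)


def edge_matrix(m_i, m_j, n):
    """S54 + S55: concatenate m_i and m_j, then reshape row-major into n x n."""
    m_ij = m_i + m_j
    if len(m_ij) != n * n:
        raise ValueError("this sketch requires 2m == n * n (no padding)")
    return [m_ij[r * n:(r + 1) * n] for r in range(n)]
```

For example, with m = 8 bits per user, each edge yields a 16-bit vector that fills a 4 × 4 matrix. The same hyperplanes must be reused for every user, so that m_i and m_j are produced by "the same method" as S53 requires.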
CN201810999492.8A 2018-08-30 2018-08-30 Social network link prediction method based on multi-source heterogeneous data fusion Active CN109635989B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810999492.8A CN109635989B (en) 2018-08-30 2018-08-30 Social network link prediction method based on multi-source heterogeneous data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810999492.8A CN109635989B (en) 2018-08-30 2018-08-30 Social network link prediction method based on multi-source heterogeneous data fusion

Publications (2)

Publication Number Publication Date
CN109635989A CN109635989A (en) 2019-04-16
CN109635989B true CN109635989B (en) 2022-03-29

Family

ID=66066288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810999492.8A Active CN109635989B (en) 2018-08-30 2018-08-30 Social network link prediction method based on multi-source heterogeneous data fusion

Country Status (1)

Country Link
CN (1) CN109635989B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112994940B (en) * 2019-05-29 2022-10-18 华为技术有限公司 Network anomaly detection method and device
CN110335165B (en) * 2019-06-28 2021-03-30 京东数字科技控股有限公司 Link prediction method and device
CN110442802B (en) * 2019-08-06 2022-10-28 中国科学技术大学 A multi-behavioral preference prediction method for social users
CN110543943B (en) * 2019-09-10 2022-03-25 北京百度网讯科技有限公司 Network convergence method and device, electronic equipment and storage medium
CN110598130B (en) * 2019-09-30 2022-06-24 重庆邮电大学 A movie recommendation method integrating heterogeneous information network and deep learning
CN111209943B (en) * 2019-12-30 2020-08-25 广州高企云信息科技有限公司 Data fusion method and device and server
CN111476673A (en) * 2020-04-02 2020-07-31 中国人民解放军国防科技大学 Method, device and medium for user alignment between social networks based on neural network
CN111475739B (en) * 2020-05-22 2022-07-29 哈尔滨工程大学 Heterogeneous social network user anchor link identification method based on meta-path
CN112446542B (en) * 2020-11-30 2023-04-07 山西大学 Social network link prediction method based on attention neural network
CN112569608B (en) * 2020-12-22 2022-03-25 内蒙古工业大学 Table game hybrid recommendation method based on multi-source heterogeneous data
CN112700056B (en) * 2021-01-06 2023-09-15 中国互联网络信息中心 Complex network link prediction methods, devices, electronic equipment and media
CN113298321B (en) * 2021-06-22 2022-03-11 深圳市查策网络信息技术有限公司 User intention prediction method based on multi-data fusion
CN115145991B (en) * 2022-08-31 2022-11-15 南京三百云信息科技有限公司 Data processing method and system suitable for heterogeneous data
CN116206453B (en) * 2023-05-05 2023-08-11 湖南工商大学 Traffic flow prediction method and device based on transfer learning and related equipment
CN117312281B (en) * 2023-06-30 2024-05-24 江苏中科西北星信息科技有限公司 Automatic fusion method, system, equipment and storage medium for multi-source heterogeneous data

Citations (2)

Publication number Priority date Publication date Assignee Title
CN106503859A (en) * 2016-10-28 2017-03-15 国家计算机网络与信息安全管理中心 A kind of message propagation prediction method and device based on online social relation network
CN107784124A (en) * 2017-11-23 2018-03-09 重庆邮电大学 A kind of LBSN super-networks link Forecasting Methodology based on time-space relationship

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9171319B2 (en) * 2012-03-28 2015-10-27 Fifth Street Finance Corp., As Agent Analysis system and method used to construct social structures based on data collected from monitored web pages
US10395179B2 (en) * 2015-03-20 2019-08-27 Fuji Xerox Co., Ltd. Methods and systems of venue inference for social messages

Non-Patent Citations (3)

Title
Predicting POI visits with a heterogeneous information network; Zih-Syuan Wang et al.; 2015 Conference on Technologies and Applications of Artificial Intelligence; 2016-02-15; full text *
Transferring heterogeneous links across location-based social networks; Jiawei Zhang et al.; Proceedings of the 7th ACM International Conference on Web Search and Data Mining; 2014-02-24; full text *
Research on social network link data prediction based on multi-source heterogeneous data fusion; 吴帮莹 (Wu Bangying); 中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑 (China Master's Theses Full-text Database, Information Science and Technology); 2019-12-15 (No. 12); p. I139-95 *

Also Published As

Publication number Publication date
CN109635989A (en) 2019-04-16

Similar Documents

Publication Publication Date Title
CN109635989B (en) Social network link prediction method based on multi-source heterogeneous data fusion
US9083757B2 (en) Multi-objective server placement determination
US11063881B1 (en) Methods and apparatus for network delay and distance estimation, computing resource selection, and related techniques
CN109948066B (en) Interest point recommendation method based on heterogeneous information network
CN110119475A (en) A kind of POI recommended method and recommender system
US9501509B2 (en) Throwaway spatial index structure for dynamic point data
CN103442331A (en) Terminal equipment position determining method and terminal equipment
CN115293919B (en) Graph neural network prediction method and system for out-of-distribution generalization of social networks
CN114357105A (en) Pre-training method and model fine-tuning method of geographic pre-training model
CN107220312A (en) A kind of point of interest based on co-occurrence figure recommends method and system
CN110874437A (en) A personalized POI recommendation method based on the ranking of multiple POI pairs
Xin et al. A location-context awareness mobile services collaborative recommendation algorithm based on user behavior prediction
Wang et al. Connecting the hosts: Street-level IP geolocation with graph neural networks
Liu et al. LightTR: A lightweight framework for federated trajectory recovery
CN112069416B (en) Cross-social network user identity recognition method based on community discovery
CN111738447B (en) A mobile social network user relationship inference method based on spatiotemporal relationship learning
Xhafa et al. Modeling and processing for next-generation Big-Data technologies
Huang et al. Fine-grained spatio-temporal distribution prediction of mobile content delivery in 5G ultra-dense networks
CN110059795A (en) A kind of mobile subscriber's node networking method merging geographical location and temporal characteristics
CN115242868B (en) A street-level IP address location method based on graph neural network
WO2023226819A1 (en) Data matching method and apparatus, readable medium, and electronic device
CN114219581A (en) A personalized interest point recommendation method and system based on heterogeneous graph
Li et al. Realizing fine-grained inference of AS path with a generative measurable process
Park et al. ActiveDBC: learning Knowledge-based Information propagation in mobile social networks
Kralj et al. Semi-decentralized Training of Spatio-Temporal Graph Neural Networks for Traffic Prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant