CN118071400A - Application method and system based on graph computing technology in information consumption field - Google Patents
Application method and system based on graph computing technology in information consumption field Download PDFInfo
- Publication number
- CN118071400A CN118071400A CN202410110803.6A CN202410110803A CN118071400A CN 118071400 A CN118071400 A CN 118071400A CN 202410110803 A CN202410110803 A CN 202410110803A CN 118071400 A CN118071400 A CN 118071400A
- Authority
- CN
- China
- Prior art keywords
- user
- graph
- recommendation
- algorithm
- technology
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 53
- 238000005516 engineering process Methods 0.000 title claims abstract description 46
- 230000006399 behavior Effects 0.000 claims abstract description 98
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 77
- 238000004364 calculation method Methods 0.000 claims abstract description 27
- 238000004458 analytical method Methods 0.000 claims abstract description 17
- 238000007621 cluster analysis Methods 0.000 claims abstract description 9
- 238000010276 construction Methods 0.000 claims abstract description 9
- 239000013598 vector Substances 0.000 claims description 22
- 238000012545 processing Methods 0.000 claims description 20
- 238000010586 diagram Methods 0.000 claims description 17
- 230000003993 interaction Effects 0.000 claims description 12
- 238000013528 artificial neural network Methods 0.000 claims description 8
- 230000000694 effects Effects 0.000 claims description 8
- 238000005065 mining Methods 0.000 claims description 8
- 238000013507 mapping Methods 0.000 claims description 6
- 230000002452 interceptive effect Effects 0.000 claims description 5
- 238000005457 optimization Methods 0.000 claims description 5
- 230000008859 change Effects 0.000 claims description 4
- 238000012544 monitoring process Methods 0.000 claims description 3
- 230000003997 social interaction Effects 0.000 claims description 3
- 230000005540 biological transmission Effects 0.000 claims description 2
- 238000005295 random walk Methods 0.000 description 15
- 230000008569 process Effects 0.000 description 13
- 238000000605 extraction Methods 0.000 description 7
- 230000008901 benefit Effects 0.000 description 5
- 238000003062 neural network model Methods 0.000 description 5
- 230000002776 aggregation Effects 0.000 description 4
- 238000004220 aggregation Methods 0.000 description 4
- 230000009471 action Effects 0.000 description 3
- 230000003542 behavioural effect Effects 0.000 description 3
- 238000013527 convolutional neural network Methods 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 3
- 238000011156 evaluation Methods 0.000 description 3
- 238000003064 k means clustering Methods 0.000 description 3
- 239000011159 matrix material Substances 0.000 description 3
- 238000004140 cleaning Methods 0.000 description 2
- 238000007405 data analysis Methods 0.000 description 2
- 238000013136 deep learning model Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 238000011176 pooling Methods 0.000 description 2
- 238000012552 review Methods 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 238000012549 training Methods 0.000 description 2
- 230000004931 aggregating effect Effects 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000013480 data collection Methods 0.000 description 1
- 238000007418 data mining Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000012804 iterative process Methods 0.000 description 1
- 229940050561 matrix product Drugs 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000006855 networking Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 230000002265 prevention Effects 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000012163 sequencing technique Methods 0.000 description 1
- 238000011524 similarity measure Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0203—Market surveys; Market polls
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Strategic Management (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Biomedical Technology (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Probability & Statistics with Applications (AREA)
- Primary Health Care (AREA)
- Tourism & Hospitality (AREA)
- Human Resources & Organizations (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Game Theory and Decision Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an application method and a system in the field of information consumption based on graph computing technology, wherein the method comprises the following steps: 1) dynamic user behavior graph construction, 2) community discovery algorithm based on graph embedding technology, 3) recommendation algorithm based on cluster analysis, and 4) personalized recommendation service; the method for constructing the dynamic user behavior graph based on the graph calculation technology can fully mine the time sequence relation between the user behaviors, and improves the accuracy and the instantaneity of a recommendation system; the community finding algorithm based on the graph embedding technology is used, and the user group with similar interests or behaviors can be more accurately identified by using the community finding algorithm, so that a more accurate recommendation basis is provided for a recommendation system. The invention uses a recommendation algorithm based on cluster analysis, and the algorithm utilizes a graph analysis technology to deeply mine and analyze the relationship between the user and the commodity through the characteristics of the path length, the weight and the like.
Description
Technical Field
The present invention relates to information technology, computer software and application fields, and in particular, to a method and a system for applying a graph-based computing technology to information consumption fields.
Background
With the vigorous development of big data and the Internet of things, the richness of the data is increased, the relevance among the data is increased, the traditional analysis of small data volume, single dimension and static data does not meet the requirement of time development, the rapid increase of the data volume and the effective analysis and processing of complex relevance among the data become pain points of the database industry.
At present, a graph database and a graph computing system (also called a graph computing engine) are core contents in the technical field of graph computing, the graph database is mainly responsible for operations such as adding, deleting, checking, modifying and the like of graph data, and the graph computing engine is mainly responsible for performing deep analysis processing on the graph data. The graph calculation technology can effectively process complex association relations, so that the graph calculation technology has wide application prospects in the field of information consumption. For example, a knowledge graph is a typical application, and can help enterprises organize and use data, so as to realize real-time data analysis, hidden relation discovery and contextualization accurate decision.
And very large data can be generated in the information consumption field, if the graph calculation technology can be applied to the field, the accuracy of recommendation and the satisfaction of users can be improved, the performance of a recommendation system can be optimized, the application scene of personalized recommendation is enriched, and finally the competitiveness of products in the information consumption field is improved.
Disclosure of Invention
The invention aims to solve the technical problem of providing an application method in the information consumption field based on a graph computing technology, so that modes, relations and structures can be intensively found in complex information consumption data, and a brand new view angle and method are provided for a personalized recommendation system.
In order to solve the technical problems, the technical scheme of the invention is as follows:
an application method of graph-based computing technology in the field of information consumption comprises the following steps:
(1) Constructing a dynamic user behavior diagram;
a. collecting behavior data of a user in different time periods;
b. converting the user behavior data into a dynamic user behavior diagram, wherein nodes represent users and edges represent user behaviors;
c. introducing a graph calculation technology to update the dynamic user behavior graph in real time;
(2) Graph embedding technology-based community discovery algorithm:
a. mapping nodes and edges in the dynamic user behavior graph into a low-dimensional vector space;
b. Calculating the similarity among the nodes, and dividing user communities according to the similarity;
(3) Recommendation algorithm based on cluster analysis;
a. Deep mining and analysis of the relationship between the user and the commodity: combining basic information, behavior data and social relation network of the user; taking the user and the article as nodes of the graph, constructing edges according to the user behavior data to represent the interaction relationship between the user and the article, and considering the weight of the edges to represent the preference degree of the user to the article;
b. the interest preference of the user is mined through node similarity analysis and clustering algorithm by using a graph calculation technology;
(4) Personalized recommendation service;
a. Analyzing the user relationship in the social network by using a graph computing technology, identifying the social circle, the influence size and the social interaction mode of the user, and mining the social influence of the user by analyzing the position and the role of the user in the social network;
b. Evaluating social influence of the user by analyzing the breadth and depth of information transmission of the user or the interaction quantity caused by the user;
c. And (3) establishing a preference model: analyzing personal data, interest labels and activity histories of the users, establishing an interest preference model of the users, and recommending possible friends or contents for the users by using a content recommendation technology and combining social relations and interest preferences of the users.
Further, in step 1, the updating of the dynamic user behavior graph includes updating of node attributes and updating of relationship strength of edges.
Further, mapping the nodes and edges in the dynamic user behavior graph into the low-dimensional vector space in step 2 may employ DeepWalk, node 2.2 Vec, graphSAGE or Node2Vec techniques.
Furthermore, in the step 3, the similarity between the computing nodes uses a cosine similarity or pearson correlation coefficient method, and a proper clustering algorithm is selected according to specific requirements and data characteristics, wherein the clustering algorithm comprises a K-means algorithm, a hierarchical clustering algorithm and a density-based clustering algorithm.
Further, in step 4, the key node identification technology based on GCN analyzes the centrality index of the user in the community so as to identify key nodes, i.e. those users with high influence on the community structure.
The invention also provides an application system in the information consumption field based on the graph computing technology, which comprises the following modules.
And a graph structure building module: realizing a graph structure construction module, and converting the processed user behavior data into a graph structure;
And a graph calculation updating module: updating the user behavior graph in real time by adopting a graph neural network or other graph calculation algorithms, processing newly generated user behavior data, and adjusting the graph structure to reflect the change of the user behavior;
interest preference inference module: deducing interest preference of the user through a graph algorithm by utilizing the updated user behavior graph;
the recommendation generation module: generating a personalized recommendation list according to interest preference and real-time behavior of the user;
And a user interaction optimization module: monitoring interactive feedback of a user on recommended content, including clicking, browsing and purchasing behaviors, optimizing a recommendation list according to the interactive feedback of the user, adjusting a recommendation strategy and improving recommendation quality
The invention has the advantages that:
1. In the aspect of constructing the user behavior graph, the invention uses a dynamic user behavior graph construction method based on graph calculation technology. The method utilizes the continuity of user behaviors in time to connect the user behavior data in different time periods to construct a dynamic user behavior graph. The construction mode can fully mine the time sequence relation between the user behaviors, and improves the accuracy and the instantaneity of the recommendation system.
2. The invention uses a community discovery algorithm based on graph embedding technology. The algorithm maps nodes and edges in the graph structure into a low-dimensional vector space so that the similarity between the nodes can be judged by the distance calculation between the vectors. By utilizing the algorithm, the user group with similar interests or behaviors can be identified more accurately, and a more accurate recommendation basis is provided for a recommendation system.
3. The invention uses a recommendation algorithm based on cluster analysis. The algorithm utilizes graph analysis technology to deeply mine and analyze the relationship between the user and the commodity through the characteristics of the path such as length, weight and the like.
Drawings
FIG. 1 is a schematic diagram of a GCN model structure;
FIG. 2 is a schematic diagram of EBC score calculation;
FIG. 3 is a schematic view of cosine similarity of two vectors in three-dimensional space;
FIG. 4 is a schematic diagram of a K-means clustering process;
FIG. 5 is a schematic diagram of a key node identification technique based on GCN;
FIG. 6 is a schematic diagram of a first feature processing mode of a recommendation algorithm based on deep learning;
fig. 7 is a schematic diagram of a second feature processing mode of the recommendation algorithm based on deep learning.
Detailed Description
The following describes the embodiments of the present invention further with reference to the drawings. The description of these embodiments is provided to assist understanding of the present invention, but is not intended to limit the present invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
As shown in the figure:
The graph calculation technology is used as a powerful data mining and analysis tool, can find patterns, relations and structures in complex data sets, and provides a brand new view angle and method for personalized recommendation systems. By constructing the technical schemes of user behavior graphs, mining interest preferences, analyzing social networks, associating recommendation algorithms, sequencing recommendation results and the like, the method and the device can improve recommendation accuracy and user satisfaction, optimize performance of a recommendation system, enrich application scenes of personalized recommendation and finally improve competitiveness of products in the information consumption field. The specific technical scheme is as follows:
1. Dynamic user behavior diagram construction method
And (3) data collection:
behavior data of the user over different time periods is collected, including but not limited to browsing records, collecting behavior, purchasing records, ratings, reviews, and the like.
The data can come from the direct behavior of the user on the platform, or can be behavior data obtained through analysis and prediction of the behavior pattern of the user.
Graph structure conversion:
User behavior data is converted into a graph structure in which nodes represent users and edges represent user behavior.
Different behaviors may be represented as different types of edges, for example, browsing behaviors may be represented by undirected edges, while purchasing behaviors may be represented by directed edges. The weight of an edge may represent the frequency of occurrence of the action or the degree of interest of the user.
Graph calculation technology:
graph computation techniques, such as graph roll-up networks (Graph Convolutional Networks, GCN), are introduced to update the dynamic user behavior graph in real-time.
The graph convolution neural network model is used as a deep learning model capable of effectively processing text relation information in the field of text classification, and a feature extraction flow for constructing feature representation by using convolution operation to extract feature information in the convolution neural network is continued. The convolution operation used in the convolution neural network is continuously simplified from the fourier transform operation of the signal. The method has the advantages that theoretical feasibility of obtaining effective characteristic representation by fusing relation information of the graph convolution neural network is ensured, convolution operation is generalized to the field of relation graph data, and the characteristic processing process of the graph data can have a plurality of advantages when the convolution operation is used for characteristic processing. The structure of the GCN is shown in FIG. 1.
The GCN model takes the node characteristic representation and the node relation graph as input, and generates a central node characteristic representation through a graph volume aggregation process depending on network parameters and node relation weights. The GCN model has a definite theoretical support on the performance of feature extraction of node feature signals in the relation diagram. Meanwhile, the actual operation process of the GCN model can be similar to that of the graph neural network model GNN, the GCN model effectively characterizes the relation weight among all entity nodes in the relation graph by introducing the Laplacian matrix on the basis of inheriting the advantages of the adjacent relation of the GNN effective aggregation relation graph, so that the model can distinguish adjacent nodes with different association degrees in the process of aggregating the adjacent node characteristic information by the central node, and the characteristic extraction and characteristic aggregation capacity of the model are effectively enhanced. However, compared with the operation cost of performing the aggregation operation on all the neighboring nodes with an unfixed number of each central node by using one-time GNN operation, the introduction of the operation steps of the laplace matrix by using the GCN further increases the operation burden of the process, so that the GCN model is difficult to be directly applied to the larger-scale relational graph data. On the other hand, when the GCN model acquires the aggregate characteristic information, the GCN model depends on node characteristic information, and the final classification effect of the model also depends on the Laplace matrix obtained according to the original relation diagram data to a great extent.
The update process may include an update of node attributes (e.g., user interest preferences), an update of the relationship strength of edges (e.g., frequency of behavior). The hot spot change of the user behavior can be captured by real-time updating, and the transfer of the user interests can be reflected in time.
2. Community discovery algorithm based on graph embedding technology
(1) Nodes and edges in the dynamic user behavior graph are mapped into a low-dimensional vector space.
Mapping nodes and edges in a constructed graph into a low-dimensional vector space, common techniques are:
DeepWalk: generating node sequences through random walk, and training a word2vec model by utilizing the sequences to obtain a low-dimensional representation of the nodes.
Node2Vec: similar to DeepWalk, but with the addition of context information for the node, different types of edges are used to enhance the diversity of the node representation.
GRAPHSAGE: the representation of the node is learned by sub-sampling around the node.
The invention selects the graph embedding technology Node2Vec which can be best adapted to the characteristics of the dynamic user behavior graph. Since the nodes in the graph have rich contextual information, and such information is critical to understanding user behavior and community structure, node2Vec is the best choice.
(2) And calculating the similarity among the nodes, and dividing the user communities according to the similarity.
In a recommendation system, community discovery algorithms can be used to identify user groups with similar interests or behaviors, thereby improving the accuracy and individuality of the recommendation. The following are two community discovery algorithms commonly used in recommendation systems:
Girvan-Newman algorithm: communities are found by iteratively removing edges in a network, which are suitable for discovering a modular structure in the network. Under the guillotine-newman algorithm, communities in the graph are found by iteratively removing edges of the graph according to edge-betweenness centrality (Edge Betweenness Centrality, EBC) values of the graph. The edge with the greatest centrality of edge betweenness is removed first.
Edge betweenness (edge betweenness centrality, EBC) may be defined as the number of shortest paths through an edge in a network. Each edge is scored for one EBC based on the shortest path between all nodes in the graph.
For the purposes of the graph and network, the shortest path refers to the path that has the smallest distance between any two nodes. Let us take an example to know how the EBC score is calculated. See fig. 2:
the calculation of the EBC score is an iterative process, and the following is a summary of the overall process:
(1) Determining the shortest path: first, the shortest path between all node pairs in the network needs to be calculated. This can be achieved by classical algorithms in graph theory, such as Dijkstra's algorithm or Floyd-Warshall's algorithm.
(2) The EBC score of the edge is calculated: for each edge, it is necessary to count the number of times it occurs in all shortest paths. This number is the EBC score for the edge.
(3) Iterative calculation: the above steps need to be repeated for each node in the graph to ensure that the EBC scores of all nodes are calculated.
(4) EBC scores were calculated: finally, the EBC score for each edge is added and divided by the number of nodes in the network (divided by the number of nodes if an undirected graph, divided by the number of nodes minus 1 if a unidirectional graph). This gives an average EBC score for each edge.
The higher the EBC score, the greater the importance of this edge in the network, as it appears in more shortest paths. In the Girvan-Newman algorithm, the edge with the highest EBC score is removed, which breaks the important path connecting different communities in the network, helping reveal the boundaries of the communities.
In a recommendation system, the user groups identified by the community discovery algorithm may be grouped according to the common behavioral characteristics and interest preferences of the users. In this way, the recommendation system can provide more personalized recommendation for each community, and the satisfaction degree of users and the overall effect of the system are improved.
3. Recommendation algorithm based on cluster analysis
Deep mining and analysis of the relationship between the user and the commodity is performed.
Combining basic information (such as age, sex, occupation, etc.), behavior data and social relationship network of users. Users and items (e.g., merchandise, movies, etc.) are used as nodes of the graph. Based on the user behavior data, edges are constructed to represent interactions between the user and the item, such as purchases, scores, reviews, and the like. The weighting of the edges is considered to represent the user's preference for the item.
And (3) utilizing a graph computing technology, and mining interest preference of a user through node similarity analysis, a clustering algorithm and the like. The similarity between the nodes can be calculated by using cosine similarity, pearson correlation coefficient and other methods.
Cosine similarity is a measure of correlation by measuring the cosine value between two vectors to determine the degree of similarity between the two vectors, and thus determine whether the two vectors are likely to be oriented in the same direction. If the two vectors are opposite, the similarity value is closer to-1, i.e., the similarity is lower. Conversely, if the two vectors point in approximately the same direction, their similarity value is closer to 1, meaning that they are similar. The definition is as follows: assuming two possible solutions in the D-dimensional space, the corresponding vectors are x=x1, X2,..xd, y=y1, Y2,..yd, respectively, then the cosine similarity is determined by the following equation.
Where θ represents the angle between vectors X and Y, and D is the dimension of the problem space. cos [1,1].
The cosine similarity of the two vectors in three dimensions is shown in fig. 3.
And various factors such as attribute characteristics, social relations, behavior characteristics and the like of the nodes can be considered.
And selecting a proper clustering algorithm, such as K-means, hierarchical clustering, density-based clustering and the like, according to specific requirements and data characteristics.
K-means clustering process diagram is shown in FIG. 4
The K-means clustering algorithm only needs to adjust the parameter of the number of clusters in advance. Fig. 4 is a clustering result after 4 iterations of two-dimensional data when k=2, and "x" in the figure each represents a clustering center:
Multiple algorithms may be combined to obtain more accurate clustering results.
Cluster analysis and optimization: and clustering the users by using a selected clustering algorithm, and analyzing the characteristics and interest preference of each clustered group.
According to the clustering result, the recommendation algorithm can be optimized, and the recommendation accuracy is improved
An interest preference model is built, and a solid data base is provided for personalized recommendation.
4. Personalized recommendation service
And recommending commodities liked by similar user groups to the user according to the user behavior diagram and the community discovery result.
And analyzing the user relationship in the social network by using a graph computing technology, and identifying the social circle, the influence and the social interaction mode of the user. The social impact of a user is mined by analyzing the user's location and roles in the social network, such as center nodes, bridge nodes, etc.
In a social network, user role and impact analysis typically involves several fields and attributes:
* User basic attribute:
-user ID: a number uniquely identifying a user.
-User name: a nickname or real name of the user.
Sex: gender information of the user.
-Age: age range of the user.
-Region: the geographic location where the user is located.
Occupation: professional information of the user.
* Social network relationship attribute:
-number of friends: number of friends of the user in the social network.
Number of interests: the number of other users the user is interested in.
Number of interest: the number of users who pay attention to the user.
-Social circle: social circles or communities to which users belong.
Network density: the compactness of the user's social network, such as the frequency of interactions between friends.
* Interaction behavior attribute:
number of posts: the number of posts posted by a user in a social network.
Number of comments: number of comments made by the user.
Praise number: number of user praise.
-Sharing number: the user shares the number of other posts.
-Interaction frequency: the user's activity in the social network.
* Content preference attribute:
-interest tags: interests or hobbies marked by users in the social network.
-Common topics: topics or subjects frequently discussed by users.
-Content type: a type of content that the user prefers, such as pictures, videos, text, etc.
* Influence evaluation attribute:
-centrality index: such as centrality, mid-count centrality, tight centrality, etc., the user's location and influence in the social network is assessed.
Impact score: user impact scores calculated by an algorithm.
-Social role: based on the user's behavior and attributes, user social roles are defined, such as innovators, early recipients, etc.
Information propagation depth: the depth at which information posted by a user propagates in a social network.
Information propagation breadth: the breadth of the information published by users that propagates in a social network.
* Privacy and security attributes:
-privacy settings: privacy protection level set by the user.
-A security measure: security measures taken by the user, such as two-factor authentication, etc.
The fields and the attributes can help analyze the behavior mode, the social role and the influence of the user in the social network, so that basis is provided for personalized recommendation, advertisement delivery, content popularization and the like. In practice, these attributes may vary depending on the particular social networking platform and business requirements.
Assessing the impact of a user may be by analyzing the breadth and depth of the user information propagation, or the amount of interaction that the user has caused.
Key node identification (KeyNodeIdentification): by analyzing the centrality index of the user in the community, key nodes, namely high-influence users with obvious influence on the community structure, can be identified. The invention uses a key node identification technology based on GCN, and the flow of the method is shown in figure 5.
Firstly, removing a very small amount of isolated nodes contained in some network data in the data processing process, so that an input legend used by the method is a connected graph; secondly, extracting the secondary subgraph of each node in the network, constructing a subgraph network, extracting the characteristics of the subgraph network, and constructing a characteristic vector composed of 7 characteristics for each node; then, constructing a key node identification model, inputting the key node identification model into a GCN layer for learning according to the extracted characteristics of each node, and simultaneously adding jump connection in the layer and then learning a key node prediction task through three full-connection layers in order to better utilize the attribute of the node; finally, the minimum loss is calculated according to the regression loss function MSE according to the legend data, and the output value of the model is the criticality score condition of each node in the prediction network.
And (3) establishing a preference model:
And analyzing personal data, interest labels, activity histories and the like of the user, and establishing an interest preference model of the user. Using content recommendation techniques, possible friends or content are recommended to the user in combination with the user's social relationships and interest preferences.
The personalized recommendation algorithm is realized:
Personalized recommendation algorithms, such as deep learning models, are designed and implemented to predict new friends or content that may be of interest to the user.
The recommendation algorithm information feature extraction method based on deep learning mainly comprises two methods: the first method, as shown in fig. 6, converts the user and item text information into vector form by single Hot coding (One-Hot), which facilitates the processing of the next deep network model, wherein the deep network model is a network model comprising multiple layers of neural network full-connected layers, and then outputs item features and user features to predict scores in the form of concatenation or matrix product. Although the neural network model relieves the data sparsity to a certain extent, the neural network model is limited by the deep neural network model, only ID information of users and projects is used, and other information features of the users and the projects are not extracted, so that the effect is poor.
The second method, as shown in fig. 6, is a recommendation algorithm constructed based on a convolutional neural network or other complex network model, the algorithm firstly carries out vectorization processing on text information of users and projects through single thermal coding, then carries out text feature extraction through an embedded layer and the convolutional neural network, wherein the convolutional neural network comprises an input layer, a convolutional layer, a pooling layer, a full-connection layer and an output layer, the convolutional layer is responsible for carrying out feature extraction on data, the pooling layer selects the largest feature or average feature, plays a role of dimension reduction and overfitting prevention, and the output layer further carries out processing on the data feature and outputs text feature vectors. The recommendation algorithm based on the depth network model greatly improves the capability of extracting information features, but loses some non-text information, and the information has important effect on relieving data sparsity although the semantics are not very rich.
And then, the correlation and accuracy of the recommendation are improved by utilizing the social network information of the user. And combining social relations and interest preferences of the users to realize personalized recommendation based on the social network.
Random walk strategy:
algorithm principle:
Random walk is a probability-based graph traversal method that allows random jumps from any one node to other nodes in the graph with a certain probability. In a recommendation system, random walks may simulate random behavior of a user while browsing items, thereby discovering new items that may be of interest to the user.
Random walk on graphic structure:
And (3) applying a random walk algorithm on the constructed user behavior diagram, and starting from the user nodes, randomly selecting the next accessed node according to a certain probability distribution. This process may be performed multiple times to explore different paths and connections in the graph.
Discovering interest paths of users:
Through multiple random walks, potential links between user behaviors can be found, and interest paths of users can be mined. The interest path may be a sequence of commodity nodes that are linked to each other by a user's browsing or purchasing behavior.
Personalized recommendation:
Using the mined interest path, the user can be recommended items that are similar to their past patterns of behavior but have not been accessed. For example, if a user frequently browses items of a certain category, a random walk algorithm may find a link between this category and other categories, thereby recommending relevant items that the user has not browsed.
Application in real-time recommendation system:
In a real-time recommendation system, a random walk algorithm may dynamically adjust recommended items to reflect changes in user behavior. The random walk algorithm may be re-run each time the user performs a new action to find the latest path of interest.
The invention also provides an application system in the information consumption field based on the graph computing technology, which comprises the following modules.
And a data processing module:
And the data processing module is used for receiving and processing the user behavior data.
The data processing comprises the steps of data cleaning, de-duplication, normalization and the like.
And a graph structure building module:
and a graph structure construction module is realized, and the processed user behavior data is converted into a graph structure.
The construction of the graph structure requires defining the types of nodes and edges and the weight calculation method of the edges.
And a graph calculation updating module:
And the graph calculation updating module is used for updating the user behavior graph in real time by adopting a graph neural network or other graph calculation algorithms.
The update module needs to be able to process the newly generated user behavior data and adjust the graph structure to reflect the changes in user behavior.
Interest preference inference module:
And deducing interest preference of the user through a graph algorithm (such as community discovery and path analysis) by using the updated user behavior graph.
The interest preference inference module needs to be able to handle complex relationships in the graph, identifying the behavior patterns and potential needs of the user.
The recommendation generation module:
and generating a personalized recommendation list according to the interest preference and the real-time behavior of the user.
The recommendation generation module may incorporate a variety of recommendation algorithms, such as content-based recommendations, social-based recommendations, model-based recommendations, and the like.
And a user interaction optimization module:
and monitoring interactive feedback of the user on the recommended content, such as clicking, browsing, purchasing and other actions.
And optimizing the recommendation list according to the user interaction feedback, adjusting the recommendation strategy and improving the recommendation quality.
The following steps are specific:
1. dynamic user behavior graph construction:
-collecting user behavior data, including browsing, clicking, purchasing, etc., events, and recording a time stamp.
-Dividing the user behavior data into different time windows according to the time stamps.
Within each time window, the nodes and edges of the graph are constructed from user behavior, e.g. user nodes, commodity nodes, the browsing or purchasing behavior of the commodity by the user forming edges.
-Learning and updating the dynamic user behavior graph using graph computation techniques, such as Graph Neural Network (GNN), to capture changes in user behavior.
2. Graph embedding technology-based community discovery algorithm:
-graph embedding nodes in the user behavior graph, mapping the nodes to a low-dimensional vector space.
-Calculating the similarity between the node vectors using a similarity measure, such as cosine similarity.
-Applying a community discovery algorithm, such as the Girvan-Newman algorithm or the Louvain method, to identify strong connections between nodes, forming communities.
-Analysing community structure, identifying groups of users, which groups may have similar interests or behaviour patterns.
3. Recommendation algorithm based on cluster analysis:
-path analysis and weight calculation are performed on the user behavior graph, and characteristics such as path length, weight and the like are extracted.
Clustering the users according to their behavioral characteristics using a clustering algorithm (e.g., K-means, DBSCAN).
-Performing a feature analysis on each cluster, determining the user interests and purchasing habits of the cluster center.
-Recommending for each user a good similar to the cluster center in which it is located, based on the clustering result.
4. A random walk algorithm is introduced:
-traversing the user behavioural map using a random walk algorithm to explore items that the user may not have interacted with directly.
During the random walk, the commodity is sampled according to path length and weight to find potential recommended commodity.
-Combining the random walk-derived merchandise information with the user clustering results to provide a more personalized recommendation list.
5. Model evaluation and optimization:
Training and testing the above method using the real world dataset.
Evaluating the performance of the recommendation system, such as the indexes of accuracy, recall, coverage, etc.
-Adjusting and optimizing the model based on user feedback and performance evaluation results.
Through the implementation steps, the technology can construct a dynamic user behavior diagram, discover a user community and provide personalized commodity recommendation through cluster analysis and a random walk algorithm.
The application method and the system have the remarkable advantages and beneficial effects as follows.
Firstly, the application method of the invention can more intuitively and clearly present the internal relation and rule of the information consumption data by constructing the graph structure representation of the information consumption data. Traditional data processing methods tend to only focus on the data itself, but neglect the relationships between the data, resulting in a lack of depth and comprehensiveness of the processing results. The graph structure representation can clearly represent the relationship between data, and provides a solid foundation for subsequent graph calculation.
Secondly, the graph structure can be cleaned and the characteristics of the graph structure can be extracted through a preprocessing step, so that the accuracy and the efficiency of the subsequent graph calculation can be greatly improved. In information consumption data, there is often a lot of noise and irrelevant information, which can interfere with the results of the graph computation. Through data cleaning, noise and irrelevant information can be removed, and the accuracy of calculation is improved. Meanwhile, key features and relations in the data can be extracted through operations such as node feature extraction and edge weight calculation, more valuable information is provided for subsequent graph calculation, and therefore calculation efficiency is improved.
Thirdly, the application method of the invention can mine potential rules and modes in the information consumption data by introducing a graph calculation algorithm. The graph computation algorithm can perform deep analysis and mining on the graph structure to find potential rules and patterns in the data. The rules and modes have important guiding significance for information consumption, can help users to better understand the internal mechanism of information consumption, and improve the efficiency and accuracy of information consumption.
Fourth, the system of the present invention has better expandability and flexibility. With the continuous growth and change of information consumption data, the system can be customized and optimized according to actual requirements. For example, new data processing modules, graph calculation algorithm modules, etc. can be added to accommodate different data processing requirements.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and yet fall within the scope of the invention.
Claims (6)
1. An application method of a graph-based computing technology in the field of information consumption is characterized by comprising the following steps: the method comprises the following steps:
(1) Constructing a dynamic user behavior diagram;
a. collecting behavior data of a user in different time periods;
b. converting the user behavior data into a dynamic user behavior diagram, wherein nodes represent users and edges represent user behaviors;
c. introducing a graph calculation technology to update the dynamic user behavior graph in real time;
(2) Graph embedding technology-based community discovery algorithm:
a. mapping nodes and edges in the dynamic user behavior graph into a low-dimensional vector space;
b. Calculating the similarity among the nodes, and dividing user communities according to the similarity;
(3) Recommendation algorithm based on cluster analysis;
a. Deep mining and analysis of the relationship between the user and the commodity: combining basic information, behavior data and social relation network of the user; taking the user and the article as nodes of the graph, constructing edges according to the user behavior data to represent the interaction relationship between the user and the article, and considering the weight of the edges to represent the preference degree of the user to the article;
b. the interest preference of the user is mined through node similarity analysis and clustering algorithm by using a graph calculation technology;
(4) Personalized recommendation service;
a. Analyzing the user relationship in the social network by using a graph computing technology, identifying the social circle, the influence size and the social interaction mode of the user, and mining the social influence of the user by analyzing the position and the role of the user in the social network;
b. Evaluating social influence of the user by analyzing the breadth and depth of information transmission of the user or the interaction quantity caused by the user;
c. And (3) establishing a preference model: analyzing personal data, interest labels and activity histories of the users, establishing an interest preference model of the users, and recommending possible friends or contents for the users by using a content recommendation technology and combining social relations and interest preferences of the users.
2. The application method of graph-based computing technology in the field of information consumption according to claim 1, wherein: the updating of the dynamic user behavior graph in the step 1 comprises updating of node attributes and updating of the relationship strength of edges.
3. The application method of graph-based computing technology in the field of information consumption according to claim 1, wherein: mapping the nodes and edges in the dynamic user behavior graph into the low-dimensional vector space in step 2 may employ DeepWalk, node, 2Vec, graphSAGE or Node2Vec techniques.
4. The application method of graph-based computing technology in the field of information consumption according to claim 1, wherein: and 3, calculating the similarity between the nodes by using a cosine similarity or pearson correlation coefficient method, and selecting a proper clustering algorithm according to specific requirements and data characteristics, wherein the clustering algorithm comprises a K-means algorithm, a hierarchical clustering algorithm and a density-based clustering algorithm.
5. The application method of graph-based computing technology in the field of information consumption according to claim 1, wherein: in step 4, the key node identification technology based on the GCN analyzes the centrality index of the user in the community so as to identify key nodes, namely, high-influence users with obvious influence on the community structure.
6. An application system in the field of information consumption based on graph computing technology is characterized by comprising the following modules;
And a graph structure building module: realizing a graph structure construction module, and converting the processed user behavior data into a graph structure;
And a graph calculation updating module: updating the user behavior graph in real time by adopting a graph neural network or other graph calculation algorithms, processing newly generated user behavior data, and adjusting the graph structure to reflect the change of the user behavior;
interest preference inference module: deducing interest preference of the user through a graph algorithm by utilizing the updated user behavior graph;
the recommendation generation module: generating a personalized recommendation list according to interest preference and real-time behavior of the user;
And a user interaction optimization module: and monitoring interactive feedback of the user on the recommended content, including clicking, browsing and purchasing behaviors, optimizing a recommendation list according to the interactive feedback of the user, adjusting a recommendation strategy and improving recommendation quality.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410110803.6A CN118071400A (en) | 2024-01-26 | 2024-01-26 | Application method and system based on graph computing technology in information consumption field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410110803.6A CN118071400A (en) | 2024-01-26 | 2024-01-26 | Application method and system based on graph computing technology in information consumption field |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118071400A true CN118071400A (en) | 2024-05-24 |
Family
ID=91099871
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410110803.6A Pending CN118071400A (en) | 2024-01-26 | 2024-01-26 | Application method and system based on graph computing technology in information consumption field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118071400A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118350904A (en) * | 2024-06-12 | 2024-07-16 | 青州市坦博尔服饰股份有限公司 | User clothing order behavior analysis method and system based on big data |
-
2024
- 2024-01-26 CN CN202410110803.6A patent/CN118071400A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118350904A (en) * | 2024-06-12 | 2024-07-16 | 青州市坦博尔服饰股份有限公司 | User clothing order behavior analysis method and system based on big data |
CN118350904B (en) * | 2024-06-12 | 2024-09-17 | 青州市坦博尔服饰股份有限公司 | User clothing order behavior analysis method and system based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gasparetti et al. | Community detection in social recommender systems: a survey | |
Yang et al. | Friend or frenemy? Predicting signed ties in social networks | |
Wang et al. | Modeling customer preferences using multidimensional network analysis in engineering design | |
Pan et al. | Clustering of designers based on building information modeling event logs | |
Yu et al. | Graph neural network based model for multi-behavior session-based recommendation | |
Huang et al. | Information fusion oriented heterogeneous social network for friend recommendation via community detection | |
CN110781940A (en) | Fuzzy mathematics-based community discovery information processing method and system | |
WO2023284516A1 (en) | Information recommendation method and apparatus based on knowledge graph, and device, medium, and product | |
CN115408618B (en) | Point-of-interest recommendation method based on social relation fusion position dynamic popularity and geographic features | |
Meng et al. | POI recommendation for occasional groups Based on hybrid graph neural networks | |
Wang et al. | Link prediction in heterogeneous collaboration networks | |
CN118071400A (en) | Application method and system based on graph computing technology in information consumption field | |
Kumar et al. | Graph Convolutional Neural Networks for Link Prediction in Social Networks | |
Amzad et al. | Tourism recommendation system: a systematic review | |
Zhang et al. | Multi-view dynamic heterogeneous information network embedding | |
CN113256024B (en) | User behavior prediction method fusing group behaviors | |
Agarwal et al. | Binarized spiking neural networks optimized with Nomadic People Optimization-based sentiment analysis for social product recommendation | |
Qin et al. | Recommender resources based on acquiring user's requirement and exploring user's preference with Word2Vec model in web service | |
Alzubaidi et al. | LPCNN: convolutional neural network for link prediction based on network structured features | |
Xu et al. | Similarmf: a social recommender system using an embedding method | |
Chen et al. | KGCF: Social relationship-aware graph collaborative filtering for recommendation | |
Fan | E-Commerce Data Mining Analysis based on User Preferences and Association Rules | |
CN118210976B (en) | Link prediction method integrating attention mechanism and graph contrast learning | |
D’Aniello et al. | Link prediction in signed social networks using fuzzy signature | |
Wang et al. | [Retracted] Research on the Spectral Domain Graph Convolution Collaborative Filtering Algorithm Based on Reinforcement Learning and Chebyshev |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication |