[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN118071400A - Application method and system based on graph computing technology in information consumption field - Google Patents

Application method and system based on graph computing technology in information consumption field Download PDF

Info

Publication number
CN118071400A
CN118071400A CN202410110803.6A CN202410110803A CN118071400A CN 118071400 A CN118071400 A CN 118071400A CN 202410110803 A CN202410110803 A CN 202410110803A CN 118071400 A CN118071400 A CN 118071400A
Authority
CN
China
Prior art keywords
user
graph
recommendation
algorithm
technology
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410110803.6A
Other languages
Chinese (zh)
Inventor
魏永强
杨兴海
李建州
杨兴荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shijihengtong Technology Co ltd
Original Assignee
Shijihengtong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shijihengtong Technology Co ltd filed Critical Shijihengtong Technology Co ltd
Priority to CN202410110803.6A priority Critical patent/CN118071400A/en
Publication of CN118071400A publication Critical patent/CN118071400A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0203Market surveys; Market polls
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Biomedical Technology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an application method and a system in the field of information consumption based on graph computing technology, wherein the method comprises the following steps: 1) dynamic user behavior graph construction, 2) community discovery algorithm based on graph embedding technology, 3) recommendation algorithm based on cluster analysis, and 4) personalized recommendation service; the method for constructing the dynamic user behavior graph based on the graph calculation technology can fully mine the time sequence relation between the user behaviors, and improves the accuracy and the instantaneity of a recommendation system; the community finding algorithm based on the graph embedding technology is used, and the user group with similar interests or behaviors can be more accurately identified by using the community finding algorithm, so that a more accurate recommendation basis is provided for a recommendation system. The invention uses a recommendation algorithm based on cluster analysis, and the algorithm utilizes a graph analysis technology to deeply mine and analyze the relationship between the user and the commodity through the characteristics of the path length, the weight and the like.

Description

Application method and system based on graph computing technology in information consumption field
Technical Field
The present invention relates to information technology, computer software and application fields, and in particular, to a method and a system for applying a graph-based computing technology to information consumption fields.
Background
With the vigorous development of big data and the Internet of things, the richness of the data is increased, the relevance among the data is increased, the traditional analysis of small data volume, single dimension and static data does not meet the requirement of time development, the rapid increase of the data volume and the effective analysis and processing of complex relevance among the data become pain points of the database industry.
At present, a graph database and a graph computing system (also called a graph computing engine) are core contents in the technical field of graph computing, the graph database is mainly responsible for operations such as adding, deleting, checking, modifying and the like of graph data, and the graph computing engine is mainly responsible for performing deep analysis processing on the graph data. The graph calculation technology can effectively process complex association relations, so that the graph calculation technology has wide application prospects in the field of information consumption. For example, a knowledge graph is a typical application, and can help enterprises organize and use data, so as to realize real-time data analysis, hidden relation discovery and contextualization accurate decision.
And very large data can be generated in the information consumption field, if the graph calculation technology can be applied to the field, the accuracy of recommendation and the satisfaction of users can be improved, the performance of a recommendation system can be optimized, the application scene of personalized recommendation is enriched, and finally the competitiveness of products in the information consumption field is improved.
Disclosure of Invention
The invention aims to solve the technical problem of providing an application method in the information consumption field based on a graph computing technology, so that modes, relations and structures can be intensively found in complex information consumption data, and a brand new view angle and method are provided for a personalized recommendation system.
In order to solve the technical problems, the technical scheme of the invention is as follows:
an application method of graph-based computing technology in the field of information consumption comprises the following steps:
(1) Constructing a dynamic user behavior diagram;
a. collecting behavior data of a user in different time periods;
b. converting the user behavior data into a dynamic user behavior diagram, wherein nodes represent users and edges represent user behaviors;
c. introducing a graph calculation technology to update the dynamic user behavior graph in real time;
(2) Graph embedding technology-based community discovery algorithm:
a. mapping nodes and edges in the dynamic user behavior graph into a low-dimensional vector space;
b. Calculating the similarity among the nodes, and dividing user communities according to the similarity;
(3) Recommendation algorithm based on cluster analysis;
a. Deep mining and analysis of the relationship between the user and the commodity: combining basic information, behavior data and social relation network of the user; taking the user and the article as nodes of the graph, constructing edges according to the user behavior data to represent the interaction relationship between the user and the article, and considering the weight of the edges to represent the preference degree of the user to the article;
b. the interest preference of the user is mined through node similarity analysis and clustering algorithm by using a graph calculation technology;
(4) Personalized recommendation service;
a. Analyzing the user relationship in the social network by using a graph computing technology, identifying the social circle, the influence size and the social interaction mode of the user, and mining the social influence of the user by analyzing the position and the role of the user in the social network;
b. Evaluating social influence of the user by analyzing the breadth and depth of information transmission of the user or the interaction quantity caused by the user;
c. And (3) establishing a preference model: analyzing personal data, interest labels and activity histories of the users, establishing an interest preference model of the users, and recommending possible friends or contents for the users by using a content recommendation technology and combining social relations and interest preferences of the users.
Further, in step 1, the updating of the dynamic user behavior graph includes updating of node attributes and updating of relationship strength of edges.
Further, mapping the nodes and edges in the dynamic user behavior graph into the low-dimensional vector space in step 2 may employ DeepWalk, node 2.2 Vec, graphSAGE or Node2Vec techniques.
Furthermore, in the step 3, the similarity between the computing nodes uses a cosine similarity or pearson correlation coefficient method, and a proper clustering algorithm is selected according to specific requirements and data characteristics, wherein the clustering algorithm comprises a K-means algorithm, a hierarchical clustering algorithm and a density-based clustering algorithm.
Further, in step 4, the key node identification technology based on GCN analyzes the centrality index of the user in the community so as to identify key nodes, i.e. those users with high influence on the community structure.
The invention also provides an application system in the information consumption field based on the graph computing technology, which comprises the following modules.
And a graph structure building module: realizing a graph structure construction module, and converting the processed user behavior data into a graph structure;
And a graph calculation updating module: updating the user behavior graph in real time by adopting a graph neural network or other graph calculation algorithms, processing newly generated user behavior data, and adjusting the graph structure to reflect the change of the user behavior;
interest preference inference module: deducing interest preference of the user through a graph algorithm by utilizing the updated user behavior graph;
the recommendation generation module: generating a personalized recommendation list according to interest preference and real-time behavior of the user;
And a user interaction optimization module: monitoring interactive feedback of a user on recommended content, including clicking, browsing and purchasing behaviors, optimizing a recommendation list according to the interactive feedback of the user, adjusting a recommendation strategy and improving recommendation quality
The invention has the advantages that:
1. In the aspect of constructing the user behavior graph, the invention uses a dynamic user behavior graph construction method based on graph calculation technology. The method utilizes the continuity of user behaviors in time to connect the user behavior data in different time periods to construct a dynamic user behavior graph. The construction mode can fully mine the time sequence relation between the user behaviors, and improves the accuracy and the instantaneity of the recommendation system.
2. The invention uses a community discovery algorithm based on graph embedding technology. The algorithm maps nodes and edges in the graph structure into a low-dimensional vector space so that the similarity between the nodes can be judged by the distance calculation between the vectors. By utilizing the algorithm, the user group with similar interests or behaviors can be identified more accurately, and a more accurate recommendation basis is provided for a recommendation system.
3. The invention uses a recommendation algorithm based on cluster analysis. The algorithm utilizes graph analysis technology to deeply mine and analyze the relationship between the user and the commodity through the characteristics of the path such as length, weight and the like.
Drawings
FIG. 1 is a schematic diagram of a GCN model structure;
FIG. 2 is a schematic diagram of EBC score calculation;
FIG. 3 is a schematic view of cosine similarity of two vectors in three-dimensional space;
FIG. 4 is a schematic diagram of a K-means clustering process;
FIG. 5 is a schematic diagram of a key node identification technique based on GCN;
FIG. 6 is a schematic diagram of a first feature processing mode of a recommendation algorithm based on deep learning;
fig. 7 is a schematic diagram of a second feature processing mode of the recommendation algorithm based on deep learning.
Detailed Description
The following describes the embodiments of the present invention further with reference to the drawings. The description of these embodiments is provided to assist understanding of the present invention, but is not intended to limit the present invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
As shown in the figure:
The graph calculation technology is used as a powerful data mining and analysis tool, can find patterns, relations and structures in complex data sets, and provides a brand new view angle and method for personalized recommendation systems. By constructing the technical schemes of user behavior graphs, mining interest preferences, analyzing social networks, associating recommendation algorithms, sequencing recommendation results and the like, the method and the device can improve recommendation accuracy and user satisfaction, optimize performance of a recommendation system, enrich application scenes of personalized recommendation and finally improve competitiveness of products in the information consumption field. The specific technical scheme is as follows:
1. Dynamic user behavior diagram construction method
And (3) data collection:
behavior data of the user over different time periods is collected, including but not limited to browsing records, collecting behavior, purchasing records, ratings, reviews, and the like.
The data can come from the direct behavior of the user on the platform, or can be behavior data obtained through analysis and prediction of the behavior pattern of the user.
Graph structure conversion:
User behavior data is converted into a graph structure in which nodes represent users and edges represent user behavior.
Different behaviors may be represented as different types of edges, for example, browsing behaviors may be represented by undirected edges, while purchasing behaviors may be represented by directed edges. The weight of an edge may represent the frequency of occurrence of the action or the degree of interest of the user.
Graph calculation technology:
graph computation techniques, such as graph roll-up networks (Graph Convolutional Networks, GCN), are introduced to update the dynamic user behavior graph in real-time.
The graph convolution neural network model is used as a deep learning model capable of effectively processing text relation information in the field of text classification, and a feature extraction flow for constructing feature representation by using convolution operation to extract feature information in the convolution neural network is continued. The convolution operation used in the convolution neural network is continuously simplified from the fourier transform operation of the signal. The method has the advantages that theoretical feasibility of obtaining effective characteristic representation by fusing relation information of the graph convolution neural network is ensured, convolution operation is generalized to the field of relation graph data, and the characteristic processing process of the graph data can have a plurality of advantages when the convolution operation is used for characteristic processing. The structure of the GCN is shown in FIG. 1.
The GCN model takes the node characteristic representation and the node relation graph as input, and generates a central node characteristic representation through a graph volume aggregation process depending on network parameters and node relation weights. The GCN model has a definite theoretical support on the performance of feature extraction of node feature signals in the relation diagram. Meanwhile, the actual operation process of the GCN model can be similar to that of the graph neural network model GNN, the GCN model effectively characterizes the relation weight among all entity nodes in the relation graph by introducing the Laplacian matrix on the basis of inheriting the advantages of the adjacent relation of the GNN effective aggregation relation graph, so that the model can distinguish adjacent nodes with different association degrees in the process of aggregating the adjacent node characteristic information by the central node, and the characteristic extraction and characteristic aggregation capacity of the model are effectively enhanced. However, compared with the operation cost of performing the aggregation operation on all the neighboring nodes with an unfixed number of each central node by using one-time GNN operation, the introduction of the operation steps of the laplace matrix by using the GCN further increases the operation burden of the process, so that the GCN model is difficult to be directly applied to the larger-scale relational graph data. On the other hand, when the GCN model acquires the aggregate characteristic information, the GCN model depends on node characteristic information, and the final classification effect of the model also depends on the Laplace matrix obtained according to the original relation diagram data to a great extent.
The update process may include an update of node attributes (e.g., user interest preferences), an update of the relationship strength of edges (e.g., frequency of behavior). The hot spot change of the user behavior can be captured by real-time updating, and the transfer of the user interests can be reflected in time.
2. Community discovery algorithm based on graph embedding technology
(1) Nodes and edges in the dynamic user behavior graph are mapped into a low-dimensional vector space.
Mapping nodes and edges in a constructed graph into a low-dimensional vector space, common techniques are:
DeepWalk: generating node sequences through random walk, and training a word2vec model by utilizing the sequences to obtain a low-dimensional representation of the nodes.
Node2Vec: similar to DeepWalk, but with the addition of context information for the node, different types of edges are used to enhance the diversity of the node representation.
GRAPHSAGE: the representation of the node is learned by sub-sampling around the node.
The invention selects the graph embedding technology Node2Vec which can be best adapted to the characteristics of the dynamic user behavior graph. Since the nodes in the graph have rich contextual information, and such information is critical to understanding user behavior and community structure, node2Vec is the best choice.
(2) And calculating the similarity among the nodes, and dividing the user communities according to the similarity.
In a recommendation system, community discovery algorithms can be used to identify user groups with similar interests or behaviors, thereby improving the accuracy and individuality of the recommendation. The following are two community discovery algorithms commonly used in recommendation systems:
Girvan-Newman algorithm: communities are found by iteratively removing edges in a network, which are suitable for discovering a modular structure in the network. Under the guillotine-newman algorithm, communities in the graph are found by iteratively removing edges of the graph according to edge-betweenness centrality (Edge Betweenness Centrality, EBC) values of the graph. The edge with the greatest centrality of edge betweenness is removed first.
Edge betweenness (edge betweenness centrality, EBC) may be defined as the number of shortest paths through an edge in a network. Each edge is scored for one EBC based on the shortest path between all nodes in the graph.
For the purposes of the graph and network, the shortest path refers to the path that has the smallest distance between any two nodes. Let us take an example to know how the EBC score is calculated. See fig. 2:
the calculation of the EBC score is an iterative process, and the following is a summary of the overall process:
(1) Determining the shortest path: first, the shortest path between all node pairs in the network needs to be calculated. This can be achieved by classical algorithms in graph theory, such as Dijkstra's algorithm or Floyd-Warshall's algorithm.
(2) The EBC score of the edge is calculated: for each edge, it is necessary to count the number of times it occurs in all shortest paths. This number is the EBC score for the edge.
(3) Iterative calculation: the above steps need to be repeated for each node in the graph to ensure that the EBC scores of all nodes are calculated.
(4) EBC scores were calculated: finally, the EBC score for each edge is added and divided by the number of nodes in the network (divided by the number of nodes if an undirected graph, divided by the number of nodes minus 1 if a unidirectional graph). This gives an average EBC score for each edge.
The higher the EBC score, the greater the importance of this edge in the network, as it appears in more shortest paths. In the Girvan-Newman algorithm, the edge with the highest EBC score is removed, which breaks the important path connecting different communities in the network, helping reveal the boundaries of the communities.
In a recommendation system, the user groups identified by the community discovery algorithm may be grouped according to the common behavioral characteristics and interest preferences of the users. In this way, the recommendation system can provide more personalized recommendation for each community, and the satisfaction degree of users and the overall effect of the system are improved.
3. Recommendation algorithm based on cluster analysis
Deep mining and analysis of the relationship between the user and the commodity is performed.
Combining basic information (such as age, sex, occupation, etc.), behavior data and social relationship network of users. Users and items (e.g., merchandise, movies, etc.) are used as nodes of the graph. Based on the user behavior data, edges are constructed to represent interactions between the user and the item, such as purchases, scores, reviews, and the like. The weighting of the edges is considered to represent the user's preference for the item.
And (3) utilizing a graph computing technology, and mining interest preference of a user through node similarity analysis, a clustering algorithm and the like. The similarity between the nodes can be calculated by using cosine similarity, pearson correlation coefficient and other methods.
Cosine similarity is a measure of correlation by measuring the cosine value between two vectors to determine the degree of similarity between the two vectors, and thus determine whether the two vectors are likely to be oriented in the same direction. If the two vectors are opposite, the similarity value is closer to-1, i.e., the similarity is lower. Conversely, if the two vectors point in approximately the same direction, their similarity value is closer to 1, meaning that they are similar. The definition is as follows: assuming two possible solutions in the D-dimensional space, the corresponding vectors are x=x1, X2,..xd, y=y1, Y2,..yd, respectively, then the cosine similarity is determined by the following equation.
Where θ represents the angle between vectors X and Y, and D is the dimension of the problem space. cos [1,1].
The cosine similarity of the two vectors in three dimensions is shown in fig. 3.
And various factors such as attribute characteristics, social relations, behavior characteristics and the like of the nodes can be considered.
And selecting a proper clustering algorithm, such as K-means, hierarchical clustering, density-based clustering and the like, according to specific requirements and data characteristics.
K-means clustering process diagram is shown in FIG. 4
The K-means clustering algorithm only needs to adjust the parameter of the number of clusters in advance. Fig. 4 is a clustering result after 4 iterations of two-dimensional data when k=2, and "x" in the figure each represents a clustering center:
Multiple algorithms may be combined to obtain more accurate clustering results.
Cluster analysis and optimization: and clustering the users by using a selected clustering algorithm, and analyzing the characteristics and interest preference of each clustered group.
According to the clustering result, the recommendation algorithm can be optimized, and the recommendation accuracy is improved
An interest preference model is built, and a solid data base is provided for personalized recommendation.
4. Personalized recommendation service
And recommending commodities liked by similar user groups to the user according to the user behavior diagram and the community discovery result.
And analyzing the user relationship in the social network by using a graph computing technology, and identifying the social circle, the influence and the social interaction mode of the user. The social impact of a user is mined by analyzing the user's location and roles in the social network, such as center nodes, bridge nodes, etc.
In a social network, user role and impact analysis typically involves several fields and attributes:
* User basic attribute:
-user ID: a number uniquely identifying a user.
-User name: a nickname or real name of the user.
Sex: gender information of the user.
-Age: age range of the user.
-Region: the geographic location where the user is located.
Occupation: professional information of the user.
* Social network relationship attribute:
-number of friends: number of friends of the user in the social network.
Number of interests: the number of other users the user is interested in.
Number of interest: the number of users who pay attention to the user.
-Social circle: social circles or communities to which users belong.
Network density: the compactness of the user's social network, such as the frequency of interactions between friends.
* Interaction behavior attribute:
number of posts: the number of posts posted by a user in a social network.
Number of comments: number of comments made by the user.
Praise number: number of user praise.
-Sharing number: the user shares the number of other posts.
-Interaction frequency: the user's activity in the social network.
* Content preference attribute:
-interest tags: interests or hobbies marked by users in the social network.
-Common topics: topics or subjects frequently discussed by users.
-Content type: a type of content that the user prefers, such as pictures, videos, text, etc.
* Influence evaluation attribute:
-centrality index: such as centrality, mid-count centrality, tight centrality, etc., the user's location and influence in the social network is assessed.
Impact score: user impact scores calculated by an algorithm.
-Social role: based on the user's behavior and attributes, user social roles are defined, such as innovators, early recipients, etc.
Information propagation depth: the depth at which information posted by a user propagates in a social network.
Information propagation breadth: the breadth of the information published by users that propagates in a social network.
* Privacy and security attributes:
-privacy settings: privacy protection level set by the user.
-A security measure: security measures taken by the user, such as two-factor authentication, etc.
The fields and the attributes can help analyze the behavior mode, the social role and the influence of the user in the social network, so that basis is provided for personalized recommendation, advertisement delivery, content popularization and the like. In practice, these attributes may vary depending on the particular social networking platform and business requirements.
Assessing the impact of a user may be by analyzing the breadth and depth of the user information propagation, or the amount of interaction that the user has caused.
Key node identification (KeyNodeIdentification): by analyzing the centrality index of the user in the community, key nodes, namely high-influence users with obvious influence on the community structure, can be identified. The invention uses a key node identification technology based on GCN, and the flow of the method is shown in figure 5.
Firstly, removing a very small amount of isolated nodes contained in some network data in the data processing process, so that an input legend used by the method is a connected graph; secondly, extracting the secondary subgraph of each node in the network, constructing a subgraph network, extracting the characteristics of the subgraph network, and constructing a characteristic vector composed of 7 characteristics for each node; then, constructing a key node identification model, inputting the key node identification model into a GCN layer for learning according to the extracted characteristics of each node, and simultaneously adding jump connection in the layer and then learning a key node prediction task through three full-connection layers in order to better utilize the attribute of the node; finally, the minimum loss is calculated according to the regression loss function MSE according to the legend data, and the output value of the model is the criticality score condition of each node in the prediction network.
And (3) establishing a preference model:
And analyzing personal data, interest labels, activity histories and the like of the user, and establishing an interest preference model of the user. Using content recommendation techniques, possible friends or content are recommended to the user in combination with the user's social relationships and interest preferences.
The personalized recommendation algorithm is realized:
Personalized recommendation algorithms, such as deep learning models, are designed and implemented to predict new friends or content that may be of interest to the user.
The recommendation algorithm information feature extraction method based on deep learning mainly comprises two methods: the first method, as shown in fig. 6, converts the user and item text information into vector form by single Hot coding (One-Hot), which facilitates the processing of the next deep network model, wherein the deep network model is a network model comprising multiple layers of neural network full-connected layers, and then outputs item features and user features to predict scores in the form of concatenation or matrix product. Although the neural network model relieves the data sparsity to a certain extent, the neural network model is limited by the deep neural network model, only ID information of users and projects is used, and other information features of the users and the projects are not extracted, so that the effect is poor.
The second method, as shown in fig. 6, is a recommendation algorithm constructed based on a convolutional neural network or other complex network model, the algorithm firstly carries out vectorization processing on text information of users and projects through single thermal coding, then carries out text feature extraction through an embedded layer and the convolutional neural network, wherein the convolutional neural network comprises an input layer, a convolutional layer, a pooling layer, a full-connection layer and an output layer, the convolutional layer is responsible for carrying out feature extraction on data, the pooling layer selects the largest feature or average feature, plays a role of dimension reduction and overfitting prevention, and the output layer further carries out processing on the data feature and outputs text feature vectors. The recommendation algorithm based on the depth network model greatly improves the capability of extracting information features, but loses some non-text information, and the information has important effect on relieving data sparsity although the semantics are not very rich.
And then, the correlation and accuracy of the recommendation are improved by utilizing the social network information of the user. And combining social relations and interest preferences of the users to realize personalized recommendation based on the social network.
Random walk strategy:
algorithm principle:
Random walk is a probability-based graph traversal method that allows random jumps from any one node to other nodes in the graph with a certain probability. In a recommendation system, random walks may simulate random behavior of a user while browsing items, thereby discovering new items that may be of interest to the user.
Random walk on graphic structure:
And (3) applying a random walk algorithm on the constructed user behavior diagram, and starting from the user nodes, randomly selecting the next accessed node according to a certain probability distribution. This process may be performed multiple times to explore different paths and connections in the graph.
Discovering interest paths of users:
Through multiple random walks, potential links between user behaviors can be found, and interest paths of users can be mined. The interest path may be a sequence of commodity nodes that are linked to each other by a user's browsing or purchasing behavior.
Personalized recommendation:
Using the mined interest path, the user can be recommended items that are similar to their past patterns of behavior but have not been accessed. For example, if a user frequently browses items of a certain category, a random walk algorithm may find a link between this category and other categories, thereby recommending relevant items that the user has not browsed.
Application in real-time recommendation system:
In a real-time recommendation system, a random walk algorithm may dynamically adjust recommended items to reflect changes in user behavior. The random walk algorithm may be re-run each time the user performs a new action to find the latest path of interest.
The invention also provides an application system in the information consumption field based on the graph computing technology, which comprises the following modules.
And a data processing module:
And the data processing module is used for receiving and processing the user behavior data.
The data processing comprises the steps of data cleaning, de-duplication, normalization and the like.
And a graph structure building module:
and a graph structure construction module is realized, and the processed user behavior data is converted into a graph structure.
The construction of the graph structure requires defining the types of nodes and edges and the weight calculation method of the edges.
And a graph calculation updating module:
And the graph calculation updating module is used for updating the user behavior graph in real time by adopting a graph neural network or other graph calculation algorithms.
The update module needs to be able to process the newly generated user behavior data and adjust the graph structure to reflect the changes in user behavior.
Interest preference inference module:
And deducing interest preference of the user through a graph algorithm (such as community discovery and path analysis) by using the updated user behavior graph.
The interest preference inference module needs to be able to handle complex relationships in the graph, identifying the behavior patterns and potential needs of the user.
The recommendation generation module:
and generating a personalized recommendation list according to the interest preference and the real-time behavior of the user.
The recommendation generation module may incorporate a variety of recommendation algorithms, such as content-based recommendations, social-based recommendations, model-based recommendations, and the like.
And a user interaction optimization module:
and monitoring interactive feedback of the user on the recommended content, such as clicking, browsing, purchasing and other actions.
And optimizing the recommendation list according to the user interaction feedback, adjusting the recommendation strategy and improving the recommendation quality.
The following steps are specific:
1. dynamic user behavior graph construction:
-collecting user behavior data, including browsing, clicking, purchasing, etc., events, and recording a time stamp.
-Dividing the user behavior data into different time windows according to the time stamps.
Within each time window, the nodes and edges of the graph are constructed from user behavior, e.g. user nodes, commodity nodes, the browsing or purchasing behavior of the commodity by the user forming edges.
-Learning and updating the dynamic user behavior graph using graph computation techniques, such as Graph Neural Network (GNN), to capture changes in user behavior.
2. Graph embedding technology-based community discovery algorithm:
-graph embedding nodes in the user behavior graph, mapping the nodes to a low-dimensional vector space.
-Calculating the similarity between the node vectors using a similarity measure, such as cosine similarity.
-Applying a community discovery algorithm, such as the Girvan-Newman algorithm or the Louvain method, to identify strong connections between nodes, forming communities.
-Analysing community structure, identifying groups of users, which groups may have similar interests or behaviour patterns.
3. Recommendation algorithm based on cluster analysis:
-path analysis and weight calculation are performed on the user behavior graph, and characteristics such as path length, weight and the like are extracted.
Clustering the users according to their behavioral characteristics using a clustering algorithm (e.g., K-means, DBSCAN).
-Performing a feature analysis on each cluster, determining the user interests and purchasing habits of the cluster center.
-Recommending for each user a good similar to the cluster center in which it is located, based on the clustering result.
4. A random walk algorithm is introduced:
-traversing the user behavioural map using a random walk algorithm to explore items that the user may not have interacted with directly.
During the random walk, the commodity is sampled according to path length and weight to find potential recommended commodity.
-Combining the random walk-derived merchandise information with the user clustering results to provide a more personalized recommendation list.
5. Model evaluation and optimization:
Training and testing the above method using the real world dataset.
Evaluating the performance of the recommendation system, such as the indexes of accuracy, recall, coverage, etc.
-Adjusting and optimizing the model based on user feedback and performance evaluation results.
Through the implementation steps, the technology can construct a dynamic user behavior diagram, discover a user community and provide personalized commodity recommendation through cluster analysis and a random walk algorithm.
The application method and the system have the remarkable advantages and beneficial effects as follows.
Firstly, the application method of the invention can more intuitively and clearly present the internal relation and rule of the information consumption data by constructing the graph structure representation of the information consumption data. Traditional data processing methods tend to only focus on the data itself, but neglect the relationships between the data, resulting in a lack of depth and comprehensiveness of the processing results. The graph structure representation can clearly represent the relationship between data, and provides a solid foundation for subsequent graph calculation.
Secondly, the graph structure can be cleaned and the characteristics of the graph structure can be extracted through a preprocessing step, so that the accuracy and the efficiency of the subsequent graph calculation can be greatly improved. In information consumption data, there is often a lot of noise and irrelevant information, which can interfere with the results of the graph computation. Through data cleaning, noise and irrelevant information can be removed, and the accuracy of calculation is improved. Meanwhile, key features and relations in the data can be extracted through operations such as node feature extraction and edge weight calculation, more valuable information is provided for subsequent graph calculation, and therefore calculation efficiency is improved.
Thirdly, the application method of the invention can mine potential rules and modes in the information consumption data by introducing a graph calculation algorithm. The graph computation algorithm can perform deep analysis and mining on the graph structure to find potential rules and patterns in the data. The rules and modes have important guiding significance for information consumption, can help users to better understand the internal mechanism of information consumption, and improve the efficiency and accuracy of information consumption.
Fourth, the system of the present invention has better expandability and flexibility. With the continuous growth and change of information consumption data, the system can be customized and optimized according to actual requirements. For example, new data processing modules, graph calculation algorithm modules, etc. can be added to accommodate different data processing requirements.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and yet fall within the scope of the invention.

Claims (6)

1. An application method of a graph-based computing technology in the field of information consumption is characterized by comprising the following steps: the method comprises the following steps:
(1) Constructing a dynamic user behavior diagram;
a. collecting behavior data of a user in different time periods;
b. converting the user behavior data into a dynamic user behavior diagram, wherein nodes represent users and edges represent user behaviors;
c. introducing a graph calculation technology to update the dynamic user behavior graph in real time;
(2) Graph embedding technology-based community discovery algorithm:
a. mapping nodes and edges in the dynamic user behavior graph into a low-dimensional vector space;
b. Calculating the similarity among the nodes, and dividing user communities according to the similarity;
(3) Recommendation algorithm based on cluster analysis;
a. Deep mining and analysis of the relationship between the user and the commodity: combining basic information, behavior data and social relation network of the user; taking the user and the article as nodes of the graph, constructing edges according to the user behavior data to represent the interaction relationship between the user and the article, and considering the weight of the edges to represent the preference degree of the user to the article;
b. the interest preference of the user is mined through node similarity analysis and clustering algorithm by using a graph calculation technology;
(4) Personalized recommendation service;
a. Analyzing the user relationship in the social network by using a graph computing technology, identifying the social circle, the influence size and the social interaction mode of the user, and mining the social influence of the user by analyzing the position and the role of the user in the social network;
b. Evaluating social influence of the user by analyzing the breadth and depth of information transmission of the user or the interaction quantity caused by the user;
c. And (3) establishing a preference model: analyzing personal data, interest labels and activity histories of the users, establishing an interest preference model of the users, and recommending possible friends or contents for the users by using a content recommendation technology and combining social relations and interest preferences of the users.
2. The application method of graph-based computing technology in the field of information consumption according to claim 1, wherein: the updating of the dynamic user behavior graph in the step 1 comprises updating of node attributes and updating of the relationship strength of edges.
3. The application method of graph-based computing technology in the field of information consumption according to claim 1, wherein: mapping the nodes and edges in the dynamic user behavior graph into the low-dimensional vector space in step 2 may employ DeepWalk, node, 2Vec, graphSAGE or Node2Vec techniques.
4. The application method of graph-based computing technology in the field of information consumption according to claim 1, wherein: and 3, calculating the similarity between the nodes by using a cosine similarity or pearson correlation coefficient method, and selecting a proper clustering algorithm according to specific requirements and data characteristics, wherein the clustering algorithm comprises a K-means algorithm, a hierarchical clustering algorithm and a density-based clustering algorithm.
5. The application method of graph-based computing technology in the field of information consumption according to claim 1, wherein: in step 4, the key node identification technology based on the GCN analyzes the centrality index of the user in the community so as to identify key nodes, namely, high-influence users with obvious influence on the community structure.
6. An application system in the field of information consumption based on graph computing technology is characterized by comprising the following modules;
And a graph structure building module: realizing a graph structure construction module, and converting the processed user behavior data into a graph structure;
And a graph calculation updating module: updating the user behavior graph in real time by adopting a graph neural network or other graph calculation algorithms, processing newly generated user behavior data, and adjusting the graph structure to reflect the change of the user behavior;
interest preference inference module: deducing interest preference of the user through a graph algorithm by utilizing the updated user behavior graph;
the recommendation generation module: generating a personalized recommendation list according to interest preference and real-time behavior of the user;
And a user interaction optimization module: and monitoring interactive feedback of the user on the recommended content, including clicking, browsing and purchasing behaviors, optimizing a recommendation list according to the interactive feedback of the user, adjusting a recommendation strategy and improving recommendation quality.
CN202410110803.6A 2024-01-26 2024-01-26 Application method and system based on graph computing technology in information consumption field Pending CN118071400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410110803.6A CN118071400A (en) 2024-01-26 2024-01-26 Application method and system based on graph computing technology in information consumption field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410110803.6A CN118071400A (en) 2024-01-26 2024-01-26 Application method and system based on graph computing technology in information consumption field

Publications (1)

Publication Number Publication Date
CN118071400A true CN118071400A (en) 2024-05-24

Family

ID=91099871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410110803.6A Pending CN118071400A (en) 2024-01-26 2024-01-26 Application method and system based on graph computing technology in information consumption field

Country Status (1)

Country Link
CN (1) CN118071400A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118350904A (en) * 2024-06-12 2024-07-16 青州市坦博尔服饰股份有限公司 User clothing order behavior analysis method and system based on big data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118350904A (en) * 2024-06-12 2024-07-16 青州市坦博尔服饰股份有限公司 User clothing order behavior analysis method and system based on big data
CN118350904B (en) * 2024-06-12 2024-09-17 青州市坦博尔服饰股份有限公司 User clothing order behavior analysis method and system based on big data

Similar Documents

Publication Publication Date Title
Gasparetti et al. Community detection in social recommender systems: a survey
Yang et al. Friend or frenemy? Predicting signed ties in social networks
Wang et al. Modeling customer preferences using multidimensional network analysis in engineering design
Pan et al. Clustering of designers based on building information modeling event logs
Yu et al. Graph neural network based model for multi-behavior session-based recommendation
Huang et al. Information fusion oriented heterogeneous social network for friend recommendation via community detection
CN110781940A (en) Fuzzy mathematics-based community discovery information processing method and system
WO2023284516A1 (en) Information recommendation method and apparatus based on knowledge graph, and device, medium, and product
CN115408618B (en) Point-of-interest recommendation method based on social relation fusion position dynamic popularity and geographic features
Meng et al. POI recommendation for occasional groups Based on hybrid graph neural networks
Wang et al. Link prediction in heterogeneous collaboration networks
CN118071400A (en) Application method and system based on graph computing technology in information consumption field
Kumar et al. Graph Convolutional Neural Networks for Link Prediction in Social Networks
Amzad et al. Tourism recommendation system: a systematic review
Zhang et al. Multi-view dynamic heterogeneous information network embedding
CN113256024B (en) User behavior prediction method fusing group behaviors
Agarwal et al. Binarized spiking neural networks optimized with Nomadic People Optimization-based sentiment analysis for social product recommendation
Qin et al. Recommender resources based on acquiring user's requirement and exploring user's preference with Word2Vec model in web service
Alzubaidi et al. LPCNN: convolutional neural network for link prediction based on network structured features
Xu et al. Similarmf: a social recommender system using an embedding method
Chen et al. KGCF: Social relationship-aware graph collaborative filtering for recommendation
Fan E-Commerce Data Mining Analysis based on User Preferences and Association Rules
CN118210976B (en) Link prediction method integrating attention mechanism and graph contrast learning
D’Aniello et al. Link prediction in signed social networks using fuzzy signature
Wang et al. [Retracted] Research on the Spectral Domain Graph Convolution Collaborative Filtering Algorithm Based on Reinforcement Learning and Chebyshev

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication