CN114970684A - Community detection method for extracting network core structure by combining VAE - Google Patents
- Publication number: CN114970684A (application CN202210483834.7A)
- Authority: CN (China)
- Prior art keywords: matrix, community, network, core structure, graph
- Legal status: Pending
Classifications
- G06F18/23213 — Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation, with a fixed number of clusters, e.g. K-means clustering
- G06F18/213 — Feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
- G06F18/22 — Matching criteria, e.g. proximity measures
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
- G06N3/048 — Neural networks; activation functions
- G06N3/084 — Neural network learning methods; backpropagation, e.g. using gradient descent
Abstract
The invention provides a community detection method that extracts the network core structure in combination with a variational autoencoder (VAE). The method uses the k-truss algorithm, together with the topological and core-structure information of a real network, to locate core information in the network; applies a VAE to perform feature dimensionality reduction and feature extraction on a similarity matrix that carries the network core structure; extracts the mean and log-variance during training; generates random numbers matched to samples with different characteristics; and generates data from these normally distributed random numbers. The encoding part is used to reduce the data dimension, and the optimal solution is obtained by minimizing the reconstruction error and the divergence loss. Most existing algorithms rely on classical techniques, perform poorly on complex networks, and require long and difficult experiments. Compared with other methods, the proposed method performs better and yields complete and more accurate community partitions.
Description
Technical field:
The invention relates to a novel method for community detection on real networks that combines a graph core-structure mining technique with an autoencoder model, specifically the k-truss algorithm and a variational autoencoder (VAE), and belongs to the technical field of community detection.
Background:
With the continuous development of the internet and artificial-intelligence technology, social networks have emerged in our daily lives; they reflect real social groups formed by people with shared interests, professions, and abilities. At the heart of a social network are users and the relationships between them. A real social network can contain tens of millions or even billions of nodes, and such networks are typically complex. Methods are therefore needed to discover and identify these real, complex network structures. Community detection is an extremely important technique for understanding and exploring the many complex network structures of the real world; since the beginning of the 21st century, the study of community structure in complex networks has attracted many scholars, and many novel community detection algorithms have appeared.
Most traditional community discovery algorithms focus on the topology of the community network, yet a network core structure often exists as well, and this core structure usually plays a key role in network-structure analysis. The network core structure appears as a compact, densely connected set of nodes, which closely matches the definition of a community: dense connections inside the community and sparse connections outside it. Moreover, training a variational autoencoder together with the community information of a real network can better extract the features of the network's core structure. The present invention therefore analyses the network topology and the core structure jointly to obtain better results and to improve existing community detection algorithms.
Disclosure of Invention
The invention provides a community detection method that extracts the network core structure in combination with a VAE, using the k-truss algorithm to find the k-truss structure in a network. The method specifically comprises the following steps:
Step 1: preprocess the original data set, build the adjacency matrix of the network graph, and generate its similarity matrix;
Step 2: perform feature dimensionality reduction and feature extraction on the similarity matrix carrying the network core structure with a variational autoencoder model;
Step 3: the algorithm model based on the variational autoencoder consists of a community-information extraction module and a deep-learning module; it is fine-tuned with community information under a given deep-learning framework to obtain a low-dimensional feature representation of the network structure;
Step 4: cluster the low-dimensional feature matrix obtained in step 3 with the K-means algorithm to obtain the final community partition.
Further, step 1 specifically includes:
1.1: acquire the data in the original data set and build the original network graph;
1.2: build the adjacency matrix of the original network graph from step 1.1;
1.3: compute the support of each edge of the original network graph from step 1.1;
1.4: compute the k-truss core structure of the original network graph using the supports from step 1.3;
1.5: compute the similarity of each node pair from the adjacency matrix of step 1.2 and the k-truss core structure of step 1.4;
1.6: compute and generate the similarity matrix from the adjacency matrix of step 1.2 and the similarities of step 1.5.
Further, in step 1.2: given a graph G = (V, E), where V is the set of nodes and E is the set of edges of G, let N(u) be the set of nodes adjacent to node u, and let the matrix A = [a_ij]_{n×n} be the adjacency matrix of G, where n is the dimension of the matrix; a_ij = 1 indicates that edge e_ij exists, and a_ij = 0 indicates that it does not.
Further, in step 1.5: for graph G = (V, E), i, j ∈ V, e_ij ∈ E, and adjacency matrix A = [a_ij]_{n×n}, the support sup(e_ij) of edge e_ij and the similarity Sim(e_ij) of edge e_ij are defined as:
further, in step 1.6, for graph G ═ (V, E), its similarity matrix is defined as X ═ X ij ] n×n Wherein x is ij =Sim(e ij ),i,j∈V,e ij ∈E。
Further, step 2 specifically includes:
2.1: build the variational autoencoder model;
2.2: feed the similarity matrix obtained in step 1 into the encoding part of the variational autoencoder and compute the forward pass H_i = σ(W_1 X_i + b_1) to obtain the hidden-layer representation H_i;
2.3: extract the mean and log-variance of each hidden-layer representation H_i from step 2.2;
2.4: sample a random parameter ε_i from the standard normal distribution N(0, 1), update H_i = μ_i + ε_i × σ_i, and use the updated H_i as the input of the decoding part;
2.5: feed the H_i from step 2.4 into the decoding part of the variational autoencoder and compute the forward pass Y_i = σ(W_2 H_i + b_2) to generate the output data Y_i;
2.6: update the neural-network parameter variables W_1, W_2, b_1, b_2 by the error back-propagation algorithm;
2.7: if the training target of the model has been reached, continue to the next step; otherwise repeat the training steps;
2.8: if the required number of training iterations has been reached, continue to the next step; otherwise repeat the training steps;
2.9: once the requirements of steps 2.7 and 2.8 are met, the matrix trained by the variational autoencoder model is obtained.
Further, in step 2.6, the cross-entropy loss function of the variational autoencoder is defined as:

L_1 = -Σ_{i=1}^{|V|} [X_i·log Y_i + (1 - X_i)·log(1 - Y_i)]

where |V| is the number of vertices of the network graph, X_i is the i-th column vector of the input-layer matrix, and Y_i is the i-th column vector of the output-layer matrix;
the KL divergence function is defined as:

L_2 = -(1/2) Σ_{j=1}^{m} (1 + log σ_j² - μ_j² - σ_j²)

where μ is the mean vector, σ² is the variance vector, and m is the dimension of the hidden layer H_i.
Further, step 3 specifically includes:
3.1: extract the community-information matrix C;
3.2: build a variational autoencoder model with the same settings as in step 2 except for the total loss function;
3.3: feed the matrix obtained in step 2.9 into the variational autoencoder model of step 3.2 and obtain the low-dimensional feature matrix by training.
Further, in step 3.1, the community-information matrix C is defined as follows:
for the network graph G, the community-information matrix C ∈ R^{|V|×m}, where m is the number of communities;
each row vector of C gives the probabilities that a given vertex belongs to each community.
Further, in step 3.2, a layer is added holding a vector μ_c of the same dimension as the mean μ_x, i.e. the mean of the community-information distribution, with σ_c² as its variance. The subscript x denotes adjacency information and the subscript c denotes community information. After the community information is added, the new KL divergence loss function becomes:

L_2' = Σ_{j=1}^{m} [ log(σ_cj / σ_xj) + (σ_xj² + (μ_xj - μ_cj)²) / (2 σ_cj²) - 1/2 ]

i.e. the KL divergence between the Gaussian distributions N(μ_x, σ_x²) and N(μ_c, σ_c²), where m is the dimension of the hidden layer H_i.
The algorithm model of the invention can be applied to reveal the aggregation behavior of a real network, i.e. to discover and analyse its community structure. In a complex real network, a matrix built from the network can undergo feature extraction and dimensionality reduction followed by clustering, completing the partition into communities. In the social networks most closely tied to the internet, the platform can build a network-information matrix from the content users upload and attributes such as age, sex, profession, hobbies and usage time, and the algorithm model of the invention then partitions the users into communities. This can serve functions such as friend-circle analysis, profession-based recommendation analysis and friend discovery in social software. In other fields such as biology, the method can analyse complex food-chain network structures, or perform feature extraction and dimensionality reduction on complex protein networks to partition different protein structures.
Drawings
Fig. 1 is a main flow diagram of the present invention.
FIG. 2 is a detailed flow chart of specific steps of the present invention.
FIG. 3 is a diagram of the variational autoencoder model of the present invention.
Fig. 4 is a topology diagram of a small network, illustrating an example of the k-truss network core structure of the present invention.
Figs. 5(A)-5(C) show the community-partition results on the Dolphins network; the number on each node is the label of the community to which it is assigned. Panel (A) shows the ground-truth community partition, panel (B) the partition produced by the present invention, and panel (C) the partition produced by the FUA algorithm.
Figs. 6(A)-6(C) show the community-partition results on the Football network; the number on each node is the label of the community to which it is assigned. Panel (A) shows the ground-truth community partition, panel (B) the partition produced by the present invention, and panel (C) the partition produced by the FUA algorithm.
Detailed Description
The technical solution of the present invention is further described below by way of examples with reference to the accompanying drawings.
Example:
the following describes a specific method of this embodiment with reference to flowcharts shown in fig. 1 and fig. 2.
The data sets used by the invention are publicly available data sets based on real network communities, and they are very large. Feeding the data directly into a neural network makes processing inefficient. In a small, simple topology, the community relations in the graph can be computed directly with a clustering algorithm and the result is accurate. However, the adjacency matrix only records relations between adjacent nodes; two nodes of a community that are not connected may still belong to the same community. The adjacency matrix alone therefore cannot describe the relations between nodes in a community structure, and a community detection algorithm that clusters directly on the adjacency matrix loses partition accuracy.
The invention therefore proposes a data-preprocessing method that searches for the network core structure with the k-truss algorithm: the k-truss structures found in the network match the definition of a network core structure well. A feature matrix carrying the network core structure is generated in combination with the network topology and fed into a variational autoencoder for training; the variational autoencoder extracts and reduces the dimensionality of the network core structure well. A second round of training that combines community information with the variational autoencoder then further improves the quality of the community partition.
As shown in the figures, the community detection method of this embodiment, which extracts the network core structure in combination with a VAE, specifically comprises the following steps:
step 1: and carrying out data preprocessing on the original data set, establishing an adjacency matrix of the network graph, and generating a similarity matrix of the adjacency matrix.
The specific steps of data preprocessing are as follows:
step 1.1: and acquiring data in the original data set, and establishing an original network diagram.
Step 1.2: an adjacency matrix is established for the original network graph in step 1.1.
Given a graph G = (V, E), where V is the set of nodes and E is the set of edges of G, let N(u) be the set of nodes adjacent to node u, and let the matrix A = [a_ij]_{n×n} be the adjacency matrix of G, where n is the dimension of the matrix; a_ij = 1 indicates that edge e_ij exists, and a_ij = 0 indicates that it does not.
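The construction of the adjacency matrix in step 1.2 can be sketched in Python as follows (a minimal illustration on a toy edge list; `build_adjacency` is a hypothetical helper name, not from the patent):

```python
import numpy as np

def build_adjacency(n, edges):
    """Build the n x n adjacency matrix A = [a_ij]: a_ij = 1 iff edge e_ij exists."""
    A = np.zeros((n, n), dtype=int)
    for u, v in edges:
        A[u, v] = 1
        A[v, u] = 1  # the network graph is undirected
    return A

# Toy graph: a triangle 0-1-2 plus a pendant node 3.
A = build_adjacency(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
```

The resulting matrix is symmetric, with A[i, j] = 1 exactly for the listed edges.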
Step 1.3: and (4) calculating the support degree of each edge in the original network graph in the step 1.1.
For the graph G = (V, E) and a subgraph G' of G, the support of an edge e in G', written sup(e, G'), is the number of triangles in G' that contain e; it is abbreviated sup(e).
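The support computation of step 1.3 can be sketched as below: in an undirected simple graph, sup(e) for edge e = (u, v) equals the number of common neighbours of u and v (`edge_support` is a hypothetical helper name, not from the patent):

```python
def edge_support(nodes, edges):
    """sup(e) for each edge e = (u, v): the number of triangles containing e,
    i.e. the number of common neighbours of u and v."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {frozenset((u, v)): len(adj[u] & adj[v]) for u, v in edges}

# Toy graph: triangle 0-1-2 plus a pendant edge 2-3.
sup = edge_support(range(4), [(0, 1), (1, 2), (0, 2), (2, 3)])
```

Each triangle edge has support 1; the pendant edge 2-3 lies in no triangle and has support 0.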
Step 1.4: the k-tress core structure in the original network map in step 1.3 is computed.
For graph G = (V, E), a subgraph G_k (k ≥ 2) is defined to be a k-truss structure of G if the support of every edge e in G_k is at least (k - 2).
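Following the definition above, the k-truss of step 1.4 can be computed by iteratively peeling away edges whose support falls below (k - 2). A minimal sketch (a hypothetical implementation, not the patent's):

```python
def k_truss(edges, k):
    """Return the edge set of the k-truss: the maximal subgraph in which
    every remaining edge has support >= k - 2."""
    E = {frozenset(e) for e in edges}
    while True:
        adj = {}
        for e in E:
            u, v = tuple(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        drop = {e for e in E
                if len(adj[tuple(e)[0]] & adj[tuple(e)[1]]) < k - 2}
        if not drop:
            return E
        E -= drop  # peel low-support edges and recompute supports

# The pendant edge 2-3 is peeled away; the triangle 0-1-2 survives as the 3-truss.
E3 = k_truss([(0, 1), (1, 2), (0, 2), (2, 3)], k=3)
```

Larger k values keep only denser cores, giving the layered, progressive structure the patent exploits.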
Step 1.5: and calculating the similarity of each node according to the adjacency matrix in the step 1.2 and the k-tress core structure in the step 1.4.
For graph G = (V, E), i, j ∈ V, e_ij ∈ E, and adjacency matrix A = [a_ij]_{n×n}, the support sup(e_ij) of edge e_ij and the similarity Sim(e_ij) of edge e_ij are defined as:
step 1.6: and according to the similarity between the adjacency matrix in the step 1.2 and each node in the step 1.5, calculating a similarity matrix:
for graph G ═ (V, E), its similarity matrix is defined as X ═ X ij ] n×n Wherein x is ij =Sim(e ij ),i,j∈V,e ij ∈E。
Step 2: and performing feature dimension reduction and feature extraction on the similarity matrix with the network core structure by using a variational automatic encoder model.
And extracting the mean value and the logarithmic variance in the training process, generating random numbers with different characteristics according to samples with different characteristics, and generating data according to the random numbers with normal distribution. And the data dimension is reduced by utilizing the coding part, and the optimal solution is obtained by minimizing the reconstruction error and the divergence loss.
The specific steps of the transform automatic encoder module in step 2 are as follows:
step 2.1: and establishing a variational automatic encoder model.
With reference to fig. 3, the variational autoencoder model is first built. Then:

H_i = σ(W_1 X_i + b_1),  Y_i = σ(W_2 H_i + b_2)

where H_i is the i-th column vector of the hidden-layer matrix; σ is an activation function, e.g. the Sigmoid function; X_i is the i-th column vector of the input-layer matrix and Y_i the i-th column vector of the output-layer matrix; i denotes any number from 1 to |V|; W_1 and W_2 are the weight matrices of the input layer and the hidden layer, respectively; b_1 and b_2 are the bias vectors of the input layer and the hidden layer, respectively. The first expression is the encoding part and the second the decoding part.
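The forward pass above can be sketched in numpy as follows (the dimensions and random weight initialization are illustrative assumptions, not the patent's settings):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, m = 6, 3                                          # input dim |V| and hidden dim (illustrative)
W1, b1 = rng.normal(size=(m, n)), np.zeros((m, 1))   # encoder weight matrix and bias
W2, b2 = rng.normal(size=(n, m)), np.zeros((n, 1))   # decoder weight matrix and bias

X_i = rng.random((n, 1))         # one column of the similarity matrix
H_i = sigmoid(W1 @ X_i + b1)     # encoding: H_i = sigma(W1 X_i + b1)
Y_i = sigmoid(W2 @ H_i + b2)     # decoding: Y_i = sigma(W2 H_i + b2)
```

The encoder compresses the n-dimensional column to m dimensions; the decoder reconstructs an n-dimensional output with all entries in (0, 1).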
Step 2.2: inputting the similarity matrix obtained in the step 1 into a coding part of a variational automatic coder model according to a formula H i =σ(W 1 X i +b 1 ) Calculating the forward propagation process to obtain a hidden layer representation H i And i represents any one of numbers 1 to | V |.
Step 2.3: extracting each hidden layer representation H in step 2.2 i Mean and log variance of.
Setting the mean value of the parameters of standard normal distribution as mu and the logarithmic variance as log sigma 2 . Construction of neural network mu i =f(X i ) Andf is the activation function. Wherein, X i Representing the input layer vector, mu i Represents a slave hidden layer H i The average value vector of the extracted average value vector,represents a slave hidden layer H i The extracted log variance vector, f represents an activation function such as a Sigmoid function, and i represents any one of numbers 1 to | V |.
Step 2.4: sampling the random parameter ε from a standard normal distribution N (0,1) i Update H i =μ i +ε i ×σ i Will update the H i As input to the decoding section.
Wherein H i Represents the updated i-th column vector, mu, of the hidden layer i Representing a slave hidden layer H i Extracted ith column mean vector, σ i Represents a slave hidden layer H i Extracted variance vector, ε i Denotes a random parameter (how i is interpreted) sampled from a standard normal distribution N (0,1), i denotes any one of numbers 1 to | V |.
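The reparameterization of step 2.4 can be sketched as follows (the vector values are illustrative; sampling ε_i from N(0, 1) keeps the stochastic node outside the differentiable path):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_i = np.array([0.2, -0.1, 0.5])        # mean vector extracted from H_i
log_var_i = np.array([-1.0, 0.0, -2.0])  # log-variance vector extracted from H_i
sigma_i = np.exp(0.5 * log_var_i)        # standard deviation: sigma = exp(log(sigma^2) / 2)
eps_i = rng.standard_normal(3)           # random parameter epsilon_i ~ N(0, 1)
H_i = mu_i + eps_i * sigma_i             # updated hidden representation for the decoder
```

Because H_i is a deterministic function of (μ_i, σ_i) given ε_i, gradients can flow back through μ_i and σ_i during training.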
Step 2.5: h obtained in step 2.4 i Input into the decoding part of the variational automatic encoder model according to the formula Y i =σ(W 2 H i +b 2 ) Calculating the forward propagation process to obtain data and generating Y i ,Y i Represents the ith column vector of the output layer, i represents any number from 1 to | V |.
Step 2.6: updating neural network parameter variables W through an error back propagation algorithm 1 ,W 2 ,b 1 ,b 2 。
Cross entropy loss function L in variational automatic encoder 1 Is defined as:
wherein V is the number of network graph vertexes, X i Is the ith column vector, Y, of the matrix in the input layer i Is the ith column vector of the matrix in the output layer, i represents any number from 1 to | V |.
In order to harmonize the proportional degree of the two loss functions, the variational automatic encoder introduces the concept of KL divergence, and the essence of the KL divergence is the regular specification of a discriminant model. KL divergence function L 2 Is defined as:
wherein, mu i Is taken as the mean value of the average value,is variance, m is hidden layer H i I represents any number of 1 to | V |.
Total Loss function Loss sum Is defined as:
Loss sum =L 1 +L 2
wherein L is 1 As a cross-entropy loss function, L 2 Is a KL divergence function.
Finally, the parameter variables W_1, W_2, b_1, b_2 are updated by the error back-propagation algorithm.
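The total loss Loss_sum = L_1 + L_2 can be sketched in numpy, assuming the standard VAE form of both terms (reconstruction cross-entropy plus KL divergence to the standard normal):

```python
import numpy as np

def vae_loss(X, Y, mu, log_var, eps=1e-9):
    """Loss_sum = L1 + L2: reconstruction cross-entropy plus the KL divergence
    between N(mu, sigma^2) and the standard normal N(0, 1)."""
    l1 = -np.sum(X * np.log(Y + eps) + (1 - X) * np.log(1 - Y + eps))
    l2 = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    return l1 + l2

X = np.array([1.0, 0.0, 1.0])     # one input column X_i
Y = np.array([0.9, 0.1, 0.8])     # its reconstruction Y_i
loss = vae_loss(X, Y, mu=np.zeros(2), log_var=np.zeros(2))  # KL term is 0 here
```

With μ = 0 and log σ² = 0 the KL term vanishes, so the loss reduces to the reconstruction cross-entropy; a perfect reconstruction drives it toward zero.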
Step 2.7: and when the training target required by the model is reached, continuing the next step, otherwise, repeatedly executing the step training step.
Step 2.8: and when the training times required by the model are reached, continuing the next step, otherwise, repeatedly executing the step training step.
Step 2.9: when the requirements of step 2.7 and step 2.8 are met, the matrix trained by the variational automatic encoder model is obtained.
The feature dimension-reduction and extraction module applies the variational autoencoder model to the similarity matrix carrying the network core structure: it extracts the mean and log-variance during training, generates random numbers matched to samples with different characteristics, and generates data from these normally distributed random numbers. The encoding part reduces the data dimension, and the optimal solution is obtained by minimizing the reconstruction error and the divergence loss.
And step 3: the algorithm model based on the variational automatic encoder is composed of a community information extraction module and a deep learning module, adjustment training is carried out under a given deep learning framework by combining community information, low-dimensional feature representation of a network structure is obtained, and the method specifically comprises the following steps:
and 3.1, extracting a community information matrix C.
For the network graph G, for the obtained community information matrix C ∈ R |V|×m ,
Where m is the number of communities. Each row vector of matrix C represents the probability that a different vertex belongs to a certain community.
Step 3.2: build a variational autoencoder model with the same settings as in step 2 except for the total loss function.
The loss function is modified as follows: a layer is added holding a vector μ_c of the same dimension as the mean μ_x, i.e. the mean of the community-information distribution, with σ_c² as its variance. The subscript x denotes adjacency information and the subscript c denotes community information. After the community information is added, the KL divergence loss function is updated to:

L_2' = Σ_{j=1}^{m} [ log(σ_cj / σ_xj) + (σ_xj² + (μ_xj - μ_cj)²) / (2 σ_cj²) - 1/2 ]

i.e. the KL divergence between the Gaussian distributions N(μ_x, σ_x²) and N(μ_c, σ_c²), where m is the dimension of the hidden layer H_i and i denotes any number from 1 to |V|.
The total loss function is defined as:

Loss_sum = L_1 + L_2'

where L_1 is the cross-entropy loss function and L_2' is the modified KL divergence function.
Step 3.3: feed the matrix obtained in step 2.9 into the variational autoencoder model of step 3.2 and obtain the low-dimensional feature matrix by training.
Step 4: cluster the low-dimensional feature matrix obtained in step 3 with the K-means algorithm, setting the number of clusters to the number of ground-truth community labels, to obtain the final community partition.
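The clustering of step 4 can be sketched with a plain K-means over the rows of the low-dimensional feature matrix (a minimal numpy version with a simple deterministic initialization; in practice a library implementation such as scikit-learn's KMeans would typically be used):

```python
import numpy as np

def kmeans(Z, k, iters=50):
    """Plain K-means on the rows of Z; returns one cluster label per row."""
    # Simple deterministic initialization: k rows spread across the matrix.
    centers = Z[np.linspace(0, len(Z) - 1, k).astype(int)].astype(float).copy()
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)            # assign each row to its nearest center
        for j in range(k):
            if (labels == j).any():
                centers[j] = Z[labels == j].mean(axis=0)  # recompute centers
    return labels

# Two well-separated blobs stand in for two communities in feature space.
Z = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
labels = kmeans(Z, k=2)
```

Rows of the trained feature matrix that lie close together in the low-dimensional space end up with the same community label.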
Examples
Taking fig. 4 (the topology diagram of a small network) as an example, the core network structure (k-truss structure) is described. A straight line connects two nodes whenever an edge exists between them.
The whole graph in FIG. 4 is a 2-truss. The subgraph drawn in a light color is a 3-truss, i.e. every edge of that subgraph belongs to at least one triangle; the subgraph drawn in a dark color is a 4-truss and also a 5-truss, i.e. every edge of that subgraph belongs to at least two or three triangles, respectively. The k-truss is therefore a layered, progressive subgraph structure in which different values of k describe network cores of different densities. The invention uses the k-truss structure to search for the network core structure and thereby achieve a better community partition.
The invention is verified on real network-community data sets, analysing and confirming the feasibility and accuracy of the proposed algorithm model.
The present invention uses ten real data sets: Strike, Karate, Dolphins, Football, Email_Eu, Polblogs, Cora, Citeseer, Youtube, and DBLP.
Strike, Karate, Dolphins and Football are small data sets (tens to hundreds of nodes); Email_Eu, Polblogs, Cora and Citeseer are medium data sets (thousands of nodes); Youtube and DBLP are large data sets (tens of thousands of nodes).
Evaluation criteria are set to check whether the final clustering result meets expectations; three evaluation indices are selected here. Q measures the quality of the final community partition, while NMI and F_same are evaluation methods based on comparison with the ground-truth community set.
Given the set of clustering results C = {C_1, C_2, C_3, ..., C_m}, i.e. the communities found by the algorithm, and the set of real communities C' = {C'_1, C'_2, C'_3, ..., C'_n}, i.e. the communities that actually exist, m is the number of communities in the clustering result and n is the number of real communities.
The modularity Q measures the quality of the final community partition and evaluates how good a given partition is. It is defined as:

Q = Σ_{i=1}^{m} [ I_i / d - ((2 I_i + O_i) / (2d))² ]

where d is the total number of edges in the graph, m is the number of communities in the clustering result, I_i is the total number of internal edges of community C_i, and O_i is the total number of external edges of C_i.
NMI stands for Normalized Mutual Information. NMI is defined as:

NMI(C, C') = -2 Σ_i Σ_j X_ij log( X_ij N / (X_i. X_.j) ) / ( Σ_i X_i. log(X_i. / N) + Σ_j X_.j log(X_.j / N) )

where X is the confusion matrix, whose rows correspond to the real community set C' and whose columns correspond to the clustering-result set C; X_ij is the number of overlapping nodes of community C_i and community C'_j; X_i. and X_.j are the row and column sums of X; and N is the number of nodes. The more similar a community found by the algorithm is to a real community, the closer the NMI is to 1; conversely, the closer it is to 0.
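The confusion-matrix NMI can be sketched as follows (a minimal implementation of the normalized-mutual-information formula of Danon et al., which matches the confusion-matrix description above; `nmi` is a hypothetical helper name):

```python
import numpy as np

def nmi(true_labels, found_labels):
    """Normalized mutual information between the real partition and the found one,
    computed from the confusion matrix X (X_ij = nodes shared by community pair i, j)."""
    t, p = np.asarray(true_labels), np.asarray(found_labels)
    N = len(t)
    rows, cols = np.unique(t), np.unique(p)
    X = np.array([[np.sum((t == a) & (p == b)) for b in cols] for a in rows], float)
    Ni, Nj = X.sum(axis=1), X.sum(axis=0)            # row and column sums
    num = sum(-2.0 * X[i, j] * np.log(X[i, j] * N / (Ni[i] * Nj[j]))
              for i in range(len(rows)) for j in range(len(cols)) if X[i, j] > 0)
    den = np.sum(Ni * np.log(Ni / N)) + np.sum(Nj * np.log(Nj / N))
    return num / den

score = nmi([0, 0, 1, 1], [1, 1, 0, 0])  # identical partition up to relabeling
```

A partition identical to the ground truth (up to relabeling) scores 1; an uninformative partition scores 0.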
F_same computes the intersection of each community discovered by the algorithm with the corresponding real community and averages the results. It is defined as:
where V is the total number of nodes in the graph, m is the number of communities in the clustering result, and n is the number of real communities.
As described above, the invention provides a community detection method that extracts the network core structure in combination with a VAE, built from a practical standpoint on the network topology of a real network and the core structure within it. The method consists mainly of the following modules:
1. a data preprocessing module: because the data set used by the invention is a data set disclosed abroad and the data set is needed by experiments to analyze the network topology structure and the core structure in the network in the real network, the core structure in the real network can be well searched by using the k-tress algorithm. The data set used therefore needs to be preprocessed.
2. Feature dimension reduction and extraction module: a variational automatic encoder model performs feature dimension reduction and feature extraction on the similarity matrix carrying the network core structure, extracting the mean and log-variance during training, generating random numbers with different characteristics according to samples with different characteristics, and generating data from normally distributed random numbers. The encoding part reduces the data dimension, and the optimal solution is obtained by minimizing the reconstruction error and the divergence loss.
3. Community information combination module: in order to combine community information effectively with the variational automatic encoder, the algorithm model based on the variational automatic encoder consists of a community information extraction module and a deep learning module. The model is adjusted and trained under a given deep learning framework in combination with the community information to obtain a low-dimensional feature representation of the network structure.
4. Clustering module: based on modules 1, 2 and 3, the K-means clustering algorithm performs the clustering operation on the low-dimensional feature matrix obtained after training the variational automatic encoder model, finally completing the community structure division.
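As a sketch of the clustering module, a plain NumPy K-means over the learned low-dimensional feature matrix might look as follows. This is a generic illustration of Lloyd's algorithm, not the patent's exact implementation; the function name and parameters are assumptions.

```python
import numpy as np

def kmeans(Z, k, iters=100, seed=0):
    """Plain K-means on a low-dimensional feature matrix Z (n x d):
    assign each node to its nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]  # init from data points
    for _ in range(iters):
        # distances from every point to every centroid
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):   # converged
            break
        centers = new
    return labels
```

The returned labels are the community assignments of the nodes.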
The algorithm model of the invention can be applied to reveal the aggregation behaviour of a real network, i.e. to discover and analyse its community structure. In a complex real network, feature extraction and dimension reduction can be performed on a matrix built from the network, followed by clustering, to complete the division of the community structure. In social networks, which are most closely tied to the internet, the platform can build a network information matrix from the information uploaded by users together with attributes such as age, sex, profession, hobbies and usage time, and complete the division into different communities through the algorithm model of the invention. This can be used for related functions such as friend-circle analysis, profession-introduction analysis and friend discovery in social software. In other fields such as biology, the method can be used to analyse complex food-chain network structures, or for feature extraction and dimension reduction of complex protein networks to complete the division of different protein structures.
Most traditional community discovery algorithms focus on the topological structure of the community network, but the core structure within the network strongly affects the accuracy of community detection, and both the dimension reduction of the feature matrix and the clustering process influence the final community division. The invention considers these factors comprehensively and realises a community detection method that extracts the network core structure in combination with a VAE; in multiple experiments it outperforms other methods on three evaluation indexes and yields a complete and more accurate community division result. The method performs well on small and medium-sized dense complex real networks, achieves a reasonable community division on medium and large sparse complex real networks, and divides community structures more accurately than other traditional algorithms.
Examples
The experimental effect of the invention can be illustrated by the following simulation experiment:
The experimental environment of the invention is based on the Windows 10 operating system; the software environment is the Python language, and the hardware environment is an Intel Core i7-9700K CPU with 48 GB of memory.
Dolphins and Football data sets were selected for comparative experiments between the invention and the FUA algorithm. The FUA algorithm is a widely used modularity-based community detection algorithm whose optimization goal is to maximize the modularity of the entire community network.
FIGS. 5(A) - (C) illustrate the effect of community partitioning on a Dolphins network, where the numbers on a node represent the community labels into which the node is partitioned. In this case, the graph (a) shows a real community division situation, the graph (B) shows a community division situation according to the present invention, and the graph (C) shows a community division situation of the FUA algorithm.
As shown in fig. 5, the Dolphins network has 62 nodes, 159 edges and 2 community labels in total; the numbers on the nodes represent the community labels into which the nodes are divided, graph (a) shows the real community division, graph (b) shows the community division of the invention, and graph (c) shows the community division of the FUA algorithm. It can be seen that on the Dolphins network the community division of the invention is completely consistent with the real community division, while the FUA algorithm erroneously divides the network into 5 communities; its division accuracy is low and differs greatly from the real community division.
Fig. 6(a) - (C) are the community partitioning effect on the Football network, and the numbers on the nodes represent the community labels into which the nodes are partitioned. Among them, the graph (a) shows the real community division, the graph (B) shows the community division of the present invention, and the graph (C) shows the community division of the FUA algorithm.
As shown in fig. 6, the Football network has 180 nodes, 788 edges and 12 community labels; the numbers on the nodes represent the community labels into which the nodes are divided, graph (a) shows the real community division, graph (b) shows the community division of the invention, and graph (c) shows the community division of the FUA algorithm. It can be seen that on the Football network the community division of the invention differs from the real community division by only one node (the node in the square frame in graph (b): its real community label is "0", while the label assigned by the invention is "5"), whereas the FUA algorithm erroneously divides the network into 9 communities; its division accuracy is low and differs greatly from the real community division.
Therefore, compared with the traditional algorithm, the community division effect of the invention is more accurate and is closer to a real network.
Claims (10)
1. A community detection method for extracting a network core structure in combination with VAE, characterized in that a k-truss algorithm is used to find the k-truss structure in a network, the method specifically comprising the following steps:
step 1: carrying out data preprocessing on the original data set, establishing an adjacency matrix of the network graph, and generating a similarity matrix of the adjacency matrix;
step 2: performing feature dimension reduction and feature extraction on the similarity matrix generated in step 1 by using a variational automatic encoder model, wherein the variational automatic encoder model consists of a community information extraction module and a deep learning module;
step 3: adjusting and training the variational automatic encoder model under a given deep learning framework in combination with community information, and obtaining a low-dimensional feature matrix of the network structure;
step 4: clustering the low-dimensional feature matrix obtained in step 3 by using the K-means algorithm to finally obtain the community division result.
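The k-truss structure referenced in claim 1 is, in the standard formulation, the maximal subgraph in which every edge is supported by at least k − 2 triangles. Below is a minimal sketch of that standard peeling procedure; it is a generic illustration under that formulation, not the patent's own code.

```python
def k_truss(edges, k):
    """Peel edges whose support (number of triangles containing the edge)
    is below k - 2, leaving the k-truss core of the graph."""
    E = {frozenset(e) for e in edges}
    adj = {}
    for e in E:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        for e in list(E):
            u, v = tuple(e)
            if len(adj[u] & adj[v]) < k - 2:  # common neighbours = edge support
                E.remove(e)
                adj[u].discard(v)
                adj[v].discard(u)
                changed = True
    return E
```

For example, in a triangle with a pendant edge, the 3-truss keeps only the triangle.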
2. The community detection method for extracting network core structure in combination with VAE according to claim 1,
the step 1 specifically comprises the following steps:
1.1: acquiring data in an original data set, and establishing an original network diagram;
1.2: establishing an adjacency matrix for the original network diagram in the step 1.1;
1.3: calculating the support degree of each edge in the original network graph in the step 1.1;
1.4: calculating the k-truss core structure of the original network graph according to the edge supports of step 1.3;
1.5: calculating the similarity of each node according to the adjacency matrix of step 1.2 and the k-truss core structure of step 1.4;
1.6: calculating and generating the similarity matrix according to the adjacency matrix of step 1.2 and the node similarities of step 1.5.
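The exact Sim(e_ij) used in steps 1.5-1.6 is defined in claim 4, which is not reproduced in this text; as an illustrative stand-in, the sketch below uses a Jaccard-style neighbourhood similarity computed from the adjacency matrix. All names here are assumptions, not the patent's definition.

```python
import numpy as np

def similarity_matrix(A):
    """Jaccard-style neighbourhood similarity over adjacency matrix A;
    a stand-in for Sim(e_ij), nonzero only where an edge exists."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    X = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if A[i, j]:
                ni = set(np.flatnonzero(A[i])) | {i}  # closed neighbourhood of i
                nj = set(np.flatnonzero(A[j])) | {j}
                X[i, j] = len(ni & nj) / len(ni | nj)
    return X
```

On the path 0-1-2, the edge (0, 1) gets similarity 2/3 (two shared nodes out of three in the union).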
3. The method for community detection of extracted network core structure in combination with VAE according to claim 2,
in step 1.2, given a graph G = (V, E), where V is the set of nodes in graph G and E is the set of edges in graph G, let N(u) be the set of nodes adjacent to node u, and let the matrix A = [a_ij]_{n×n} be the adjacency matrix of graph G, where n represents the dimension of the matrix; a_ij = 1 indicates that the edge e_ij exists, and a_ij = 0 indicates that the edge e_ij does not exist.
5. the method for community detection of extracted network core structure in combination with VAE according to claim 4,
in step 1.6, for the graph G = (V, E), its similarity matrix is defined as X = [x_ij]_{n×n}, where x_ij = Sim(e_ij), i, j ∈ V, e_ij ∈ E.
6. The method for community detection of extracted network core structure in combination with VAE according to claim 1,
the step 2 specifically comprises the following steps:
2.1: establishing a variational automatic encoder model;
2.2: inputting the similarity matrix obtained in step 1 into the encoding part of the variational automatic encoder model, and calculating the forward propagation process according to the formula H_i = σ(W_1·X_i + b_1) to obtain the hidden-layer representation H_i, where i is any number from 1 to |V|;
2.3: extracting the mean and log-variance of each hidden-layer representation H_i of step 2.2;
2.4: sampling a random parameter ε_i from the standard normal distribution N(0, 1), updating H_i = μ_i + ε_i × σ_i, and taking the updated H_i as the input of the decoding part;
2.5: inputting the H_i obtained in step 2.4 into the decoding part of the variational automatic encoder model, and calculating the forward propagation process according to the formula Y_i = σ(W_2·H_i + b_2) to generate the data Y_i, where Y_i is the ith column vector of the output layer and i is any number from 1 to |V|;
2.6: updating the neural-network parameter variables W_1, W_2, b_1, b_2 through the error back-propagation algorithm;
2.7: when the training target required by the model is reached, continuing to the next step; otherwise, repeating the training steps;
2.8: when the number of training iterations required by the model is reached, continuing to the next step; otherwise, repeating the training steps;
2.9: when the requirements of steps 2.7 and 2.8 are met, obtaining the low-dimensional feature matrix trained by the variational automatic encoder model.
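Steps 2.2-2.5 can be sketched as a single forward pass: encode, extract mean and log-variance, reparameterize with ε ~ N(0, 1), and decode. The separate projection weights for the mean and log-variance (W_mu, W_lv) are an assumption of this sketch; the claims only state that both statistics are extracted from the hidden representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vae_forward(X_i, W1, b1, W_mu, b_mu, W_lv, b_lv, W2, b2):
    """One forward pass sketching steps 2.2-2.5: encode, extract mean and
    log-variance, reparameterize, decode."""
    H = sigmoid(W1 @ X_i + b1)              # step 2.2: hidden representation
    mu = W_mu @ H + b_mu                    # step 2.3: mean (projection assumed)
    logvar = W_lv @ H + b_lv                # step 2.3: log-variance (assumed)
    eps = rng.standard_normal(mu.shape)     # step 2.4: eps ~ N(0, 1)
    Z = mu + eps * np.exp(0.5 * logvar)     # reparameterization trick
    Y = sigmoid(W2 @ Z + b2)                # step 2.5: decoded output
    return Y, mu, logvar
```

The reparameterization keeps the sampling step differentiable so that step 2.6's back-propagation can update all weights.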
7. The method for community detection of extracted network core structure in combination with VAE according to claim 6,
in step 2.6, the cross-entropy loss function in the variational automatic encoder is defined as:

L_1 = − ∑_{i=1..|V|} [ X_i · log(Y_i) + (1 − X_i) · log(1 − Y_i) ]

wherein V is the number of vertices of the network graph, X_i is the ith column vector of the matrix in the input layer, Y_i is the ith column vector of the matrix in the output layer, and i is any number from 1 to |V|;
the KL divergence function L_2, between the learned distribution N(μ_i, σ_i²) and the standard normal prior, is defined as:

L_2 = − (1/2) ∑_{i=1..|V|} ( 1 + log σ_i² − μ_i² − σ_i² )
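A sketch of the two loss terms named in claim 7, under the standard VAE formulation (binary cross-entropy reconstruction plus the KL of N(μ, σ²) against a standard normal prior); the small clipping constant eps is an implementation assumption added for numerical safety.

```python
import numpy as np

def vae_losses(X, Y, mu, logvar):
    """Reconstruction cross-entropy L1 plus the standard VAE KL term L2."""
    eps = 1e-12  # numerical-safety constant (assumption, not in the patent)
    l1 = -np.sum(X * np.log(Y + eps) + (1 - X) * np.log(1 - Y + eps))
    l2 = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar))
    return l1, l2
```

With μ = 0 and log σ² = 0 the KL term vanishes, since the learned distribution already equals the prior.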
8. The method for community detection of extracted network core structure in combination with VAE according to claim 1,
the step 3 specifically comprises:
3.1: extracting the community information matrix C;
3.2: establishing a variational automatic encoder model with the same settings as in step 2 except for the total loss function;
3.3: inputting the matrix obtained in step 2.9 into the variational automatic encoder model of step 3.2, and obtaining the low-dimensional feature matrix through training.
9. The method for community detection of extracted network core structure in combination with VAE according to claim 8,
in step 3.1, defining a community information matrix C:
for the network graph G, the obtained community information matrix C ∈ R^{|V|×m}, where m is the number of communities; each row vector of the matrix C represents the probabilities that the corresponding vertex belongs to the different communities.
10. The method for community detection of extracted network core structure in combination with VAE according to claim 9,
in step 3.2, a layer is added: a vector μ_c with the same dimension as the mean μ_x, namely the mean of the community-information distribution, with σ_c² as its variance. The subscript x denotes adjacency information and the subscript c denotes community information. After the community information is added, the new KL divergence loss function becomes:

L_2 = ∑_{i=1..m} [ log(σ_c / σ_x) + (σ_x² + (μ_x − μ_c)²) / (2σ_c²) − 1/2 ]
wherein m is the dimension of the hidden layer H_i, and i is any number from 1 to |V|.
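After community information is added, the KL term compares two diagonal Gaussians rather than a Gaussian and a standard normal. The sketch below assumes the standard closed form of KL(N(μ_x, σ_x²) ‖ N(μ_c, σ_c²)); the function name and variance-based interface are illustrative.

```python
import numpy as np

def kl_gauss(mu_x, var_x, mu_c, var_c):
    """Closed-form KL divergence KL(N(mu_x, var_x) || N(mu_c, var_c)),
    computed elementwise over diagonal Gaussians and summed."""
    return np.sum(0.5 * np.log(var_c / var_x)
                  + (var_x + (mu_x - mu_c) ** 2) / (2 * var_c)
                  - 0.5)
```

With μ_c = 0 and σ_c² = 1 this reduces to the standard-normal KL term of step 2; identical distributions give a divergence of zero.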
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210483834.7A CN114970684A (en) | 2022-05-05 | 2022-05-05 | Community detection method for extracting network core structure by combining VAE |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114970684A true CN114970684A (en) | 2022-08-30 |
Family
ID=82980867
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115964626A (en) * | 2022-10-27 | 2023-04-14 | 河南大学 | Community detection method based on dynamic multi-scale feature fusion network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||