WO2023000574A1 - Model training method, apparatus and device, and readable storage medium - Google Patents
- Publication number
- WO2023000574A1 (PCT/CN2021/134051)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- matrix
- graph
- loss value
- convolutional neural
- chebyshev
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Definitions
- The present application relates to the field of computer technology, and in particular to a model training method, apparatus, device and readable storage medium.
- A graph neural network, simply put, is a deep learning architecture for graph-structured data; it combines end-to-end learning with inductive reasoning and is expected to resolve a series of bottlenecks, such as causal reasoning and interpretability, that traditional deep learning architectures cannot handle.
- According to their implementation principles, graph convolutional neural networks can be divided into two types: those based on spatial methods and those based on spectral methods.
- The former relies on an explicit information propagation mechanism on the graph and lacks interpretability;
- the latter uses the Laplacian matrix of the graph as its tool, has a solid theoretical basis, and is the mainstream direction of graph convolutional neural network research.
- However, current graph convolutional neural networks based on spectral methods do not perform well when applied to graph vertex classification tasks; that is, existing vertex classification models based on graph convolutional neural networks perform poorly.
- The purpose of the present application is to provide a model training method, apparatus, device and readable storage medium to improve the performance of the vertex classification model.
- The specific scheme is as follows:
- The present application provides a model training method, including:
- Performing random walk and sampling based on the adjacency matrix to obtain the positive pointwise mutual information matrix includes:
- performing a random walk of a preset length on each vertex in the graph data set to obtain a context path of each vertex;
- randomly sampling all context paths to construct a vertex co-occurrence matrix, calculating the co-occurrence probability of each vertex and context and the corresponding marginal probabilities, and determining each element in the positive pointwise mutual information matrix.
- the calculating the first loss value between the first training result and the label matrix includes:
- the degree of difference in probability distribution between the first training result and the label matrix is used as the first loss value.
- Calculating the second loss value between the second training result and the first training result includes: calculating the differences between elements having the same coordinates in the second training result and the first training result, and using the sum of squares of all the differences as the second loss value.
- Determining the target loss value based on the first loss value and the second loss value includes: inputting the first loss value and the second loss value into a loss function to output the target loss value.
- If the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are iteratively trained until the target loss value meets the preset convergence condition;
- Updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value includes: updating the network parameters of one of the two networks according to the target loss value and sharing the updated parameters with the other network, or calculating new network parameters according to the target loss value and sharing the new network parameters with both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
- both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network include L graph convolutional layers, and the L graph convolutional layers are used to perform feature transformation and graph convolution operations on the input data;
- the feature transformation formula of the l-th (1≤l≤L) graph convolutional layer is: Q_l = H_l·Θ_l^T;
- the graph convolution operation formula of the l-th (1≤l≤L) graph convolutional layer is: H_{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q_l);
- Q_l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation;
- H_l is the input data of the l-th graph convolutional layer, and H_{l+1} is the output data of the l-th graph convolutional layer;
- Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer;
- σ is a nonlinear activation function;
- K << n is the order of the polynomial, and n is the number of vertices in the graph data set;
- θ_k are the coefficients of the polynomial;
- T_k(x) = 2x·T_{k-1}(x) − T_{k-2}(x), with T_0 = 1 and T_1 = x, are the Chebyshev polynomials;
- L is the Laplacian matrix of the graph data set, and L̃ is the Laplacian matrix after linear transformation.
- The present application further provides a model training apparatus, including:
- the obtaining module is used to obtain the vertex feature matrix, adjacency matrix and label matrix constructed based on the graph data set;
- a sampling module, configured to perform random walk and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix;
- the first training module is used to input the vertex feature matrix and the adjacency matrix into the first Chebyshev graph convolutional neural network to output the first training result;
- the second training module is used to input the vertex feature matrix and the positive point-wise mutual information matrix into the second Chebyshev graph convolutional neural network to output the second training result;
- a first calculation module configured to calculate a first loss value between the first training result and the label matrix
- a second calculation module configured to calculate a second loss value between the second training result and the first training result
- a determining module configured to determine a target loss value based on the first loss value and the second loss value
- a combination module, configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets a preset convergence condition.
- The present application provides a model training device, including:
- a memory, configured to store a computer program;
- a processor, configured to execute the computer program to implement the model training method disclosed above.
- the present application provides a readable storage medium for storing a computer program, wherein when the computer program is executed by a processor, the aforementioned disclosed model training method is implemented.
- The present application provides a model training method, including: obtaining a vertex feature matrix, an adjacency matrix and a label matrix constructed based on a graph data set; performing random walk and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix; inputting the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result; inputting the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result; calculating a first loss value between the first training result and the label matrix; calculating a second loss value between the second training result and the first training result; determining a target loss value based on the first loss value and the second loss value; and, if the target loss value meets a preset convergence condition, combining the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.
- This application designs two Chebyshev graph convolutional neural networks: the first Chebyshev graph convolutional neural network performs supervised training based on the vertex feature matrix, the adjacency matrix and the label matrix, while the second Chebyshev graph convolutional neural network performs unsupervised training based on the vertex feature matrix, the positive pointwise mutual information matrix and the output of the first Chebyshev graph convolutional neural network during training. When the target loss value determined from the loss values of the two networks meets the preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, thereby training a vertex classification model with better performance.
- This scheme can give full play to the respective advantages of supervised training and unsupervised training, and improve the performance of the vertex classification model.
- Correspondingly, the model training apparatus, device and readable storage medium provided by the present application also have the above-mentioned technical effects.
- Fig. 1 is a schematic structural diagram of a graph convolutional neural network disclosed in the present application
- Fig. 2 is a flow chart of a model training method disclosed in the present application
- Fig. 3 is a schematic diagram of the data trend of a dual Chebyshev graph convolutional neural network disclosed in the present application;
- FIG. 4 is a schematic diagram of a dual Chebyshev graph convolutional neural network disclosed in the present application.
- FIG. 5 is a flow chart of a model construction and training method disclosed in the present application.
- FIG. 6 is a schematic diagram of a model training device disclosed in the present application.
- FIG. 7 is a schematic diagram of a model training device disclosed in the present application.
- V represents the set of vertices;
- E represents the set of edges;
- V_L is a subset of V, and the vertices in V_L have assigned labels.
- The graph vertex classification problem is: how to infer the label of each vertex in the set V \ V_L of remaining vertices.
- a graph neural network usually consists of an input layer, one or more graph convolutional layers, and an output layer.
- graph neural networks can be divided into graph convolutional neural networks, graph recurrent neural networks, graph autoencoders, graph generative networks, and spatiotemporal graph neural networks.
- the graph convolutional neural network has attracted the attention of many researchers due to the great success of the traditional convolutional neural network in the fields of image processing and natural language understanding.
- Figure 1 shows the structure of a typical graph convolutional neural network, which consists of an input layer (Input layer), two graph convolution layers (Gconv layer), and an output layer (Output layer).
- The input layer reads the n*d-dimensional vertex attribute matrix X;
- the graph convolution layers perform feature extraction on X and pass the result to the next graph convolution layer after transformation by a nonlinear activation function such as ReLU;
- the output layer is the task layer, which completes a specific task such as vertex classification or clustering; the figure shows a vertex classification task layer, which outputs the category label Y of each vertex.
- the present application provides a model training solution that can combine supervised and unsupervised learning to effectively improve the accuracy of classification, effectively reduce the computational complexity of the network, and improve classification efficiency.
- The model training method includes:
- each vertex v of G has d features, and the features of all vertices constitute the n*d-dimensional vertex feature matrix X.
- the adjacency matrix of G is denoted as A, and the element A ij represents the weight of the connection edge between vertices i and j.
- An n*C-dimensional label matrix Y is constructed, where n indicates the number of all vertices in the graph and C indicates the number of label categories of all vertices; for a vertex whose label is unknown, the elements of each column in the corresponding row are set to 0.
- The Pubmed dataset contains 19,717 scientific publications in 3 categories with 44,338 citation links between publications. The publications and the links between them form a citation network, and each publication in the network is described by a term frequency-inverse document frequency (TF-IDF) feature vector over a dictionary of 500 unique terms.
- The feature vectors of all documents form the feature matrix X.
- The goal is to classify each document: 20 instances of each category are randomly sampled as labeled data, 1000 instances are used as test data, and the rest are used as unlabeled data; the vertex label matrix Y is constructed accordingly, and the adjacency matrix A is constructed according to the citation relationships between the papers.
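- As an illustration only, a label matrix of this form can be sketched in code as follows (the function and variable names are not from the patent, and unlabeled vertices are simply marked with None):

```python
import numpy as np

def build_label_matrix(labels, n, C):
    """Build an n x C label matrix Y: row i is one-hot for a labeled vertex
    and all-zero for an unlabeled vertex (represented by None here)."""
    Y = np.zeros((n, C))
    for i, label in enumerate(labels):
        if label is not None:      # only labeled vertices contribute a 1
            Y[i, label] = 1.0
    return Y

# e.g. 5 vertices, 3 classes, only vertices 0 and 3 are labeled
Y = build_label_matrix([0, None, None, 2, None], n=5, C=3)
```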
- Graph data sets can also be constructed based on proteins, images and the like, so as to classify proteins, images, etc.
- Based on the adjacency matrix A, the positive pointwise mutual information matrix encoding the global consistency information of the graph can be constructed using random walk and random sampling techniques.
- The adjacency matrix serves two functions in the random walk process. First, it represents the topological structure of the graph: from it one can determine which vertices are connected and walk from one vertex to its adjacent vertices. Second, it is used to determine the random walk probabilities (see formula (1) for details); a vertex may have multiple neighbors, and in each random walk step the walker randomly picks one of them.
- Performing random walk and sampling based on the adjacency matrix to obtain the positive pointwise mutual information matrix includes: based on the adjacency matrix, performing a random walk of a preset length on each vertex in the graph data set to obtain the context path of each vertex; randomly sampling all context paths to determine the number of co-occurrences of any two vertices and constructing a vertex co-occurrence matrix; and, based on the vertex co-occurrence matrix, calculating the co-occurrence probability of each vertex and context and the corresponding marginal probabilities, and determining each element in the positive pointwise mutual information matrix.
- the "co-occurrence probability of a vertex and a context” refers to: the probability pr(v i , ct j ) of a certain vertex v i appearing in a certain context ct j .
- that is, the probability pr(v_i, ct_j) that vertex v_i is included in context ct_j.
- the marginal probability of vertex v i is equal to the sum of elements in row i in this matrix divided by the sum of all elements in this matrix.
- the marginal probability of context ct j is equal to the sum of elements in column j divided by the sum of all elements in this matrix.
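- Written out explicitly (this formula is implied by the quantities defined above rather than reproduced verbatim from the patent, and o_ij denotes an element of the co-occurrence matrix introduced below), the standard positive pointwise mutual information definition is:

```latex
p(v_i, ct_j) = \frac{o_{ij}}{\sum_{i,j} o_{ij}}, \quad
p(v_i) = \frac{\sum_j o_{ij}}{\sum_{i,j} o_{ij}}, \quad
p(ct_j) = \frac{\sum_i o_{ij}}{\sum_{i,j} o_{ij}}, \quad
P_{ij} = \max\left( \log \frac{p(v_i, ct_j)}{p(v_i)\, p(ct_j)},\; 0 \right)
```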
- the positive point-wise mutual information matrix can be represented by P, which can encode the global consistency information of the graph, and can be determined by referring to the following content:
- the row vector p_{i,:} is the embedded representation of the vertex v_i
- the column vector p_{:,j} is the embedded representation of the context ct_j
- p_ij represents the probability that the vertex v_i appears in the context ct_j
- The pointwise mutual information matrix P can be obtained by random walks on the graph data set. Specifically, considering the context ct_j of vertex v_j as a path γ_j with v_j as the root node and length u, p_ij can be obtained by calculating the frequency with which vertex v_i appears on the path γ_j.
- each vertex in the graph data set is randomly walked with a length of u steps, and the path ⁇ representing the context of the vertex can be obtained.
- Random sampling is performed on γ to count the number of co-occurrences of any two vertices, and the vertex-context co-occurrence matrix O (i.e., the vertex co-occurrence matrix) is obtained.
- the element o ij represents the number of times that vertex v i appears on the context ct j , that is, the path ⁇ j with vertex v j as the root node, which can be used for subsequent calculation of p ij .
- the value of each element in the positive point-wise mutual information matrix P can be determined, thereby determining the positive point-wise mutual information matrix P.
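- A minimal sketch of this construction, assuming uniform transition probabilities at each step (the function and variable names are illustrative and not taken from the patent), could look as follows:

```python
import numpy as np

def ppmi_matrix(adj, walk_len=10, walks_per_vertex=5, seed=0):
    """Random walks of length `walk_len` from every vertex, a vertex-context
    co-occurrence matrix O, and the resulting positive PMI matrix P."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    O = np.zeros((n, n))                            # vertex-context co-occurrence counts
    for root in range(n):                           # context ct_root is rooted at `root`
        for _ in range(walks_per_vertex):
            v = root
            for _ in range(walk_len):               # walk of preset length u
                neighbors = np.flatnonzero(adj[v])
                if neighbors.size == 0:
                    break
                v = rng.choice(neighbors)           # uniform choice; edge weights could bias this
                O[v, root] += 1                     # v appears on the context path of `root`
    total = O.sum()
    p_vc = O / total                                # co-occurrence probabilities
    p_v = O.sum(axis=1, keepdims=True) / total      # marginal probabilities of vertices
    p_c = O.sum(axis=0, keepdims=True) / total      # marginal probabilities of contexts
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_vc / (p_v @ p_c))
    pmi[~np.isfinite(pmi)] = 0.0                    # zero out log(0) and 0/0 entries
    return np.maximum(pmi, 0.0)                     # keep only positive PMI values
```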
- The first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are identical, and both include L graph convolutional layers, which are used to perform feature transformation and graph convolution operations on the input data;
- the feature transformation formula of the l-th (1≤l≤L) graph convolutional layer is: Q_l = H_l·Θ_l^T;
- the graph convolution operation formula of the l-th (1≤l≤L) graph convolutional layer is: H_{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q_l);
- Q_l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation;
- H_l is the input data of the l-th graph convolutional layer, and H_{l+1} is the output data of the l-th graph convolutional layer;
- Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer;
- σ is the nonlinear activation function;
- K << n is the order of the polynomial, and n is the number of vertices in the graph data set;
- θ_k are the coefficients of the polynomial;
- T_k(x) = 2x·T_{k-1}(x) − T_{k-2}(x), with T_0 = 1 and T_1 = x, are the Chebyshev polynomials;
- L̃ = (2/λ_max)·L − I_n, where L is the Laplacian matrix of the graph data set, λ_max is the largest eigenvalue of L, and I_n is an n*n-dimensional identity matrix.
- calculating the first loss value between the first training result and the label matrix includes: based on the cross-entropy principle, using the difference degree of the probability distribution between the first training result and the label matrix as the first loss value (i.e. supervised loss).
- calculating the second loss value between the second training result and the first training result includes: calculating the difference between elements with the same coordinates in the second training result and the first training result, and The sum of squares of all differences is used as the second loss value (i.e. unsupervised loss).
- If the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are iteratively trained until the target loss value meets the preset convergence condition.
- Updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value includes: after updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the second Chebyshev graph convolutional neural network; or after updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the first Chebyshev graph convolutional neural network; or after calculating new network parameters according to the target loss value, sharing the new network parameters with both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
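- A minimal sketch of this parameter sharing (reusing the hypothetical cheb_graph_conv helper above; laplacian() stands for an assumed routine that builds the graph Laplacian of its argument) might look like:

```python
# Both networks read the same parameter objects, so a single update from the
# target loss value is automatically shared by ChebyNet_A and ChebyNet_P.
shared_params = {"Theta": [], "theta": []}          # one set of layer parameters

def chebynet_forward(X, M, params):
    # M is the adjacency matrix A for ChebyNet_A, or the PPMI matrix P for ChebyNet_P
    H = X
    L = laplacian(M)                                # assumed helper, defined elsewhere
    for Theta_l, theta_l in zip(params["Theta"], params["theta"]):
        H = cheb_graph_conv(H, Theta_l, theta_l, L)
    return H

def update_shared_params(params, grads, lr=0.01):
    # one gradient step on the shared parameters; both networks see the update
    for key in params:
        params[key] = [w - lr * g for w, g in zip(params[key], grads[key])]
```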
- The first Chebyshev graph convolutional neural network performs supervised training based on the vertex feature matrix, the adjacency matrix and the label matrix, while the second Chebyshev graph convolutional neural network performs unsupervised training based on the vertex feature matrix, the positive pointwise mutual information matrix and the output of the first Chebyshev graph convolutional neural network during training.
- When the target loss value determined from the loss values of the two networks meets the preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, and a vertex classification model with better performance is obtained.
- This scheme can give full play to the respective advantages of supervised training and unsupervised training, and improves the performance of the vertex classification model.
- the dual vertex classification model can also be called a dual Chebyshev graph convolutional neural network (DCGCN, Dual Chebyshev Graph Convolutional Neural Network).
- the dual Chebyshev graph convolutional neural network includes two identical Chebyshev graph convolutional neural networks ChebyNet with shared parameters, and each ChebyNet consists of an input layer, L graph convolutional layers and an output layer.
- ChebyNet_A takes the adjacency matrix A, which encodes the local consistency information of the graph, and the vertex feature matrix X as input data, and outputs the vertex category label prediction matrix Z_A;
- ChebyNet_P takes the positive pointwise mutual information matrix P, which encodes the global consistency information of the graph, and the vertex feature matrix X as input data, and outputs the vertex category label prediction matrix Z_P.
- ChebyNet_A performs supervised learning based on the labeled graph vertices and achieves high prediction accuracy; under its guidance (using its prediction result Z_A), ChebyNet_P performs unsupervised learning on the unlabeled graph vertices to further improve prediction accuracy and obtain a better vertex classification model.
- Z A and Z P are consistent or the difference is negligible, so Z A or Z P can be used as the output of the dual Chebyshev graph convolutional neural network.
- Figure 4 illustrates the structure of a dual Chebyshev graph convolutional neural network.
- the convolutional layer in Figure 4 is the graph convolutional layer described below.
- the input layer is mainly responsible for reading the graph data to be classified, including the vertex feature matrix X, the adjacency matrix A representing the topology of the graph, and the positive point-by-point mutual information matrix P that encodes the global consistency information of the graph.
- The graph convolution operation formula is: H_{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q_l), with the feature transformation Q_l = H_l·Θ_l^T;
- Q_l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation;
- H_l is the input data of the l-th graph convolutional layer, and H_{l+1} is the output data of the l-th graph convolutional layer;
- Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer;
- σ is the nonlinear activation function;
- K << n is the order of the polynomial, and n is the number of vertices in the graph data set;
- θ_k are the coefficients of the polynomial;
- T_k(x) = 2x·T_{k-1}(x) − T_{k-2}(x), with T_0 = 1 and T_1 = x, are the Chebyshev polynomials;
- H_1 is the vertex feature matrix X.
- L̃ = (2/λ_max)·L − I_n, where λ_max is the largest eigenvalue of the Laplacian matrix L and I_n is an n*n-dimensional identity matrix.
- U is the matrix composed of the eigenvectors obtained by eigendecomposition of the Laplacian matrix of the graph G; U^{-1} is the inverse matrix of U; Λ is the diagonal matrix of eigenvalues, whose diagonal elements are λ_1, λ_2, ..., λ_n.
- K represents the order of the polynomial and limits information to propagate at most K steps from each vertex, so only K+1 parameters are required, which greatly reduces the complexity of the model training process. Because computing the convolution kernel matrix from U and Λ involves the eigendecomposition of the graph Laplacian matrix, which is computationally expensive, this embodiment uses Chebyshev polynomials to design an approximate calculation scheme that approximates the graph convolution by the polynomial expansion above.
- the loss function of the dual Chebyshev graph convolutional neural network consists of two parts: the supervised learning loss ls S with labeled vertices and the unsupervised learning loss ls U for unlabeled vertices.
- ChebyNet A takes the adjacency matrix A and the vertex feature matrix X as input for supervised learning, and compares the vertex label prediction result Z A with the known vertex label matrix Y to calculate the supervised learning loss.
- ChebyNet P takes the positive point-wise mutual information matrix and vertex feature matrix X as input for unsupervised learning, and compares its prediction result Z P with ChebyNet A 's prediction result Z A to calculate the unsupervised learning loss.
- The loss function of the dual Chebyshev graph convolutional neural network can be expressed as ls = ls_S + α·ls_U, where α is a constant used to adjust the proportion of the unsupervised learning loss in the entire loss function.
- the supervised learning loss function calculates the degree of difference between the actual label probability distribution and the predicted label probability distribution of the vertex based on the principle of cross entropy; the unsupervised learning loss function calculates the sum of squares of the difference between the same coordinate elements of Z P and Z A.
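- As a sketch (the names, the restriction of the cross-entropy to labeled vertices via a mask, and the absence of any normalization are assumptions for illustration):

```python
import numpy as np

def dual_loss(Z_A, Z_P, Y, labeled_mask, alpha=1.0):
    """Target loss ls = ls_S + alpha * ls_U as described above."""
    eps = 1e-12
    # supervised loss: cross-entropy between the actual and predicted label
    # probability distributions of the labeled vertices
    ls_S = -np.sum(Y[labeled_mask] * np.log(Z_A[labeled_mask] + eps))
    # unsupervised loss: sum of squared differences between the same-coordinate
    # elements of Z_P and Z_A
    ls_U = np.sum((Z_P - Z_A) ** 2)
    return ls_S + alpha * ls_U
```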
- The initialization strategy for the network parameters can be normal-distribution random initialization, Xavier initialization, He initialization, or the like.
- The network parameters include the feature transformation matrices Θ_l and the convolution kernels F_l.
- The network parameters can be corrected and updated using stochastic gradient descent (SGD), momentum gradient descent (MGD), Nesterov momentum, AdaGrad, RMSprop, Adam (adaptive moment estimation), batch gradient descent (BGD), or the like, so as to optimize the loss function value.
- The training process of the dual Chebyshev graph convolutional neural network can be carried out with reference to Figure 5, and specifically includes: for the graph data set G, constructing the vertex feature matrix X, the positive pointwise mutual information matrix P encoding the global consistency information of the graph, the adjacency matrix A encoding the local consistency information of the graph, and the vertex label matrix Y; inputting the vertex feature matrix X and the adjacency matrix A into ChebyNet_A, inputting the positive pointwise mutual information matrix P and the vertex feature matrix X into ChebyNet_P, and updating the network parameters according to the above loss function to train ChebyNet_A and ChebyNet_P.
- When the target loss value converges, the training ends and the dual Chebyshev graph convolutional neural network is obtained.
- For a labeled vertex, the class j to which it belongs can be obtained according to the vertex label matrix Y.
- The output feature matrix of each layer is calculated; according to the definition of the output layer, the probability Z_j (1≤j≤C) that a vertex belongs to each category is obtained, and the loss function value is calculated according to the loss function defined above; for an unlabeled vertex v_i ∈ V_U, the category with the highest probability is taken as the latest category of the vertex to update the vertex label matrix Y.
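- Putting the pieces together, an illustrative (non-authoritative) training loop, reusing the hypothetical helpers sketched above and assuming a compute_gradients routine supplied by an autodiff framework or derived by hand, could be:

```python
import numpy as np

def train_dual_chebynet(X, A, P, Y, labeled_mask, shared_params,
                        alpha=1.0, lr=0.01, max_epochs=200, tol=1e-4):
    for _ in range(max_epochs):
        Z_A = chebynet_forward(X, A, shared_params)      # supervised branch (ChebyNet_A)
        Z_P = chebynet_forward(X, P, shared_params)      # unsupervised branch (ChebyNet_P)
        loss = dual_loss(Z_A, Z_P, Y, labeled_mask, alpha)
        if loss < tol:                                   # preset convergence condition
            break
        grads = compute_gradients(loss, shared_params)   # assumed helper
        update_shared_params(shared_params, grads, lr)
    # after convergence Z_A or Z_P serves as the dual model's output; each
    # unlabeled vertex is assigned its highest-probability category
    return np.argmax(chebynet_forward(X, P, shared_params), axis=1)
```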
- the dual Chebyshev graph convolutional neural network is composed of two Chebyshev graph convolutional neural networks with the same structure and shared parameters.
- The two networks perform supervised learning and unsupervised learning respectively, which can improve the convergence rate and prediction accuracy of the network at the same time; the graph convolution layer is defined based on the graph Fourier transform, and the graph convolution operation is divided into two stages, feature transformation and graph convolution, which reduces the number of network parameters;
- the graph convolution kernel is defined as a polynomial convolution kernel, which ensures the locality of the graph convolution calculation; in order to reduce the computational complexity, the Chebyshev polynomial is used to approximate the graph convolution.
- this embodiment provides a training method for a dual Chebyshev graph convolutional neural network, which can solve the problem of vertex classification.
- Graph modeling is performed on the collected data set to obtain its adjacency matrix and vertex feature matrix; based on the adjacency matrix, a random walk of a specific length is carried out on the graph for each vertex, and the resulting walk sequences are sampled to obtain a positive pointwise mutual information matrix, which represents the context information of the vertices; the convolution operation is defined according to spectral graph theory, the graph convolution layers for feature extraction and the output layer for the vertex classification task are constructed, and the dual Chebyshev graph convolutional neural network is built and trained; at the end of training, classification predictions for the unlabeled vertices in the graph are available.
- Due to the design strategy of the dual graph convolutional neural network, this method can learn more graph topology information, including the local consistency and global consistency of each vertex, which greatly improves the learning ability of the model; at the same time, using both the graph topology and the attribute features of the vertices and combining supervised and unsupervised learning effectively improves classification accuracy; approximating the graph convolution calculation with Chebyshev polynomials avoids the expensive matrix eigendecomposition operation, effectively reduces the computational complexity of the network, and improves its classification efficiency.
- A model training apparatus provided in an embodiment of the present application is introduced below; the model training apparatus described below and the model training method described above may refer to each other.
- The model training apparatus includes:
- the obtaining module 601, used to obtain the vertex feature matrix, adjacency matrix and label matrix constructed based on the graph data set;
- the sampling module 602 is used to perform random walk and sampling based on the adjacency matrix to obtain a positive point-by-point mutual information matrix;
- the first training module 603 is used to input the vertex feature matrix and the adjacency matrix into the first Chebyshev graph convolutional neural network to output the first training result;
- the second training module 604 is used to input the vertex feature matrix and the positive point-by-point mutual information matrix into the second Chebyshev graph convolutional neural network to output the second training result;
- the first calculation module 605 is used to calculate the first loss value between the first training result and the label matrix
- a second calculation module 606, configured to calculate a second loss value between the second training result and the first training result
- a determining module 607 configured to determine a target loss value based on the first loss value and the second loss value
- the combination module 608 is configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets the preset convergence condition.
- sampling module is specifically used for:
- a random walk of preset length is performed on each vertex in the graph dataset to obtain the context path of each vertex;
- the co-occurrence probability of each vertex and context and the corresponding marginal probabilities are calculated, and each element in the positive pointwise mutual information matrix is determined.
- the first calculation module is specifically used for:
- the degree of difference in probability distribution between the first training result and the label matrix is used as the first loss value.
- the second calculation module is specifically used for:
- the determination module is specifically used for:
- the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, And perform iterative training on the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network until the target loss value meets the preset convergence condition;
- the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, including:
- the updated network parameters are shared to the second Chebyshev graph convolutional neural network;
- the updated network parameters are shared to the first Chebyshev graph convolutional neural network;
- the new network parameters are shared to the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
- both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network include L graph convolutional layers, and the L graph convolutional layers are used to perform feature transformation and graph convolution operations on the input data;
- the feature transformation formula of the l-th (1≤l≤L) graph convolutional layer is: Q_l = H_l·Θ_l^T;
- the graph convolution operation formula of the l-th (1≤l≤L) graph convolutional layer is: H_{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q_l);
- Q_l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation;
- H_l is the input data of the l-th graph convolutional layer, and H_{l+1} is the output data of the l-th graph convolutional layer;
- Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer;
- σ is the nonlinear activation function;
- K << n is the order of the polynomial, and n is the number of vertices in the graph data set;
- θ_k are the coefficients of the polynomial;
- T_k(x) = 2x·T_{k-1}(x) − T_{k-2}(x), with T_0 = 1 and T_1 = x, are the Chebyshev polynomials;
- L is the Laplacian matrix of the graph data set, and L̃ is the Laplacian matrix after linear transformation.
- this embodiment provides a model training device, which can give full play to the respective advantages of supervised training and unsupervised training, and improve the performance of the vertex classification model.
- A model training device provided in an embodiment of the present application is introduced below; the model training device described below and the model training method and apparatus described above may refer to each other.
- The model training device includes:
- the memory 701, used to store a computer program;
- the processor 702 is configured to execute the computer program, so as to implement the method disclosed in any of the foregoing embodiments.
- a readable storage medium provided by an embodiment of the present application is introduced below.
- the readable storage medium described below and the model training method, device, and equipment described above may refer to each other.
- a readable storage medium is used to store a computer program, wherein the computer program implements the model training method disclosed in the foregoing embodiments when executed by a processor.
- the specific steps of the method reference may be made to the corresponding content disclosed in the foregoing embodiments, and details are not repeated here.
- The readable storage medium may be a random access memory (RAM), a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM (EEPROM), a register, a hard disk, a removable disk, a CD-ROM, or any other known form of readable storage medium.
Abstract
A model training method, apparatus and device, and a readable storage medium. By means of the method, two Chebyshev graph convolutional neural networks are designed, one of which performs supervised training on the basis of a vertex feature matrix, an adjacency matrix and a label matrix, and the other of which performs unsupervised training on the basis of the vertex feature matrix, a positive pointwise mutual information matrix, and an output of the previous network during the training process; and when a target loss value determined on the basis of loss values of the two Chebyshev graph convolutional neural networks meets a preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model to obtain a vertex classification model with better performance by means of training. By means of the method, respective advantages of supervised training and unsupervised training can be brought into full play, thereby improving the performance of a vertex classification model.
Description
This application claims priority to the Chinese patent application submitted to the China Patent Office on July 21, 2021 with application number 202110825194.9 and entitled "A model training method, device, equipment and readable storage medium", the entire contents of which are incorporated herein by reference.
The present application relates to the field of computer technology, and in particular to a model training method, apparatus, device and readable storage medium.
With the rapid development of information technologies such as cloud computing, the Internet of Things, mobile communications and smart terminals, new applications represented by social networks, communities and blogs are widely used. These applications continuously generate large amounts of data, which are conveniently modeled and analyzed with graphs. In such graphs, the vertices represent individuals or groups, and the connecting edges represent the connections between them; the vertices usually carry label information representing the age, gender, location, hobbies, religious beliefs and many other possible features of the modeled objects. These features reflect individual behavior preferences from various aspects. Ideally, every social network user would carry all the labels related to his or her own features, but in reality this is not the case: to protect personal privacy, more and more social network users are cautious when sharing personal information, so social network media can only collect part of a user's information. Therefore, how to infer the labels of the remaining users based on the label information of known users is particularly important and urgent. This problem is the vertex classification problem.
Aiming at the inadequacy of traditional machine learning methods in handling graph data, a wave of research on graph neural networks has gradually emerged in academia and industry. A graph neural network, simply put, is a deep learning architecture for graph-structured data; it combines end-to-end learning with inductive reasoning and is expected to resolve a series of bottlenecks, such as causal reasoning and interpretability, that traditional deep learning architectures cannot handle.
According to their implementation principles, graph convolutional neural networks can be divided into two types: those based on spatial methods and those based on spectral methods. The former relies on an explicit information propagation mechanism on the graph and lacks interpretability; the latter uses the Laplacian matrix of the graph as its tool, has a solid theoretical basis, and is the mainstream direction of graph convolutional neural network research. However, current graph convolutional neural networks based on spectral methods do not perform well when applied to graph vertex classification tasks; that is, existing vertex classification models based on graph convolutional neural networks perform poorly.
Therefore, how to improve the performance of the vertex classification model is a problem to be solved by those skilled in the art.
Contents of the invention
In view of this, the purpose of the present application is to provide a model training method, apparatus, device and readable storage medium to improve the performance of the vertex classification model. The specific scheme is as follows:
In a first aspect, the present application provides a model training method, including:
obtaining a vertex feature matrix, an adjacency matrix and a label matrix constructed based on a graph data set;
performing random walk and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix;
inputting the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result;
inputting the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result;
calculating a first loss value between the first training result and the label matrix;
calculating a second loss value between the second training result and the first training result;
determining a target loss value based on the first loss value and the second loss value;
if the target loss value meets a preset convergence condition, combining the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.
Preferably, performing random walk and sampling based on the adjacency matrix to obtain the positive pointwise mutual information matrix includes:
based on the adjacency matrix, performing a random walk of a preset length on each vertex in the graph data set to obtain a context path of each vertex;
randomly sampling all context paths to determine the number of co-occurrences of any two vertices, and constructing a vertex co-occurrence matrix;
based on the vertex co-occurrence matrix, calculating the co-occurrence probability of each vertex and context and the corresponding marginal probabilities, and determining each element in the positive pointwise mutual information matrix.
Preferably, calculating the first loss value between the first training result and the label matrix includes:
based on the cross-entropy principle, using the degree of difference between the probability distributions of the first training result and the label matrix as the first loss value.
Preferably, calculating the second loss value between the second training result and the first training result includes:
calculating the differences between elements having the same coordinates in the second training result and the first training result, and using the sum of squares of all the differences as the second loss value.
Preferably, determining the target loss value based on the first loss value and the second loss value includes:
inputting the first loss value and the second loss value into a loss function to output the target loss value;
wherein the loss function is: ls = ls_S + α·ls_U, where ls is the target loss value, ls_S is the first loss value, ls_U is the second loss value, and α is a constant that adjusts the proportion of the second loss value in the target loss value.
Preferably, if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are iteratively trained until the target loss value meets the preset convergence condition;
wherein updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value includes:
after updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the second Chebyshev graph convolutional neural network;
or
after updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the first Chebyshev graph convolutional neural network;
or
after calculating new network parameters according to the target loss value, sharing the new network parameters with the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
Preferably, both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network include L graph convolutional layers, and the L graph convolutional layers are used to perform feature transformation and graph convolution operations on the input data;
wherein the feature transformation formula of the l-th (1≤l≤L) graph convolutional layer is: Q_l = H_l·Θ_l^T; the graph convolution operation formula of the l-th (1≤l≤L) graph convolutional layer is: H_{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q_l);
where Q_l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation; H_l is the input data of the l-th graph convolutional layer, and H_{l+1} is the output data of the l-th graph convolutional layer; Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer; σ is a nonlinear activation function; K << n is the order of the polynomial; n is the number of vertices in the graph data set; θ_k are the coefficients of the polynomial; T_k(x) = 2x·T_{k-1}(x) − T_{k-2}(x), with T_0 = 1 and T_1 = x, are the Chebyshev polynomials; L is the Laplacian matrix of the graph data set, and L̃ is the Laplacian matrix after linear transformation.
In a second aspect, the present application provides a model training apparatus, including:
an obtaining module, used to obtain the vertex feature matrix, adjacency matrix and label matrix constructed based on the graph data set;
a sampling module, used to perform random walk and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix;
a first training module, used to input the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result;
a second training module, used to input the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result;
a first calculation module, used to calculate a first loss value between the first training result and the label matrix;
a second calculation module, used to calculate a second loss value between the second training result and the first training result;
a determining module, used to determine a target loss value based on the first loss value and the second loss value;
a combination module, used to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets a preset convergence condition.
In a third aspect, the present application provides a model training device, including:
a memory, used to store a computer program;
a processor, used to execute the computer program to implement the model training method disclosed above.
In a fourth aspect, the present application provides a readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the model training method disclosed above.
通过以上方案可知,本申请提供了一种模型训练方法,包括:获取基于图数据集构建的顶点特征矩阵、邻接矩阵和标签矩阵;基于所述邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵;将所述顶点特征矩阵和所述邻接矩阵输入第一切比雪夫图卷积神经网络,以输出第一训练结果;将所述顶点特征矩阵和所述正逐点互信息矩阵输入第二切比雪夫图卷积神经网络,以输出第二训练结果;计算所述第一训练结果和所述标签矩阵之间的第一损失值;计算所述第二训练结果和所述第一训练结果之间的第二损失值;基于所述第一损失值和所述第二损失值确定目的损失值;若所述目的损失值符合预设收敛条件,则将所述第一切比雪夫图卷积神经网络和所述第二切比雪夫图卷积神经网络组合为对偶顶点分类模型。It can be seen from the above scheme that the present application provides a model training method, including: obtaining the vertex feature matrix, adjacency matrix and label matrix constructed based on the graph data set; performing random walk and sampling based on the adjacency matrix to obtain the positive point-by-point Mutual information matrix; Input the vertex feature matrix and the adjacency matrix into the first Chebyshev graph convolutional neural network to output the first training result; Input the vertex feature matrix and the positive point-by-point mutual information matrix A second Chebyshev graph convolutional neural network to output a second training result; calculate a first loss value between the first training result and the label matrix; calculate the second training result and the first the second loss value between the training results; determine the target loss value based on the first loss value and the second loss value; if the target loss value meets the preset convergence condition, the first Chebyshev The graph convolutional neural network and the second Chebyshev graph convolutional neural network are combined into a dual vertex classification model.
可见,本申请设计了两个切比雪夫图卷积神经网络,第一切比雪夫图卷积神经网络基于顶点特征矩阵、邻接矩阵、标签矩阵进行有监督训练,同时第二切比雪夫图卷积神经网络基于顶点特征矩阵、正逐点互信息矩阵和第一切比雪夫图卷积神经网络在训练过程中的输出,进行无监督训练;当基于二者的损失值所确定的目的损失值符合预设收敛条件时,将两个切比雪夫图卷积神经网络组合为对偶顶点分类模型,从而训练得到了性能更佳的顶点分类模型。该方案能够充分发挥有监督训练和无监督训练各自的优势,提升了顶点分类模型的性能。It can be seen that this application designs two Chebyshev graph convolutional neural networks: the first Chebyshev graph convolutional neural network performs supervised training based on the vertex feature matrix, the adjacency matrix and the label matrix, while the second Chebyshev graph convolutional neural network performs unsupervised training based on the vertex feature matrix, the positive pointwise mutual information matrix and the output of the first Chebyshev graph convolutional neural network during training. When the target loss value determined from the two loss values meets a preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, so that a vertex classification model with better performance is obtained. This solution makes full use of the respective advantages of supervised and unsupervised training, and improves the performance of the vertex classification model.
相应地,本申请提供的一种模型训练装置、设备及可读存储介质,也同样具有上述技术效果。Correspondingly, the model training device, equipment and readable storage medium provided by the present application also have the above-mentioned technical effects.
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only It is an embodiment of the present application, and those skilled in the art can also obtain other drawings according to the provided drawings without creative work.
图1为本申请公开的一种图卷积神经网络的结构示意图;Fig. 1 is a schematic structural diagram of a graph convolutional neural network disclosed in the present application;
图2为本申请公开的一种模型训练方法流程图;Fig. 2 is a flow chart of a model training method disclosed in the present application;
图3为本申请公开的一种对偶切比雪夫图卷积神经网络的数据走向示意图;Fig. 3 is a schematic diagram of the data trend of a dual Chebyshev graph convolutional neural network disclosed in the present application;
图4为本申请公开的一种对偶切比雪夫图卷积神经网络示意图;4 is a schematic diagram of a dual Chebyshev graph convolutional neural network disclosed in the present application;
图5为本申请公开的一种模型构建及训练方法流程图;5 is a flow chart of a model construction and training method disclosed in the present application;
图6为本申请公开的一种模型训练装置示意图;6 is a schematic diagram of a model training device disclosed in the present application;
图7为本申请公开的一种模型训练设备示意图。FIG. 7 is a schematic diagram of a model training device disclosed in the present application.
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The following will clearly and completely describe the technical solutions in the embodiments of the application with reference to the drawings in the embodiments of the application. Apparently, the described embodiments are only some of the embodiments of the application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the scope of protection of this application.
为方便理解本申请,先对图神经网络和图数据集进行介绍。In order to facilitate the understanding of this application, the graph neural network and graph dataset are introduced first.
需要说明的是,用图对数据及数据之间的关系进行建模分析,具有重要的学术和经济价值。例如,(1)研究传染性疾病和思想观点等在社交网络中随着时间传播扩散的规律;(2)研究社交网络中的群体如何围绕特定利益或隶属关系形成社团,以及社团连接的强度;(3)社交网络根据“人以群分”的规律,发现具有相似兴趣的人,向他们建议或推荐新的链接或联系;(4)问答系统将问题引导给最有相关经验的人;广告系统向最有兴趣并愿意接受特定主题广告的个人显示广告。It should be noted that using graphs to model and analyze data and the relationship between data has important academic and economic value. For example, (1) to study the law of infectious diseases and ideas spread over time in social networks; (2) to study how groups in social networks form communities around specific interests or affiliations, and the strength of community connections; (3) The social network discovers people with similar interests according to the law of "grouping people into groups", and suggests or recommends new links or connections to them; (4) The question answering system guides questions to the people with the most relevant experience; advertising Advertisements are shown to individuals who are most interested and willing to receive advertisements on a particular topic.
因此需要根据已知用户的标签信息,推测剩余用户的标签,该问题即顶点分类问题,它可形式化地描述为:给定一个图G=(V,E),V表示顶点集合,E表示连接边的集合,V
L是V的一个子集,V
L中的顶点有指定的标签。图顶点分类问题解决的是:如何推断剩余顶点构成的集合V\V
L中,每个顶点的标签。与传统分类问题不同,它不能直接应用传统机器学习中的分类方法,如支持向量机、k近邻、决策树和朴素贝叶斯,来解决。这是因为,传统分类方法通常假设对象是独立的,分类结果不精确。但在图顶点分类中,不同对象即顶点之间并非相互独立,相反,它们有着复杂的依赖关系,必须充分利用这些关系,来提高分类的质量。
Therefore, it is necessary to infer the labels of the remaining users based on the label information of the known users. This problem is the vertex classification problem, which can be formally described as: Given a graph G=(V,E), V represents the set of vertices, and E represents A collection of connected edges, V L is a subset of V, and the vertices in V L have assigned labels. The graph vertex classification problem solves: how to infer the label of each vertex in the set V\V L of the remaining vertices. Unlike traditional classification problems, it cannot be solved directly by applying classification methods in traditional machine learning, such as support vector machines, k-nearest neighbors, decision trees, and naive Bayes. This is because traditional classification methods usually assume that objects are independent, and the classification results are not precise. But in graph vertex classification, different objects, that is, vertices, are not independent of each other, on the contrary, they have complex dependencies, and these relationships must be fully utilized to improve the quality of classification.
图神经网络通常由输入层、一个或多个图卷积层,以及输出层组成。根据结构特点,图神经网络可分为图卷积神经网络、图递归神经网络、图自编码器、图生成网络和时空图神经网络。其中,图卷积神经网络由于传统的卷积神经网络在图像处理、自然语言理解等领域取得巨大成功而吸引众多学者的注意。A graph neural network usually consists of an input layer, one or more graph convolutional layers, and an output layer. According to the structural characteristics, graph neural networks can be divided into graph convolutional neural networks, graph recurrent neural networks, graph autoencoders, graph generative networks, and spatiotemporal graph neural networks. Among them, the graph convolutional neural network has attracted the attention of many scholars due to the great success of the traditional convolutional neural network in the fields of image processing and natural language understanding.
参见图1所示,图1展示了一个典型的图卷积神经网络的结构,它由一个输入层(Input layer)、两个图卷积层(Gconv layer),和一个输出层(Output layer)组成。其中,输入层读取n*d维的顶点属性矩阵X;图卷积层对X进行特征提取,经由非线性激活函数如ReLu变换后传递给下一个图卷积层;最后,输出层即任务层,完成特定的任务如顶点分类、聚类等;图中展示的是一个顶点分类任务层,输出每个顶点的类别标签Y。See Figure 1, Figure 1 shows the structure of a typical graph convolutional neural network, which consists of an input layer (Input layer), two graph convolution layers (Gconv layer), and an output layer (Output layer) composition. Among them, the input layer reads the n*d-dimensional vertex attribute matrix X; the graph convolution layer performs feature extraction on X, and passes it to the next graph convolution layer after nonlinear activation functions such as ReLu transformation; finally, the output layer is the task Layer, to complete specific tasks such as vertex classification, clustering, etc.; the figure shows a vertex classification task layer, which outputs the category label Y of each vertex.
但由于基于谱方法的图卷积神经网络在应用于图顶点分类任务时,表现并不理想,其主要原因是:(1)拉普拉斯矩阵进行特征分解的计算开销较大,为O(n^3);(2)通过添加正则项定义的目标损失函数(ls = ls_S + α·ls_reg,ls_S和ls_reg分别表示有监督学习损失函数和基于图拓扑结构定义的正则项)依赖于"相邻顶点具有类似标签"的局部一致性假设,该假设会限制图神经网络模型的能力,因为图中的连接边并没有对节点间相似性进行编码,但其实它们可以包含附加信息。However, graph convolutional neural networks based on spectral methods do not perform well on graph vertex classification tasks, mainly because: (1) the eigendecomposition of the Laplacian matrix is computationally expensive, with cost O(n^3); and (2) the target loss function defined by adding a regularization term (ls = ls_S + α·ls_reg, where ls_S and ls_reg denote the supervised learning loss and the regularization term defined on the graph topology, respectively) relies on the local consistency assumption that "adjacent vertices have similar labels". This assumption limits the capability of graph neural network models, because the connecting edges in a graph do not encode the similarity between nodes, while in fact they can carry additional information.
为此,本申请提供了一种模型训练方案,能够结合有监督和无监督学习,有效提高分类的准确度,并有效降低网络的计算复杂性,提高分类效率。To this end, the present application provides a model training solution that can combine supervised and unsupervised learning to effectively improve the accuracy of classification, effectively reduce the computational complexity of the network, and improve classification efficiency.
参见图2所示,本申请实施例公开了一种模型训练方法,包括:Referring to Figure 2, the embodiment of the present application discloses a model training method, including:
S201、获取基于图数据集构建的顶点特征矩阵、邻接矩阵和标签矩阵。S201. Obtain a vertex feature matrix, an adjacency matrix, and a label matrix constructed based on the graph data set.
假设待分类的图数据集为G=(V,E),V表示顶点集合,它分为少量具有类别标签的顶点集合V_L和大部分无类别标签的顶点集合V_U两部分,并满足V_L∪V_U=V;E表示连接边集合。除标签外,G的每个顶点v都拥有d个特征,所有顶点的特征构成了n*d维的顶点特征矩阵X。G的邻接矩阵记为A,元素A_ij表示顶点i和j之间的连接边的权重。Assume that the graph data set to be classified is G=(V,E), where V denotes the vertex set, which is divided into a small set V_L of vertices with class labels and a large set V_U of vertices without class labels, satisfying V_L∪V_U=V; E denotes the set of connecting edges. Besides its label, each vertex v of G has d features, and the features of all vertices form the n*d-dimensional vertex feature matrix X. The adjacency matrix of G is denoted A, and the element A_ij represents the weight of the edge connecting vertices i and j.
根据已有标签的顶点集合V_L,构建n*C维的标签矩阵Y。其中,n=|V|表示图中所有顶点个数,C表示所有顶点的标签类别数,矩阵元素Y_ij表示顶点i的类别标签是否为j(j=1,2,…,C)。当顶点i已有类别标签j时,置第i行的第j列元素为1,其余列元素为0,即Y_ik=1(当k=j时),Y_ik=0(当k≠j时)。当顶点i无类别标签时,将该行对应的每一列元素都置为0。Based on the set V_L of vertices with existing labels, an n*C-dimensional label matrix Y is constructed, where n=|V| is the number of vertices in the graph and C is the number of label classes; the matrix element Y_ij indicates whether the class label of vertex i is j (j=1,2,…,C). If vertex i already has class label j, the j-th element of its row is set to 1 and the remaining elements of that row are set to 0, i.e. Y_ik=1 when k=j and Y_ik=0 when k≠j. If vertex i has no class label, every element of the corresponding row is set to 0.
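For illustration, the following Python sketch builds such a one-hot label matrix. It is not part of the original disclosure; the function name, the dict-based label input and the 0-based class indexing are assumptions made here for brevity.

```python
import numpy as np

def build_label_matrix(n, C, known_labels):
    """Build the n x C one-hot label matrix Y described above.

    known_labels: dict mapping vertex index i -> class index j (0-based here);
    unlabeled vertices keep an all-zero row.
    """
    Y = np.zeros((n, C))
    for i, j in known_labels.items():
        Y[i, j] = 1.0
    return Y

# toy usage: 5 vertices, 3 classes, vertices 0 and 3 are labeled
Y = build_label_matrix(5, 3, {0: 2, 3: 0})
print(Y)
```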
例如:基于Pubmed数据集构建图数据集。Pubmed数据集包含3个类别的19717种科学出版物,出版物之间含有44,338个引用链接。出版物及它们之间的链接形成引文网络,网络中的每个出版物都用词频-逆文本频率指数(Term Frequency-Inverse Document Frequency,TF-IDF)矢量描述特征向量,该矢量从具有500个术语的字典中得出。所有文档的特征向量组成特征矩阵X。目标是将每个文档归类,每个类别随机抽取20个实例作为标记数据,将1000个实例作为测试数据,其余用作未标记的数据;据此构建顶点标签矩阵Y。根据论文间的引用关系,构建其邻接矩阵A。根据A计算任意两个顶点间的转移概率;对每个顶点v_j开展长度为u的随机游走得到路径π_j;对π_j随机采样计算顶点v_i出现在路径π_j上的频率p_ij,进而得到正逐点互信息矩阵P。For example, a graph data set can be built from the Pubmed data set. The Pubmed data set contains 19,717 scientific publications in 3 classes, with 44,338 citation links between them. The publications and the links between them form a citation network, and each publication in the network is described by a Term Frequency-Inverse Document Frequency (TF-IDF) feature vector derived from a dictionary of 500 terms. The feature vectors of all documents form the feature matrix X. The goal is to classify each document; 20 instances per class are randomly sampled as labeled data, 1000 instances are used as test data, and the rest are used as unlabeled data, from which the vertex label matrix Y is constructed. The adjacency matrix A is built from the citation relationships between papers. The transition probability between any two vertices is computed from A; for each vertex v_j, a random walk of length u is performed to obtain the path π_j; π_j is randomly sampled to compute the frequency p_ij with which vertex v_i appears on the path π_j, which yields the positive pointwise mutual information matrix P.
当然,还可以基于蛋白质、图形图像等构建图数据集,以对蛋白质、图形图像等进行分类。Of course, graph datasets can also be constructed based on proteins, graph images, etc. to classify proteins, graph images, etc.
S202、基于邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵。S202. Perform random walk and sampling based on the adjacency matrix to obtain a positive point-wise mutual information matrix.
根据邻接矩阵A,基于随机游走和随机采样技术可以构造编码图全局一致性信息的正逐点互信息矩阵。具体的,邻接矩阵在随机游走过程中有两种作用:第一,表征图拓扑结构,根据它可以知道哪些顶点之间有连接关系,可以从一个顶点游走到相邻的顶点;第二,用于确定随机游走的概率,详见公式(1),一个顶点可能有多个邻居,在一个随机游走步中,游走者可在它的所有邻居中随机挑一个。Based on the adjacency matrix A, a positive pointwise mutual information matrix encoding the global consistency information of the graph can be constructed using random walk and random sampling techniques. Specifically, the adjacency matrix plays two roles in the random walk process: first, it represents the graph topology, indicating which vertices are connected so that the walk can move from a vertex to an adjacent vertex; second, it determines the random walk probabilities (see formula (1)): a vertex may have multiple neighbors, and in each random walk step the walker randomly picks one of them.
在一种具体实施方式中,基于邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵,包括:基于邻接矩阵,对图数据集中的每个顶点进行预设长度的随机游走,得到每个顶点的上下文路径;对所有上下文路径进行随机采样,以确定任意两个顶点的共现次数,并构建顶点共现次数矩阵;基于顶点共现次数矩阵,计算顶点与上下文共现概率和相应的边缘概率,并确定正逐点互信息矩阵中的每个元素。In a specific implementation, random walk and sampling are performed based on the adjacency matrix to obtain a positive point-wise mutual information matrix, including: based on the adjacency matrix, a random walk of a preset length is performed on each vertex in the graph data set to obtain Context path for each vertex; randomly sample all context paths to determine the co-occurrence times of any two vertices, and construct a vertex co-occurrence matrix; based on the vertex co-occurrence matrix, calculate the vertex and context co-occurrence probability and corresponding The marginal probability of , and determine each element in the positive pointwise mutual information matrix.
其中,"顶点与上下文共现概率"是指:某个顶点v_i出现在某个上下文ct_j中的概率pr(v_i,ct_j),或者说,上下文ct_j中包含顶点v_i的概率pr(v_i,ct_j)。在得到所有的顶点与上下文共现概率后,它们组成了一个矩阵,即顶点共现次数矩阵。顶点v_i的边缘概率等于该矩阵中第i行元素的加和除以该矩阵中所有元素的加和;上下文ct_j的边缘概率等于第j列元素的加和除以该矩阵中所有元素的加和。Here, the "vertex-context co-occurrence probability" refers to the probability pr(v_i, ct_j) that a vertex v_i appears in a context ct_j, or equivalently the probability pr(v_i, ct_j) that context ct_j contains vertex v_i. After all vertex-context co-occurrence probabilities are obtained, they form a matrix, i.e. the vertex co-occurrence matrix. The marginal probability of vertex v_i equals the sum of the elements in row i of this matrix divided by the sum of all elements of the matrix; the marginal probability of context ct_j equals the sum of the elements in column j divided by the sum of all elements of the matrix.
正逐点互信息矩阵可以用P表示,其能够编码图全局一致性信息,具体可参照如下内容进行确定:The positive point-wise mutual information matrix can be represented by P, which can encode the global consistency information of the graph, and can be determined by referring to the following content:
假设行向量p_{i,:}是顶点v_i的嵌入式表示,列向量p_{:,j}是上下文ct_j的嵌入式表示,而p_ij表示顶点v_i出现在上下文ct_j中的概率,那么正逐点互信息矩阵P可通过对图数据集的随机游走获得。具体地说,将顶点v_j的上下文ct_j视为以v_j为根节点、长度为u的路径π_j,则p_ij可通过计算顶点v_i出现在路径π_j上的频率得到。不失一般性,设某随机游走者时刻τ所在的图顶点编号为x(τ),且x(τ)=v_i,则τ+1时刻游走到其邻居顶点v_j的概率t_ij用公式(1)表示为:t_ij = pr(x(τ+1)=v_j | x(τ)=v_i) = A_ij / ∑_j A_ij。Assume that the row vector p_{i,:} is the embedded representation of vertex v_i, the column vector p_{:,j} is the embedded representation of context ct_j, and p_ij denotes the probability that vertex v_i appears in context ct_j; then the positive pointwise mutual information matrix P can be obtained by random walks over the graph data set. Specifically, regard the context ct_j of vertex v_j as the path π_j of length u rooted at v_j; then p_ij can be obtained by computing the frequency with which vertex v_i appears on the path π_j. Without loss of generality, let x(τ) be the index of the graph vertex where a random walker is located at time τ, with x(τ)=v_i; then the probability t_ij of walking to its neighbor vertex v_j at time τ+1 is expressed by formula (1) as: t_ij = pr(x(τ+1)=v_j | x(τ)=v_i) = A_ij / ∑_j A_ij.
按照公式(1)对图数据集中每个顶点开展长度为u步的随机游走,即可得到表征该顶点上下文的路径π,对π实施随机采样计算任意两个顶点的共现次数,得到顶点-上下文共现次数矩阵O(即顶点共现次数矩阵)。在该矩阵O中,元素o_ij表示顶点v_i出现在上下文ct_j(即以顶点v_j为根节点的路径π_j)上的次数,它可用于随后计算p_ij。基于顶点共现次数矩阵O计算顶点与上下文共现概率和相应的边缘概率。记顶点v_i和上下文ct_j的共现概率以及相应的边缘概率分别为pr(v_i,ct_j)、pr(v_i)和pr(ct_j),则有公式(2):pr(v_i,ct_j) = o_ij / ∑_{i,j} o_ij,pr(v_i) = ∑_j o_ij / ∑_{i,j} o_ij,pr(ct_j) = ∑_i o_ij / ∑_{i,j} o_ij。According to formula (1), a random walk of length u is carried out for each vertex in the graph data set, yielding a path π that characterizes the context of that vertex; π is then randomly sampled to count the co-occurrences of any two vertices, giving the vertex-context co-occurrence count matrix O (i.e. the vertex co-occurrence matrix). In the matrix O, the element o_ij denotes the number of times vertex v_i appears in context ct_j, i.e. on the path π_j rooted at vertex v_j, and is used subsequently to compute p_ij. Based on the vertex co-occurrence matrix O, the vertex-context co-occurrence probabilities and the corresponding marginal probabilities are computed. Denoting the co-occurrence probability of vertex v_i and context ct_j and the corresponding marginal probabilities by pr(v_i, ct_j), pr(v_i) and pr(ct_j), respectively, formula (2) is: pr(v_i, ct_j) = o_ij / ∑_{i,j} o_ij, pr(v_i) = ∑_j o_ij / ∑_{i,j} o_ij, pr(ct_j) = ∑_i o_ij / ∑_{i,j} o_ij.
结合公式(2),正逐点互信息矩阵P中元素p_ij的值可通过以下公式计算得到:p_ij = max(log(pr(v_i,ct_j) / (pr(v_i)·pr(ct_j))), 0)。Combining formula (2), the value of the element p_ij of the positive pointwise mutual information matrix P can be calculated as: p_ij = max(log(pr(v_i, ct_j) / (pr(v_i)·pr(ct_j))), 0).
据此即可确定正逐点互信息矩阵P中每个元素的值,从而确定正逐点互信息矩阵P。Based on this, the value of each element in the positive point-wise mutual information matrix P can be determined, thereby determining the positive point-wise mutual information matrix P.
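To make the above procedure concrete, the following Python sketch builds a PPMI matrix from an adjacency matrix by random walks. It is only an illustrative reading of this step; the walk length, number of walks per vertex, sliding-window counting and all function and parameter names are assumptions rather than part of the disclosure.

```python
import numpy as np

def ppmi_matrix(A, walk_length=10, walks_per_vertex=10, window=2, seed=0):
    """Illustrative PPMI construction: random walks driven by formula (1),
    co-occurrence counting (matrix O), then p_ij = max(log(...), 0)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    T = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)  # t_ij = A_ij / sum_j A_ij
    O = np.zeros((n, n))                                      # vertex-context counts o_ij
    for root in range(n):
        for _ in range(walks_per_vertex):
            path = [root]
            for _ in range(walk_length):
                cur = path[-1]
                if T[cur].sum() == 0:                         # isolated vertex: stop the walk
                    break
                path.append(rng.choice(n, p=T[cur]))
            for i, v in enumerate(path):                      # count co-occurrences in a window
                for u in path[max(0, i - window): i + window + 1]:
                    O[v, u] += 1
    total = O.sum()
    pr_vc = O / total                                         # pr(v_i, ct_j)
    pr_v = O.sum(axis=1) / total                              # row marginals pr(v_i)
    pr_c = O.sum(axis=0) / total                              # column marginals pr(ct_j)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(pr_vc / np.outer(pr_v, pr_c))
    return np.maximum(np.nan_to_num(pmi, neginf=0.0), 0.0)    # p_ij = max(log(...), 0)
```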
S203、将顶点特征矩阵和邻接矩阵输入第一切比雪夫图卷积神经网络,以输出第一训练结果。S203. Input the vertex feature matrix and adjacency matrix into the first Chebyshev graph convolutional neural network to output a first training result.
S204、将顶点特征矩阵和正逐点互信息矩阵输入第二切比雪夫图卷积神经网络,以输出第二训练结果。S204. Input the vertex feature matrix and the positive point-wise mutual information matrix into the second Chebyshev graph convolutional neural network to output a second training result.
在一种具体实施方式中,第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络完全相同,均包括L层图卷积层,该L层图卷积层用于对输入数据进行特征变换和图卷积操作;In a specific implementation manner, the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are identical, and both include an L-layer graph convolutional layer, and the L-layer graph convolutional layer is used for Perform feature transformation and graph convolution operations on the input data;
其中,第l(1≤l≤L)层图卷积层的特征变换公式为:Q^l = H^l·(Θ^l)^T;第l(1≤l≤L)层图卷积层的图卷积操作公式为:H^{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q^l)。
其中,Q^l为图卷积神经网络第l图卷积层经特征变换后的顶点特征矩阵;H^l为图卷积神经网络的第l图卷积层的输入数据,H^{l+1}为图卷积神经网络的第l图卷积层的输出数据;(Θ^l)^T是图卷积神经网络的第l图卷积层需学习的特征变换矩阵Θ^l的转置矩阵;σ为非线性激活函数;K<<n,为多项式的阶数;n为图数据集中的顶点个数;θ_k是多项式的系数;T_k(x)=2x·T_{k-1}(x)−T_{k-2}(x),且T_0=1,T_1=x,为切比雪夫多项式;L为图数据集的拉普拉斯矩阵,L̃为经过线性变换后的拉普拉斯矩阵。Wherein, the feature transformation formula of the l-th (1≤l≤L) graph convolutional layer is Q^l = H^l·(Θ^l)^T, and the graph convolution formula of the l-th (1≤l≤L) graph convolutional layer is H^{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q^l). Here, Q^l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation; H^l is the input data of the l-th graph convolutional layer, and H^{l+1} is its output data; (Θ^l)^T is the transpose of the feature transformation matrix Θ^l to be learned by the l-th graph convolutional layer; σ is a nonlinear activation function; K<<n is the order of the polynomial; n is the number of vertices in the graph data set; θ_k are the polynomial coefficients; T_k(x)=2x·T_{k-1}(x)−T_{k-2}(x), with T_0=1 and T_1=x, are the Chebyshev polynomials; L is the Laplacian matrix of the graph data set, and L̃ is the Laplacian matrix after linear transformation.
其中,L̃ = 2L/λ_max − I_n,λ_max为L中最大的特征值,I_n为n*n维的恒等矩阵。Wherein, L̃ = 2L/λ_max − I_n, λ_max is the largest eigenvalue of L, and I_n is the n*n identity matrix.
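As a concrete illustration of the two-stage layer above (feature transformation followed by graph convolution), a minimal NumPy sketch is given below. It assumes dense matrices, scalar Chebyshev coefficients θ_k and a ReLU activation; it is only a sketch, not the claimed implementation.

```python
import numpy as np

def cheb_graph_conv(H, L_tilde, Theta, theta, relu=True):
    """One Chebyshev graph convolution layer in two stages.

    H: n x d_in input of this layer; Theta: d_in x d_out matrix playing the role of
    (Theta^l)^T; theta: list of K+1 scalar Chebyshev coefficients; L_tilde: rescaled
    Laplacian 2L/lambda_max - I.
    """
    Q = H @ Theta                                 # stage 1: feature transformation Q^l
    n = L_tilde.shape[0]
    T_prev, T_cur = np.eye(n), L_tilde            # T_0 = I, T_1 = L_tilde
    out = theta[0] * (T_prev @ Q)
    if len(theta) > 1:
        out = out + theta[1] * (T_cur @ Q)
    for k in range(2, len(theta)):
        T_prev, T_cur = T_cur, 2 * L_tilde @ T_cur - T_prev   # Chebyshev recursion
        out = out + theta[k] * (T_cur @ Q)
    return np.maximum(out, 0.0) if relu else out  # stage 2: graph convolution + sigma
```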
S205、计算第一训练结果和标签矩阵之间的第一损失值。S205. Calculate a first loss value between the first training result and the label matrix.
在一种具体实施方式中,计算第一训练结果和标签矩阵之间的第一损失值,包括:基于交叉熵原理,将第一训练结果和标签矩阵之间的概率分布差异程度作为第一损失值(即有监督损失)。In a specific implementation manner, calculating the first loss value between the first training result and the label matrix includes: based on the cross-entropy principle, using the difference degree of the probability distribution between the first training result and the label matrix as the first loss value (i.e. supervised loss).
S206、计算第二训练结果和第一训练结果之间的第二损失值。S206. Calculate a second loss value between the second training result and the first training result.
在一种具体实施方式中,计算第二训练结果和第一训练结果之间的第二损失值,包括:计算第二训练结果和第一训练结果中具有相同坐标的元素的差值,并将所有差值的平方和作为第二损失值(即无监督损失)。In a specific implementation manner, calculating the second loss value between the second training result and the first training result includes: calculating the difference between elements with the same coordinates in the second training result and the first training result, and The sum of squares of all differences is used as the second loss value (i.e. unsupervised loss).
S207、基于第一损失值和第二损失值确定目的损失值。S207. Determine a target loss value based on the first loss value and the second loss value.
在一种具体实施方式中,基于第一损失值和第二损失值确定目的损失值,包括:将第一损失值和第二损失值输入损失函数,以输出目的损失值;其中,损失函数为:ls = ls_S + α·ls_U,ls为目的损失值,ls_S为第一损失值,ls_U为第二损失值,α为调节第二损失值在目的损失值中所占比例的常数。In a specific implementation, determining the target loss value based on the first loss value and the second loss value includes: inputting the first loss value and the second loss value into a loss function to output the target loss value, where the loss function is ls = ls_S + α·ls_U, ls is the target loss value, ls_S is the first loss value, ls_U is the second loss value, and α is a constant that adjusts the proportion of the second loss value in the target loss value.
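A minimal sketch of how the three loss values of S205-S207 could be combined in one training step is shown below; the cross-entropy normalisation over the labeled vertices and the boolean-mask interface are assumptions made for illustration.

```python
import numpy as np

def target_loss(Z_A, Z_P, Y, labeled_mask, alpha=0.5, eps=1e-12):
    """ls = ls_S + alpha * ls_U for one training step.

    Z_A, Z_P: n x C class-probability outputs of the two networks; Y: n x C one-hot
    label matrix; labeled_mask: boolean vector marking the vertices in V_L.
    """
    # first loss: cross entropy between Z_A and Y on the labeled vertices
    ls_S = -np.sum(Y[labeled_mask] * np.log(Z_A[labeled_mask] + eps)) / max(labeled_mask.sum(), 1)
    # second loss: squared differences between same-coordinate elements of Z_P and Z_A
    ls_U = np.sum((Z_P - Z_A) ** 2)
    return ls_S + alpha * ls_U
```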
S208、若目的损失值符合预设收敛条件,则将第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络组合为对偶顶点分类模型。S208. If the target loss value meets the preset convergence condition, combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.
在一种具体实施方式中,若目的损失值不符合预设收敛条件,则根据目的损失值更新第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络的网络参数,并对更新后的第一切比雪夫图卷积神经网络和更新后的第二切比雪夫图卷积神经网络进行迭代训练,直至目的损失值符合预设收敛条件。In a specific implementation, if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, And perform iterative training on the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network until the target loss value meets the preset convergence condition.
其中,根据目的损失值更新第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络的网络参数,包括:根据目的损失值更新第一切比雪夫图卷积神经网络的网络参数后,将更新后的该网络参数共享至第二切比雪夫图卷积神经网络;或根据目的损失值更新第二切比雪夫图卷积神经网络的网络参数后,将更新后的该网络参数共享至第一切比雪夫图卷积神经网络;或根据目的损失值计算得到新网络参数后,将新网络参数共享至第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络。Wherein, updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the purpose loss value includes: updating the first Chebyshev graph convolutional neural network according to the purpose loss value After network parameters, share the updated network parameters to the second Chebyshev graph convolutional neural network; or update the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value, and update the updated The network parameters are shared to the first Chebyshev graph convolutional neural network; or after the new network parameters are calculated according to the target loss value, the new network parameters are shared to the first Chebyshev graph convolutional neural network and the second Chebyshev graph Convolutional neural network.
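The parameter-sharing options above can be illustrated with a toy snippet: because the two networks share their parameters, it is enough to compute the update once and assign the result to both. The plain SGD step and the dictionary-based network containers are placeholders, not the claimed implementation.

```python
import numpy as np

def sgd_step(params, grads, lr=0.01):
    """Plain SGD update; any of the optimizers listed later could be used instead."""
    return [p - lr * g for p, g in zip(params, grads)]

shared = [np.random.randn(4, 3), np.random.randn(5)]   # toy Theta and theta
cheby_A = {"params": shared}
cheby_P = {"params": shared}                            # same objects -> shared weights

grads = [np.ones((4, 3)), np.ones(5)]                   # placeholder gradients of ls
new_params = sgd_step(cheby_A["params"], grads)         # compute the new parameters once
cheby_A["params"] = new_params
cheby_P["params"] = new_params                          # share the updated parameters
```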
可见,本实施例设计了两个切比雪夫图卷积神经网络,第一切比雪夫图卷积神经网络基于顶点特征矩阵、邻接矩阵、标签矩阵进行有监督训练,同时第二切比雪夫图卷积神经网络基于顶点特征矩阵、正逐点互信息矩阵和第一切比雪夫图卷积神经网络在训练过程中的输出,进行无监督训练;当基于二者的损失值所确定的目的损失值符合预设收敛条件时,将两个切比雪夫图卷积神经网络组合为对偶顶点分类模型,从而训练得到了性能更佳的顶点分类模型。该方案能够充分发挥有监督训练和无监督训练各自的优势,提升了顶点分类模型的性能。It can be seen that in this embodiment, two Chebyshev graph convolutional neural networks are designed. The first Chebyshev graph convolutional neural network performs supervised training based on the vertex feature matrix, adjacency matrix, and label matrix, while the second Chebyshev graph convolutional neural network The convolutional neural network performs unsupervised training based on the vertex feature matrix, the positive point-wise mutual information matrix and the output of the first Chebyshev graph convolutional neural network during the training process; when the target loss determined based on the loss values of the two When the values meet the preset convergence conditions, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, and a vertex classification model with better performance is trained. This scheme can give full play to the respective advantages of supervised training and unsupervised training, and improves the performance of the vertex classification model.
基于上述实施例,需要说明的是,对偶顶点分类模型也可称为对偶切比雪夫图卷积神经网络(DCGCN,Dual Chebyshev Graph Convolutional Neural Network)。为训练得到对偶切比雪夫图卷积神经网络,需要首先确定网络结构、损失函数、初始化策略、网络参数更新方式等。Based on the above embodiments, it should be noted that the dual vertex classification model can also be called a dual Chebyshev graph convolutional neural network (DCGCN, Dual Chebyshev Graph Convolutional Neural Network). In order to train the dual Chebyshev graph convolutional neural network, it is necessary to first determine the network structure, loss function, initialization strategy, network parameter update method, etc.
1、网络结构。1. Network structure.
对偶切比雪夫图卷积神经网络包括两个完全相同的、共享参数的切比雪夫图卷积神经网络ChebyNet,每个ChebyNet都由输入层、L个图卷积层和输出层组成。The dual Chebyshev graph convolutional neural network includes two identical Chebyshev graph convolutional neural networks ChebyNet with shared parameters, and each ChebyNet consists of an input layer, L graph convolutional layers and an output layer.
请参见图3,记两个ChebyNet分别为ChebyNet_A和ChebyNet_P。ChebyNet_A以编码图局部一致性信息的邻接矩阵A和顶点特征矩阵X作为输入数据,输出顶点类别标签预测矩阵Z_A;ChebyNet_P以编码图全局一致性信息的正逐点互信息矩阵P和顶点特征矩阵X作为输入数据,输出顶点类别标签预测矩阵Z_P。Referring to Figure 3, the two ChebyNets are denoted ChebyNet_A and ChebyNet_P. ChebyNet_A takes as input the adjacency matrix A, which encodes the local consistency information of the graph, and the vertex feature matrix X, and outputs the vertex class label prediction matrix Z_A; ChebyNet_P takes as input the positive pointwise mutual information matrix P, which encodes the global consistency information of the graph, and the vertex feature matrix X, and outputs the vertex class label prediction matrix Z_P.
其中,ChebyNet_A根据部分有标签的图顶点进行有监督学习,预测准确度较高;ChebyNet_P在前者的指导下(利用其预测结果Z_A),利用无标签的图顶点进行无监督学习,以提高预测准确度,获得更好的顶点分类模型。当ChebyNet_A和ChebyNet_P训练结束后,Z_A和Z_P一致或差别可忽略不计,因此可以Z_A或Z_P作为对偶切比雪夫图卷积神经网络的输出。ChebyNet_A performs supervised learning on the partially labeled graph vertices and achieves high prediction accuracy; under the guidance of the former (using its prediction result Z_A), ChebyNet_P performs unsupervised learning on the unlabeled graph vertices to improve the prediction accuracy and obtain a better vertex classification model. After ChebyNet_A and ChebyNet_P are trained, Z_A and Z_P are consistent or differ negligibly, so either Z_A or Z_P can be used as the output of the dual Chebyshev graph convolutional neural network.
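The data flow of the two branches can be sketched as follows. Each ChebyNet is collapsed here into a single propagation step purely to show how Z_A and Z_P are produced from shared weights, so the one-layer forward, the placeholder matrices and the softmax read-out are all simplifying assumptions.

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def chebynet_forward(S, X, W):
    """Stand-in for one ChebyNet: a single propagation step S @ X @ W, then softmax.
    The real networks stack L Chebyshev layers; this only illustrates the data flow."""
    return softmax(S @ X @ W)

n, d, C = 6, 4, 3
X = np.random.randn(n, d)
A_norm = np.eye(n)            # placeholder for the adjacency-based operator of ChebyNet_A
P = np.eye(n)                 # placeholder for the PPMI matrix used by ChebyNet_P
W = np.random.randn(d, C)     # shared parameters

Z_A = chebynet_forward(A_norm, X, W)   # ChebyNet_A: local-consistency branch
Z_P = chebynet_forward(P, X, W)        # ChebyNet_P: global-consistency branch
# after training, Z_A and Z_P agree (or differ negligibly), so either can be the output
```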
图4示意了对偶切比雪夫图卷积神经网络的结构。图4中的卷积层即下文所述的图卷积层。Figure 4 illustrates the structure of a dual Chebyshev graph convolutional neural network. The convolutional layer in Figure 4 is the graph convolutional layer described below.
其中,输入层主要负责读取待分类图数据,包括顶点特征矩阵X、表示图拓扑结构的邻接矩阵A、编码图全局一致性信息的正逐点互信息矩阵P。Among them, the input layer is mainly responsible for reading the graph data to be classified, including the vertex feature matrix X, the adjacency matrix A representing the topology of the graph, and the positive point-by-point mutual information matrix P that encodes the global consistency information of the graph.
第l(1≤l≤L)图卷积层定义:为减少网络参数,将第l隐藏层图卷积操作分解为特征变换和图卷积先后两个阶段。Definition of the lth (1≤l≤L) graph convolution layer: In order to reduce the network parameters, the graph convolution operation of the lth hidden layer is decomposed into two stages of feature transformation and graph convolution.
其中,特征变换公式为:Q^l = H^l·(Θ^l)^T;图卷积操作公式为:H^{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q^l)。
其中,Q^l为图卷积神经网络第l图卷积层经特征变换后的顶点特征矩阵;H^l为图卷积神经网络的第l图卷积层的输入数据,H^{l+1}为图卷积神经网络的第l图卷积层的输出数据;(Θ^l)^T是图卷积神经网络的第l图卷积层需学习的特征变换矩阵Θ^l的转置矩阵;σ为非线性激活函数;K<<n,为多项式的阶数;n为图数据集中的顶点个数;θ_k是多项式的系数;T_k(x)=2x·T_{k-1}(x)−T_{k-2}(x),且T_0=1,T_1=x,为切比雪夫多项式;L为图数据集的拉普拉斯矩阵,L̃为经过线性变换后的拉普拉斯矩阵。其中,H^1为顶点特征矩阵X。Here, the feature transformation formula is Q^l = H^l·(Θ^l)^T, and the graph convolution formula is H^{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q^l), where Q^l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation; H^l is the input data of the l-th graph convolutional layer, and H^{l+1} is its output data; (Θ^l)^T is the transpose of the feature transformation matrix Θ^l to be learned by the l-th graph convolutional layer; σ is a nonlinear activation function; K<<n is the order of the polynomial; n is the number of vertices in the graph data set; θ_k are the polynomial coefficients; T_k(x)=2x·T_{k-1}(x)−T_{k-2}(x), with T_0=1 and T_1=x, are the Chebyshev polynomials; L is the Laplacian matrix of the graph data set, and L̃ is the Laplacian matrix after linear transformation. Here, H^1 is the vertex feature matrix X.
其中,L̃ = 2L/λ_max − I_n,λ_max为L中最大的特征值,I_n为n*n维的恒等矩阵。Wherein, L̃ = 2L/λ_max − I_n, λ_max is the largest eigenvalue of L, and I_n is the n*n identity matrix.
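A small helper for computing L̃ is sketched below. The choice of the symmetric normalised Laplacian L = I − D^{-1/2}·A·D^{-1/2} and the symmetry of A are assumptions (the text only states that L is the graph Laplacian), and in practice λ_max is often simply approximated by 2.

```python
import numpy as np

def rescaled_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2}, then L_tilde = 2L/lambda_max - I (A assumed symmetric)."""
    n = A.shape[0]
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d, dtype=float)
    nz = d > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    L = np.eye(n) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()        # often simply approximated by 2
    return 2.0 * L / lam_max - np.eye(n)
```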
需要说明的是,上述图卷积操作公式由谱域形式的图卷积公式简化得到,简化过程可参照如下内容:It should be noted that the above graph convolution formula is obtained by simplifying the spectral-domain form of the graph convolution; the simplification can be understood as follows:
其中,U是由对图G的拉普拉斯矩阵L进行特征分解得到的特征向量所组成的矩阵;U^{-1}是U的逆矩阵;Λ是特征值的对角阵,对角线上的各元素分别为λ_1,λ_2,…,λ_n。F^l是第l层图卷积层的图卷积核矩阵,并定义为多项式形式:F^l = ∑_{k=0}^{K} θ_k·Λ^k。Here, U is the matrix composed of the eigenvectors obtained by the eigendecomposition of the Laplacian matrix L of graph G; U^{-1} is the inverse of U; Λ is the diagonal matrix of eigenvalues, whose diagonal elements are λ_1, λ_2, …, λ_n. F^l is the graph convolution kernel matrix of the l-th graph convolutional layer and is defined in polynomial form as F^l = ∑_{k=0}^{K} θ_k·Λ^k.
需要说明的是,多项式的阶数K能够限制信息在每个顶点最多传播K步,因此仅需K+1个参数θ_k,大大降低了模型训练过程的复杂度。但由于按上述公式计算卷积核矩阵时涉及图拉普拉斯矩阵的特征分解,计算开销大,因此本实施例在此基础上,借助切比雪夫多项式设计近似计算方案,并将F^l近似为:F^l ≈ ∑_{k=0}^{K} θ_k·T_k(Λ̃)。It should be noted that the polynomial order K limits the information to propagating at most K steps from each vertex, so only K+1 parameters θ_k are needed, which greatly reduces the complexity of model training. However, computing the convolution kernel matrix by the above formula involves the eigendecomposition of the graph Laplacian matrix, which is computationally expensive; therefore, on this basis, this embodiment designs an approximate calculation scheme based on Chebyshev polynomials and approximates F^l as: F^l ≈ ∑_{k=0}^{K} θ_k·T_k(Λ̃).
其中,T_k(x)=2x·T_{k-1}(x)−T_{k-2}(x),且T_0=1,T_1=x,为切比雪夫多项式,可循环递归求解;Λ̃ = 2Λ/λ_max − I_n是一个对角阵,能将特征值对角阵Λ映射到[-1,1]。Here, T_k(x)=2x·T_{k-1}(x)−T_{k-2}(x), with T_0=1 and T_1=x, are the Chebyshev polynomials, which can be evaluated by this recursion; Λ̃ = 2Λ/λ_max − I_n is a diagonal matrix that maps the eigenvalue diagonal matrix Λ into [-1,1].
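Evaluated in the vertex domain, this approximation avoids the eigendecomposition altogether. A minimal sketch of computing the kernel ∑_k θ_k·T_k(L̃) with the recursion above is given below; the dense-matrix representation is an assumption made for clarity (sparse matrices would normally be used).

```python
import numpy as np

def chebyshev_kernel(L_tilde, theta):
    """Approximate the graph convolution kernel as sum_k theta_k * T_k(L_tilde),
    using the recursion T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x), T_0 = I, T_1 = L_tilde.
    No eigendecomposition of the Laplacian is required."""
    n = L_tilde.shape[0]
    T_prev, T_cur = np.eye(n), L_tilde
    F = theta[0] * T_prev
    if len(theta) > 1:
        F = F + theta[1] * T_cur
    for k in range(2, len(theta)):
        T_prev, T_cur = T_cur, 2 * L_tilde @ T_cur - T_prev
        F = F + theta[k] * T_cur
    return F
```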
输出层基于最后一层图卷积层的输出给出预测矩阵Z:Z是一个n*C维的矩阵,其每个列向量Z_j表示所有顶点属于类别j的概率,即Z_j的第k(1≤k≤n)个元素表示顶点k属于类别j(j=1,2,…,C)的概率。The output layer produces the prediction matrix Z from the output of the last graph convolutional layer: Z is an n*C-dimensional matrix in which each column vector Z_j gives the probabilities that the vertices belong to class j, i.e. the k-th (1≤k≤n) element of Z_j is the probability that vertex k belongs to class j (j=1,2,…,C).
2、损失函数。2. Loss function.
对偶切比雪夫图卷积神经网络的损失函数由带标签顶点的有监督学习损失ls_S和无标签顶点的无监督学习损失ls_U两部分组成。The loss function of the dual Chebyshev graph convolutional neural network consists of two parts: the supervised learning loss ls_S on labeled vertices and the unsupervised learning loss ls_U on unlabeled vertices.
其中,ChebyNet_A以邻接矩阵A和顶点特征矩阵X为输入,进行有监督学习,并将顶点标签预测结果Z_A和已知的顶点标签矩阵Y进行比较,计算有监督学习损失;ChebyNet_P以正逐点互信息矩阵P和顶点特征矩阵X作为输入,进行无监督学习,并将其预测结果Z_P和ChebyNet_A的预测结果Z_A进行比较,计算无监督学习损失。据此,对偶切比雪夫图卷积神经网络的损失函数可以表示为:ls = ls_S + α·ls_U,其中,α是一个常数,用以调节无监督学习损失在整个损失函数中所占的比例。ChebyNet_A takes the adjacency matrix A and the vertex feature matrix X as input for supervised learning, and compares the vertex label prediction result Z_A with the known vertex label matrix Y to compute the supervised learning loss; ChebyNet_P takes the positive pointwise mutual information matrix P and the vertex feature matrix X as input for unsupervised learning, and compares its prediction result Z_P with the prediction result Z_A of ChebyNet_A to compute the unsupervised learning loss. Accordingly, the loss function of the dual Chebyshev graph convolutional neural network can be expressed as ls = ls_S + α·ls_U, where α is a constant that adjusts the proportion of the unsupervised learning loss in the overall loss function.
其中,有监督学习损失函数基于交叉熵原理,计算顶点实际标签概率分布和预测标签概率分布的差异程度;无监督学习损失函数计算Z_P和Z_A中相同坐标元素之间差值的平方和。The supervised learning loss is based on the cross-entropy principle and measures the difference between the actual label probability distribution of the vertices and the predicted label probability distribution; the unsupervised learning loss computes the sum of squared differences between the elements of Z_P and Z_A at the same coordinates.
3、初始化策略。3. Initialize the strategy.
网络参数的初始化策略可以选择正态分布随机初始化、Xavier初始化或He Initialization初始化等。网络参数包含特征变换矩阵Θ^l和卷积核F^l。The initialization strategy of the network parameters can be chosen from random initialization with a normal distribution, Xavier initialization, He initialization, etc. The network parameters include the feature transformation matrices Θ^l and the convolution kernels F^l.
4、网络参数更新方式。4. Network parameter update method.
可以按照随机梯度下降(Stochastic Gradient Descent,SGD)、动量梯度下降(Momentum Gradient Descent,MGD)、Nesterov Momentum、AdaGrad、RMSprop、Adam(Adaptive Moment Estimation)或批量梯度下降(Batch Gradient Descent,BGD)等方法,对网络参数进行修正和更新,以优化损失函数值。The network parameters can be corrected and updated according to methods such as Stochastic Gradient Descent (SGD), Momentum Gradient Descent (MGD), Nesterov Momentum, AdaGrad, RMSprop, Adam (Adaptive Moment Estimation) or Batch Gradient Descent (BGD), so as to optimize the value of the loss function.
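For illustration, one of the listed update rules, a plain momentum step, could look like the following; the learning rate, momentum coefficient and per-array formulation are assumptions, and any of the other optimizers listed above could be substituted.

```python
import numpy as np

def momentum_step(param, grad, velocity, lr=0.01, beta=0.9):
    """One momentum-gradient-descent update for a single parameter array."""
    velocity = beta * velocity + grad
    return param - lr * velocity, velocity

# toy usage on a single weight matrix
w = np.zeros((3, 2)); v = np.zeros_like(w)
g = np.ones_like(w)                      # placeholder gradient of the loss ls w.r.t. w
w, v = momentum_step(w, g, v)
```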
确定网络结构、损失函数、初始化策略、网络参数更新方式等内容后,对偶切比雪夫图卷积神经网络的训练过程可参照图5进行,具体包括:对于图数据集G,构造顶点特征矩阵X、编码图全局一致性信息的正逐点互信息矩阵P、编码图局部一致性信息的邻接矩阵A、顶点标签矩阵Y;将顶点特征矩阵X和邻接矩阵A输入ChebyNet_A,将正逐点互信息矩阵P和顶点特征矩阵X输入ChebyNet_P,并按照上述损失函数更新网络参数,以训练ChebyNet_A和ChebyNet_P。若损失函数值达到一个指定的较小值或迭代次数达到指定的最大值时,训练结束,得到对偶切比雪夫图卷积神经网络。此时,对于无类别标签的顶点i∈V_U,可根据顶点标签矩阵Y得到其应归属的类别j。After the network structure, loss function, initialization strategy and parameter update method are determined, the training process of the dual Chebyshev graph convolutional neural network can be carried out with reference to Figure 5, specifically: for the graph data set G, construct the vertex feature matrix X, the positive pointwise mutual information matrix P encoding the global consistency information of the graph, the adjacency matrix A encoding the local consistency information of the graph, and the vertex label matrix Y; input the vertex feature matrix X and the adjacency matrix A into ChebyNet_A, input the positive pointwise mutual information matrix P and the vertex feature matrix X into ChebyNet_P, and update the network parameters according to the above loss function to train ChebyNet_A and ChebyNet_P. When the loss function value reaches a specified small value or the number of iterations reaches a specified maximum, training ends and the dual Chebyshev graph convolutional neural network is obtained. At this point, for a vertex i∈V_U without a class label, the class j to which it should belong can be obtained from the vertex label matrix Y.
在训练过程中,根据图卷积层的定义,结合该层输入的特征矩阵,计算每一个层的输出特征矩阵;按照输出层的定义,预测所有顶点属于每一类别j的概率Z_j(1≤j≤C),并根据前述定义的损失函数计算损失函数值;对于无标签顶点v_i∈V_U,取概率最大的那一类别作为该顶点的最新类别,来更新顶点标签矩阵Y。During training, the output feature matrix of each layer is computed according to the definition of the graph convolutional layer and the feature matrix input to that layer; according to the definition of the output layer, the probability Z_j (1≤j≤C) that each vertex belongs to each class j is predicted, and the loss function value is computed from the loss function defined above; for an unlabeled vertex v_i∈V_U, the class with the largest probability is taken as the current class of that vertex to update the vertex label matrix Y.
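The label-matrix update at the end of each step could be sketched as below; the one-hot write-back and the 0-based indexing are assumptions made for illustration.

```python
import numpy as np

def update_label_matrix(Y, Z, unlabeled_idx):
    """For each unlabeled vertex, take the class with the largest predicted
    probability as its current label and write it back into Y (one-hot rows)."""
    Y = Y.copy()
    for i in unlabeled_idx:
        j = int(np.argmax(Z[i]))
        Y[i] = 0.0
        Y[i, j] = 1.0
    return Y
```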
在该方案中,对偶切比雪夫图卷积神经网络由两个同结构的、共享参数的切比雪夫图卷积神经网络组成,此二者分别进行有监督学习和无监督学习,可以提高网络的收敛速率和预测准确度;同时,基于图傅里叶变换定义图卷积层,将图卷积操作分为特征变换和图卷积两个阶段,可以减少网络参数量;基于谱图理论,定义图卷积核为多项式卷积核,保证了图卷积计算的局部性;为降低计算复杂度,利用切比雪夫多项式近似计算图卷积。In this scheme, the dual Chebyshev graph convolutional neural network is composed of two Chebyshev graph convolutional neural networks with the same structure and shared parameters. The two perform supervised learning and unsupervised learning respectively, which can improve the network The convergence rate and prediction accuracy; at the same time, the graph convolution layer is defined based on the graph Fourier transform, and the graph convolution operation is divided into two stages of feature transformation and graph convolution, which can reduce the amount of network parameters; based on the spectral graph theory, The graph convolution kernel is defined as a polynomial convolution kernel, which ensures the locality of the graph convolution calculation; in order to reduce the computational complexity, the Chebyshev polynomial is used to approximate the graph convolution.
可见,本实施例提供了一种对偶切比雪夫图卷积神经网络的训练方法,能够解决顶点分类问题。首先,对搜集到的数据集进行图建模,得到其邻接矩阵和顶点特征矩阵;以邻接矩阵为基础,对于每个顶点,在图上开展特定长度的随机游走,通过对产生的游走序列采样得到正逐点互信息矩阵,该矩阵表征顶点的上下文信息;根据谱图理论定义卷积操作,构造用于特征提取的图卷积层和用于顶点分类任务的输出层,搭建并训练切比雪夫图卷积神经网络;训练结束时,即可得到图中未标记顶点的分类预测结果。It can be seen that this embodiment provides a training method for a dual Chebyshev graph convolutional neural network, which can solve the problem of vertex classification. First, graph modeling is performed on the collected data set to obtain its adjacency matrix and vertex feature matrix; based on the adjacency matrix, for each vertex, a random walk of a specific length is carried out on the graph, and the resulting walk is Sequence sampling obtains a positive point-by-point mutual information matrix, which represents the context information of vertices; defines the convolution operation according to the spectral graph theory, constructs the graph convolution layer for feature extraction and the output layer for vertex classification tasks, builds and trains Chebyshev graph convolutional neural network; at the end of training, classification predictions for unlabeled vertices in the graph are available.
与仅含有单个图卷积神经网络的分类系统相比,该方法因采用对偶图卷积神经网络的设计策略,可学习到更多图拓扑结构信息,包括每个顶点的局部一致性和全局一致性信息,大大提升了模型的学习能力;并且,同时利用图拓扑结构和顶点的属性特征,结合监督和无监督学习,有效提高了分类的准确度;借助切比雪夫多项式近似计算图卷积,避免运算代价高昂的矩阵特征分解操作,有效降低了网络的计算复杂性,提高了网络的分类效率。Compared with the classification system with only a single graph convolutional neural network, this method can learn more graph topology information, including the local consistency and global consistency of each vertex, due to the design strategy of the dual graph convolutional neural network. The characteristic information greatly improves the learning ability of the model; and, at the same time, using the graph topology and attribute characteristics of vertices, combined with supervised and unsupervised learning, effectively improves the accuracy of classification; with the help of Chebyshev polynomials to approximate the calculation of graph convolution, Avoiding the expensive matrix eigendecomposition operation effectively reduces the computational complexity of the network and improves the classification efficiency of the network.
下面对本申请实施例提供的一种模型训练装置进行介绍,下文描述的一种模型训练装置与上文描述的一种模型训练方法可以相互参照。A model training device provided in the embodiment of the present application is introduced below, and a model training device described below and a model training method described above may refer to each other.
参见图6所示,本申请实施例公开了一种模型训练装置,包括:Referring to Figure 6, the embodiment of the present application discloses a model training device, including:
获取模块601,用于获取基于图数据集构建的顶点特征矩阵、邻接矩阵和标签矩阵;Obtaining module 601, used to obtain the vertex feature matrix, adjacency matrix and label matrix constructed based on the graph data set;
采样模块602,用于基于邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵;The sampling module 602 is used to perform random walk and sampling based on the adjacency matrix to obtain a positive point-by-point mutual information matrix;
第一训练模块603,用于将顶点特征矩阵和邻接矩阵输入第一切比雪夫图卷积神经网络,以输出第一训练结果;The first training module 603 is used to input the vertex feature matrix and the adjacency matrix into the first Chebyshev graph convolutional neural network to output the first training result;
第二训练模块604,用于将顶点特征矩阵和正逐点互信息矩阵输入第二切比雪夫图卷积神经网络,以输出第二训练结果;The second training module 604 is used to input the vertex feature matrix and the positive point-by-point mutual information matrix into the second Chebyshev graph convolutional neural network to output the second training result;
第一计算模块605,用于计算第一训练结果和标签矩阵之间的第一损失值;The first calculation module 605 is used to calculate the first loss value between the first training result and the label matrix;
第二计算模块606,用于计算第二训练结果和第一训练结果之间的第二损失值;A second calculation module 606, configured to calculate a second loss value between the second training result and the first training result;
确定模块607,用于基于第一损失值和第二损失值确定目的损失值;A determining module 607, configured to determine a target loss value based on the first loss value and the second loss value;
组合模块608,用于若目的损失值符合预设收敛条件,则将第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络组合为对偶顶点分类模型。The combination module 608 is configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets the preset convergence condition.
在一种具体实施方式中,采样模块具体用于:In a specific implementation manner, the sampling module is specifically used for:
基于邻接矩阵,对图数据集中的每个顶点进行预设长度的随机游走,得到每个顶点的上下文路径;Based on the adjacency matrix, a random walk of preset length is performed on each vertex in the graph dataset to obtain the context path of each vertex;
对所有上下文路径进行随机采样,以确定任意两个顶点的共现次数,并构建顶点共现次数矩阵;Randomly sample all context paths to determine the number of co-occurrences of any two vertices and construct a matrix of vertex co-occurrences;
基于顶点共现次数矩阵,计算顶点与上下文共现概率和相应的边缘概率,并确定正逐点互信息矩阵中的每个元素。Based on the vertex co-occurrence times matrix, the vertex and context co-occurrence probability and the corresponding edge probability are calculated, and each element in the positive point-wise mutual information matrix is determined.
在一种具体实施方式中,第一计算模块具体用于:In a specific implementation manner, the first calculation module is specifically used for:
基于交叉熵原理,将第一训练结果和标签矩阵之间的概率分布差异程度作为第一损失值。Based on the principle of cross entropy, the degree of difference in probability distribution between the first training result and the label matrix is used as the first loss value.
在一种具体实施方式中,第二计算模块具体用于:In a specific implementation manner, the second calculation module is specifically used for:
计算第二训练结果和第一训练结果中具有相同坐标的元素的差值,并将所有差值的平方和作为第二损失值。Calculate the difference between elements with the same coordinates in the second training result and the first training result, and use the sum of squares of all differences as the second loss value.
在一种具体实施方式中,确定模块具体用于:In a specific implementation manner, the determination module is specifically used for:
将第一损失值和第二损失值输入损失函数,以输出目的损失值;Input the first loss value and the second loss value into the loss function to output the target loss value;
其中,损失函数为:ls = ls_S + α·ls_U,ls为目的损失值,ls_S为第一损失值,ls_U为第二损失值,α为调节第二损失值在目的损失值中所占比例的常数。Wherein, the loss function is ls = ls_S + α·ls_U, ls is the target loss value, ls_S is the first loss value, ls_U is the second loss value, and α is a constant that adjusts the proportion of the second loss value in the target loss value.
在一种具体实施方式中,若目的损失值不符合预设收敛条件,则根据目的损失值更新第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络的网络参数,并对更新后的第一切比雪夫图卷积神经网络和更新后的第二切比雪夫图卷积神经网络进行迭代训练,直至目的损失值符合预设收敛条件;In a specific implementation, if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, And perform iterative training on the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network until the target loss value meets the preset convergence condition;
其中,根据目的损失值更新第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络的网络参数,包括:Among them, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, including:
根据目的损失值更新第一切比雪夫图卷积神经网络的网络参数后,将更新后的该网络参数共享至第二切比雪夫图卷积神经网络;After updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value, the updated network parameters are shared to the second Chebyshev graph convolutional neural network;
或or
根据目的损失值更新第二切比雪夫图卷积神经网络的网络参数后,将更新后的该网络参数共享至第一切比雪夫图卷积神经网络;After updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value, the updated network parameters are shared to the first Chebyshev graph convolutional neural network;
或or
根据目的损失值计算得到新网络参数后,将新网络参数共享至第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络。After the new network parameters are calculated according to the target loss value, the new network parameters are shared to the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
在一种具体实施方式中,第一切比雪夫图卷积神经网络和第二切比雪夫图卷积神经网络均包括L层图卷积层,该L层图卷积层用于对输入数据进行特征变换和图卷积操作;In a specific implementation, both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network include an L-layer graph convolutional layer, and the L-layer graph convolutional layer is used to process the input data Perform feature transformation and graph convolution operations;
其中,第l(1≤l≤L)层图卷积层的特征变换公式为:Q^l = H^l·(Θ^l)^T;第l(1≤l≤L)层图卷积层的图卷积操作公式为:H^{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q^l)。
其中,Q^l为图卷积神经网络第l图卷积层经特征变换后的顶点特征矩阵;H^l为图卷积神经网络的第l图卷积层的输入数据,H^{l+1}为图卷积神经网络的第l图卷积层的输出数据;(Θ^l)^T是图卷积神经网络的第l图卷积层需学习的特征变换矩阵Θ^l的转置矩阵;σ为非线性激活函数;K<<n,为多项式的阶数;n为图数据集中的顶点个数;θ_k是多项式的系数;T_k(x)=2x·T_{k-1}(x)−T_{k-2}(x),且T_0=1,T_1=x,为切比雪夫多项式;L为图数据集的拉普拉斯矩阵,L̃为经过线性变换后的拉普拉斯矩阵。Wherein, the feature transformation formula of the l-th (1≤l≤L) graph convolutional layer is Q^l = H^l·(Θ^l)^T, and the graph convolution formula of the l-th (1≤l≤L) graph convolutional layer is H^{l+1} = σ(∑_{k=0}^{K} θ_k·T_k(L̃)·Q^l). Here, Q^l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation; H^l is the input data of the l-th graph convolutional layer, and H^{l+1} is its output data; (Θ^l)^T is the transpose of the feature transformation matrix Θ^l to be learned by the l-th graph convolutional layer; σ is a nonlinear activation function; K<<n is the order of the polynomial; n is the number of vertices in the graph data set; θ_k are the polynomial coefficients; T_k(x)=2x·T_{k-1}(x)−T_{k-2}(x), with T_0=1 and T_1=x, are the Chebyshev polynomials; L is the Laplacian matrix of the graph data set, and L̃ is the Laplacian matrix after linear transformation.
其中,关于本实施例中各个模块、单元更加具体的工作过程可以参考前述实施例中公开的相应内容,在此不再进行赘述。For the more specific working process of each module and unit in this embodiment, reference may be made to the corresponding content disclosed in the foregoing embodiments, and details are not repeated here.
可见,本实施例提供了一种模型训练装置,该装置能够充分发挥有监督训练和无监督训练各自的优势,提升了顶点分类模型的性能。It can be seen that this embodiment provides a model training device, which can give full play to the respective advantages of supervised training and unsupervised training, and improve the performance of the vertex classification model.
下面对本申请实施例提供的一种模型训练设备进行介绍,下文描述的一种模型训练设备与上文描述的一种模型训练方法及装置可以相互参照。The following introduces a model training device provided in the embodiment of the present application, and the model training device described below and the model training method and device described above may refer to each other.
参见图7所示,本申请实施例公开了一种模型训练设备,包括:Referring to Figure 7, the embodiment of the present application discloses a model training device, including:
存储器701,用于保存计算机程序; Memory 701, used to store computer programs;
处理器702,用于执行所述计算机程序,以实现上述任意实施例公开的方法。The processor 702 is configured to execute the computer program, so as to implement the method disclosed in any of the foregoing embodiments.
下面对本申请实施例提供的一种可读存储介质进行介绍,下文描述的一种可读存储介质与上文描述的一种模型训练方法、装置及设备可以相互参照。A readable storage medium provided by an embodiment of the present application is introduced below. The readable storage medium described below and the model training method, device, and equipment described above may refer to each other.
一种可读存储介质,用于保存计算机程序,其中,所述计算机程序被处理器执行时实现前述实施例公开的模型训练方法。关于该方法的具体步骤可以参考前述实施例中公开的相应内容,在此不再进行赘述。A readable storage medium is used to store a computer program, wherein the computer program implements the model training method disclosed in the foregoing embodiments when executed by a processor. Regarding the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, and details are not repeated here.
本申请涉及的“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述 的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法或设备固有的其它步骤或单元。"First", "second", "third", "fourth" and the like referred to in the present application, if any, are used to distinguish similar objects and not necessarily to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", as well as any variations thereof, are intended to cover a non-exclusive inclusion, e.g. a process, method or apparatus comprising a series of steps or elements is not necessarily limited to those steps or elements explicitly listed , but may include other steps or elements not explicitly listed or inherent to the process, method or apparatus.
需要说明的是,在本申请中涉及“第一”、“第二”等的描述仅用于描述目的,而不能理解为指示或暗示其相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括至少一个该特征。另外,各个实施例之间的技术方案可以相互结合,但是必须是以本领域普通技术人员能够实现为基础,当技术方案的结合出现相互矛盾或无法实现时应当认为这种技术方案的结合不存在,也不在本申请要求的保护范围之内。It should be noted that the descriptions in this application involving "first", "second" and so on are for descriptive purposes only, and should not be understood as indicating or implying their relative importance or implicitly indicating the number of indicated technical features . Thus, the features defined as "first" and "second" may explicitly or implicitly include at least one of these features. In addition, the technical solutions of the various embodiments can be combined with each other, but it must be based on the realization of those skilled in the art. When the combination of technical solutions is contradictory or cannot be realized, it should be considered that the combination of technical solutions does not exist , nor within the scope of protection required by the present application.
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。Each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same or similar parts of each embodiment can be referred to each other.
结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件模块,或者二者的结合来实施。软件模块可以置于随机存储器(RAM)、内存、只读存储器(ROM)、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的可读存储介质中。The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be directly implemented by hardware, software modules executed by a processor, or a combination of both. Software modules can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other Any other known readable storage medium.
本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。In this paper, specific examples are used to illustrate the principle and implementation of the application. The description of the above embodiments is only used to help understand the method and core idea of the application; at the same time, for those of ordinary skill in the art, according to the application There will be changes in the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the application.
Claims (10)
- 一种模型训练方法,其特征在于,包括:A model training method, characterized in that, comprising:获取基于图数据集构建的顶点特征矩阵、邻接矩阵和标签矩阵;Obtain the vertex feature matrix, adjacency matrix and label matrix constructed based on the graph dataset;基于所述邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵;performing random walk and sampling based on the adjacency matrix to obtain a positive point-by-point mutual information matrix;将所述顶点特征矩阵和所述邻接矩阵输入第一切比雪夫图卷积神经网络,以输出第一训练结果;Input the vertex feature matrix and the adjacency matrix into the first Chebyshev graph convolutional neural network to output the first training result;将所述顶点特征矩阵和所述正逐点互信息矩阵输入第二切比雪夫图卷积神经网络,以输出第二训练结果;Inputting the vertex feature matrix and the positive point-wise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result;计算所述第一训练结果和所述标签矩阵之间的第一损失值;calculating a first loss value between the first training result and the label matrix;计算所述第二训练结果和所述第一训练结果之间的第二损失值;calculating a second loss value between the second training result and the first training result;基于所述第一损失值和所述第二损失值确定目的损失值;determining a target loss value based on the first loss value and the second loss value;若所述目的损失值符合预设收敛条件,则将所述第一切比雪夫图卷积神经网络和所述第二切比雪夫图卷积神经网络组合为对偶顶点分类模型。If the target loss value meets the preset convergence condition, combining the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.
- 根据权利要求1所述的模型训练方法,其特征在于,所述基于所述邻接矩阵进行随机游走和采样,得到正逐点互信息矩阵,包括:The model training method according to claim 1, wherein said random walk and sampling are performed based on said adjacency matrix to obtain a positive point-by-point mutual information matrix, comprising:基于所述邻接矩阵,对所述图数据集中的每个顶点进行预设长度的随机游走,得到每个顶点的上下文路径;Based on the adjacency matrix, a random walk of a preset length is performed on each vertex in the graph data set to obtain a context path of each vertex;对所有上下文路径进行随机采样,以确定任意两个顶点的共现次数,并构建顶点共现次数矩阵;Randomly sample all context paths to determine the number of co-occurrences of any two vertices and construct a matrix of vertex co-occurrences;基于顶点共现次数矩阵,计算顶点与上下文共现概率和相应的边缘概率,并确定所述正逐点互信息矩阵中的每个元素。Based on the vertex co-occurrence times matrix, the co-occurrence probability of the vertex and the context and the corresponding edge probability are calculated, and each element in the positive point-wise mutual information matrix is determined.
- 根据权利要求1所述的模型训练方法,其特征在于,所述计算所述第一训练结果和所述标签矩阵之间的第一损失值,包括:The model training method according to claim 1, wherein the calculating the first loss value between the first training result and the label matrix comprises:基于交叉熵原理,将所述第一训练结果和所述标签矩阵之间的概率分布差异程度作为所述第一损失值。Based on the cross-entropy principle, the degree of difference in probability distribution between the first training result and the label matrix is used as the first loss value.
- The model training method according to claim 1, wherein calculating the second loss value between the second training result and the first training result comprises:
calculating the differences between elements having the same coordinates in the second training result and the first training result, and taking the sum of the squares of all the differences as the second loss value.
- The model training method according to claim 1, wherein determining the target loss value on the basis of the first loss value and the second loss value comprises:
inputting the first loss value and the second loss value into a loss function to output the target loss value;
wherein the loss function is ls = ls_S + α·ls_U, where ls is the target loss value, ls_S is the first loss value, ls_U is the second loss value, and α is a constant that adjusts the proportion of the second loss value in the target loss value. (An illustrative sketch covering claims 3 to 5 follows the claims.)
- The model training method according to any one of claims 1 to 5, characterized in that:
if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are iteratively trained until the target loss value meets the preset convergence condition;
wherein updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value comprises:
after updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the second Chebyshev graph convolutional neural network;
or
after updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the first Chebyshev graph convolutional neural network;
or
after calculating new network parameters according to the target loss value, sharing the new network parameters with both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network. (An illustrative parameter-sharing sketch follows the claims.)
- The model training method according to any one of claims 1 to 5, wherein the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network each comprise L graph convolution layers, and the L graph convolution layers are used to perform feature transformation and graph convolution operations on input data;
wherein the feature transformation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is Q^l = H^l (W^l)^T, and the graph convolution operation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is H^{l+1} = σ( Σ_k θ_k T_k(L̃) Q^l );
where Q^l is the vertex feature matrix of the l-th graph convolution layer after feature transformation; H^l is the input data of the l-th graph convolution layer and H^{l+1} is the output data of the l-th graph convolution layer; (W^l)^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolution layer; σ is a nonlinear activation function; K (K << n) is the order of the polynomial; n is the number of vertices in the graph dataset; θ_k is a polynomial coefficient; T_k(x) = 2x·T_{k-1}(x) - T_{k-2}(x), with T_0 = 1 and T_1 = x, is the Chebyshev polynomial; L is the Laplacian matrix of the graph dataset, and L̃ is the Laplacian matrix after linear transformation. (An illustrative sketch of such a layer follows the claims.)
- A model training apparatus, characterized by comprising:
an obtaining module, configured to obtain a vertex feature matrix, an adjacency matrix and a label matrix constructed on the basis of a graph dataset;
a sampling module, configured to perform a random walk and sampling on the basis of the adjacency matrix to obtain a positive pointwise mutual information matrix;
a first training module, configured to input the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result;
a second training module, configured to input the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result;
a first calculation module, configured to calculate a first loss value between the first training result and the label matrix;
a second calculation module, configured to calculate a second loss value between the second training result and the first training result;
a determination module, configured to determine a target loss value on the basis of the first loss value and the second loss value; and
a combination module, configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets a preset convergence condition.
- A model training device, characterized by comprising:
a memory, configured to store a computer program; and
a processor, configured to execute the computer program to implement the model training method according to any one of claims 1 to 7.
- A readable storage medium, characterized by being used to store a computer program, wherein the computer program, when executed by a processor, implements the model training method according to any one of claims 1 to 7.
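The following is a minimal PyTorch-style sketch of the dual training procedure recited in claim 1. The two ChebGCN modules, the hyperparameters (alpha, learning rate, epoch count) and the simple convergence check are assumptions made for illustration only, not part of the claims.

```python
# Illustrative sketch of the dual training procedure of claim 1.
# Assumptions: cheb_gcn_a / cheb_gcn_p are torch modules with forward(X, support);
# alpha, learning rate, epochs and the convergence tolerance are example values.
import torch
import torch.nn.functional as F

def train_dual_model(cheb_gcn_a, cheb_gcn_p, X, A, P, Y,
                     alpha=0.1, epochs=200, tol=1e-4):
    """X: vertex feature matrix, A: adjacency matrix,
    P: positive pointwise mutual information matrix, Y: one-hot label matrix."""
    params = list(cheb_gcn_a.parameters()) + list(cheb_gcn_p.parameters())
    opt = torch.optim.Adam(params, lr=0.01)
    prev = float("inf")
    for _ in range(epochs):
        opt.zero_grad()
        out_a = cheb_gcn_a(X, A)                          # first training result
        out_p = cheb_gcn_p(X, P)                          # second training result
        ls_s = F.cross_entropy(out_a, Y.argmax(dim=1))    # first loss value
        ls_u = ((out_p - out_a) ** 2).sum()               # second loss value
        loss = ls_s + alpha * ls_u                        # target loss value
        loss.backward()
        opt.step()
        if abs(prev - loss.item()) < tol:                 # preset convergence condition
            break
        prev = loss.item()
    return cheb_gcn_a, cheb_gcn_p                         # dual vertex classification model
```

In a semi-supervised setting the supervised term would typically be evaluated only over the labelled vertices; the sketch assumes all vertices are labelled for brevity.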
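A NumPy sketch of the random-walk and sampling procedure of claim 2 might look as follows; the walk length, number of walks per vertex and context window size are assumed hyperparameters.

```python
# Illustrative sketch of claim 2: random walks over the adjacency matrix,
# co-occurrence counting within a window, then the positive PMI matrix.
import numpy as np

def ppmi_matrix(A, walk_len=10, num_walks=20, window=3, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    counts = np.zeros((n, n))                       # vertex co-occurrence count matrix
    for v in range(n):
        for _ in range(num_walks):
            path, cur = [v], v
            for _ in range(walk_len - 1):           # random walk of a preset length
                nbrs = np.flatnonzero(A[cur])
                if len(nbrs) == 0:
                    break
                cur = rng.choice(nbrs)
                path.append(cur)
            for i, u in enumerate(path):            # sample vertex/context pairs
                for w in path[max(0, i - window): i + window + 1]:
                    counts[u, w] += 1
    total = counts.sum()
    p_uv = counts / total                           # vertex-context co-occurrence probability
    p_u = counts.sum(axis=1, keepdims=True) / total # marginal probabilities
    p_v = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_uv / (p_u @ p_v))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)                     # keep only the positive PMI values
```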
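A compact sketch of the loss terms of claims 3 to 5: a cross-entropy first loss, a sum-of-squared-differences second loss, and the weighted combination ls = ls_S + α·ls_U. The value of α is an assumption.

```python
# Illustrative sketch of the loss computation in claims 3 to 5.
import torch
import torch.nn.functional as F

def target_loss(out_first, out_second, labels, alpha=0.1):
    ls_s = F.cross_entropy(out_first, labels)       # first loss value (claim 3)
    ls_u = ((out_second - out_first) ** 2).sum()    # second loss value (claim 4)
    return ls_s + alpha * ls_u                      # target loss value (claim 5)
```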
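A sketch of one of the three parameter-update strategies of claim 6 (update the first network from the target loss, then share the updated parameters with the second network); the two networks are assumed to have identical architectures, so a state-dict copy is sufficient.

```python
# Illustrative sketch of the "update then share" strategy of claim 6.
import torch
import torch.nn as nn

def update_and_share(net_first: nn.Module, net_second: nn.Module,
                     optimizer: torch.optim.Optimizer, loss: torch.Tensor) -> None:
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                     # update the first network only
    net_second.load_state_dict(net_first.state_dict())   # share the updated parameters
```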
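Finally, a sketch of a single Chebyshev graph convolution layer as described in claim 7, assuming the rescaled Laplacian L̃ is precomputed and passed in; the initialisation choices are illustrative only.

```python
# Illustrative sketch of one graph convolution layer of claim 7:
# feature transformation Q^l = H^l (W^l)^T followed by a K-order
# Chebyshev polynomial filter over the rescaled Laplacian L_tilde.
import torch
import torch.nn as nn

class ChebGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, K=3):
        super().__init__()
        self.W = nn.Parameter(torch.empty(out_dim, in_dim))   # W^l, learned per layer
        self.theta = nn.Parameter(torch.ones(K))               # polynomial coefficients theta_k
        self.K = K
        nn.init.xavier_uniform_(self.W)

    def forward(self, H, L_tilde):
        Q = H @ self.W.t()                        # Q^l = H^l (W^l)^T
        Tk_prev, Tk = Q, L_tilde @ Q              # T_0(L~)Q = Q, T_1(L~)Q = L~ Q
        out = self.theta[0] * Tk_prev
        if self.K > 1:
            out = out + self.theta[1] * Tk
        for k in range(2, self.K):
            Tk_next = 2 * (L_tilde @ Tk) - Tk_prev   # T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)
            out = out + self.theta[k] * Tk_next
            Tk_prev, Tk = Tk, Tk_next
        return torch.relu(out)                    # sigma: nonlinear activation
```

Stacking L such layers, with the adjacency-based Laplacian for the first network and the PPMI-based Laplacian for the second, yields the two Chebyshev graph convolutional neural networks of the dual model.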
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110825194.9A CN113705772A (en) | 2021-07-21 | 2021-07-21 | Model training method, device and equipment and readable storage medium |
CN202110825194.9 | 2021-07-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023000574A1 true WO2023000574A1 (en) | 2023-01-26 |
Family
ID=78650163
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/134051 WO2023000574A1 (en) | 2021-07-21 | 2021-11-29 | Model training method, apparatus and device, and readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113705772A (en) |
WO (1) | WO2023000574A1 (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705772A (en) * | 2021-07-21 | 2021-11-26 | 浪潮(北京)电子信息产业有限公司 | Model training method, device and equipment and readable storage medium |
CN114360007B (en) * | 2021-12-22 | 2023-02-07 | 浙江大华技术股份有限公司 | Face recognition model training method, face recognition device, face recognition equipment and medium |
CN114528994B (en) * | 2022-03-17 | 2024-10-18 | 腾讯科技(深圳)有限公司 | Identification model determining method and related device |
CN114707641B (en) * | 2022-03-23 | 2024-11-08 | 平安科技(深圳)有限公司 | Training method, device, equipment and medium for double-view-angle graph neural network model |
CN114490950B (en) * | 2022-04-07 | 2022-07-12 | 联通(广东)产业互联网有限公司 | Method and storage medium for training encoder model, and method and system for predicting similarity |
CN114943324B (en) * | 2022-05-26 | 2023-10-13 | 中国科学院深圳先进技术研究院 | Neural network training method, human motion recognition method and device, and storage medium |
CN115858725B (en) * | 2022-11-22 | 2023-07-04 | 广西壮族自治区通信产业服务有限公司技术服务分公司 | Text noise screening method and system based on unsupervised graph neural network |
CN116071635A (en) * | 2023-03-06 | 2023-05-05 | 之江实验室 | Image recognition method and device based on structural knowledge propagation |
CN116089652B (en) * | 2023-04-07 | 2023-07-18 | 中国科学院自动化研究所 | Unsupervised training method and device of visual retrieval model and electronic equipment |
CN116402554B (en) * | 2023-06-07 | 2023-08-11 | 江西时刻互动科技股份有限公司 | Advertisement click rate prediction method, system, computer and readable storage medium |
CN116431816B (en) * | 2023-06-13 | 2023-09-19 | 浪潮电子信息产业股份有限公司 | Document classification method, apparatus, device and computer readable storage medium |
CN118552136B (en) * | 2024-07-26 | 2024-10-25 | 浪潮智慧供应链科技(山东)有限公司 | Big data-based supply chain intelligent inventory management system and method |
2021
- 2021-07-21 CN CN202110825194.9A patent/CN113705772A/en active Pending
- 2021-11-29 WO PCT/CN2021/134051 patent/WO2023000574A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200285944A1 (en) * | 2019-03-08 | 2020-09-10 | Adobe Inc. | Graph convolutional networks with motif-based attention |
CN112464057A (en) * | 2020-11-18 | 2021-03-09 | 苏州浪潮智能科技有限公司 | Network data classification method, device, equipment and readable storage medium |
CN112925909A (en) * | 2021-02-24 | 2021-06-08 | 中国科学院地理科学与资源研究所 | Graph convolution document classification method and system considering local invariance constraint |
CN113705772A (en) * | 2021-07-21 | 2021-11-26 | 浪潮(北京)电子信息产业有限公司 | Model training method, device and equipment and readable storage medium |
Non-Patent Citations (1)
Title |
---|
ZHUANG, Chenyi; MA, Qiang: "Dual Graph Convolutional Networks for Graph-Based Semi-Supervised Classification", The Web Conference 2018, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 23-27 April 2018, pages 499-508, XP058652837, ISBN: 978-1-4503-5640-4, DOI: 10.1145/3178876.3186116 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112364372A (en) * | 2020-10-27 | 2021-02-12 | 重庆大学 | Privacy protection method with supervision matrix completion |
CN116109195A (en) * | 2023-02-23 | 2023-05-12 | 深圳市迪博企业风险管理技术有限公司 | Performance evaluation method and system based on graph convolution neural network |
CN116129206A (en) * | 2023-04-14 | 2023-05-16 | 吉林大学 | Processing method and device for image decoupling characterization learning and electronic equipment |
CN116405100A (en) * | 2023-05-29 | 2023-07-07 | 武汉能钠智能装备技术股份有限公司 | Distortion signal restoration method based on priori knowledge |
CN116405100B (en) * | 2023-05-29 | 2023-08-22 | 武汉能钠智能装备技术股份有限公司 | Distortion signal restoration method based on priori knowledge |
CN117351239A (en) * | 2023-10-11 | 2024-01-05 | 兰州交通大学 | Multi-scale road network similarity calculation method supported by graph convolution self-encoder |
CN117391150A (en) * | 2023-12-07 | 2024-01-12 | 之江实验室 | Graph data retrieval model training method based on hierarchical pooling graph hash |
CN117391150B (en) * | 2023-12-07 | 2024-03-12 | 之江实验室 | Graph data retrieval model training method based on hierarchical pooling graph hash |
CN117540828A (en) * | 2024-01-10 | 2024-02-09 | 中国电子科技集团公司第十五研究所 | Training method and device for training subject recommendation model, electronic equipment and storage medium |
CN117540828B (en) * | 2024-01-10 | 2024-06-04 | 中国电子科技集团公司第十五研究所 | Training method and device for training subject recommendation model, electronic equipment and storage medium |
CN117909903A (en) * | 2024-01-26 | 2024-04-19 | 深圳硅山技术有限公司 | Diagnostic method, device, apparatus and storage medium for electric power steering system |
CN117971356A (en) * | 2024-03-29 | 2024-05-03 | 苏州元脑智能科技有限公司 | Heterogeneous acceleration method, device, equipment and storage medium based on semi-supervised learning |
CN118035811A (en) * | 2024-04-18 | 2024-05-14 | 中科南京信息高铁研究院 | State sensing method, control server and medium of electric equipment based on graph convolution neural network |
CN118391723A (en) * | 2024-07-01 | 2024-07-26 | 青岛能源设计研究院有限公司 | Intelligent air source heat pump heating system |
Also Published As
Publication number | Publication date |
---|---|
CN113705772A (en) | 2021-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023000574A1 (en) | Model training method, apparatus and device, and readable storage medium | |
CN114048331A (en) | Knowledge graph recommendation method and system based on improved KGAT model | |
Bhagat et al. | Node classification in social networks | |
CN110347932B (en) | Cross-network user alignment method based on deep learning | |
Li et al. | Restricted Boltzmann machine-based approaches for link prediction in dynamic networks | |
CN110674323B (en) | Unsupervised cross-modal Hash retrieval method and system based on virtual label regression | |
WO2022252458A1 (en) | Classification model training method and apparatus, device, and medium | |
Li et al. | Image sentiment prediction based on textual descriptions with adjective noun pairs | |
CN109753589A (en) | A kind of figure method for visualizing based on figure convolutional network | |
Ma et al. | Joint multi-label learning and feature extraction for temporal link prediction | |
CN112925857A (en) | Digital information driven system and method for predicting associations based on predicate type | |
CN112131261B (en) | Community query method and device based on community network and computer equipment | |
Komkhao et al. | Incremental collaborative filtering based on Mahalanobis distance and fuzzy membership for recommender systems | |
CN114943017B (en) | Cross-modal retrieval method based on similarity zero sample hash | |
Drakopoulos et al. | Self organizing maps for cultural content delivery | |
Zhou et al. | Unsupervised multiple network alignment with multinominal gan and variational inference | |
Wang et al. | Efficient multi-modal hypergraph learning for social image classification with complex label correlations | |
CN117349494A (en) | Graph classification method, system, medium and equipment for space graph convolution neural network | |
Berton et al. | Rgcli: Robust graph that considers labeled instances for semi-supervised learning | |
Wang et al. | Link prediction in heterogeneous collaboration networks | |
CN117194771B (en) | Dynamic knowledge graph service recommendation method for graph model characterization learning | |
CN113515519A (en) | Method, device and equipment for training graph structure estimation model and storage medium | |
CN117150041A (en) | Small sample knowledge graph completion method based on reinforcement learning | |
CN116861923A (en) | Multi-view unsupervised graph contrast learning model construction method, system, computer, storage medium and application | |
Yan et al. | Unsupervised deep clustering for fashion images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21950812; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21950812; Country of ref document: EP; Kind code of ref document: A1 |