WO2023000574A1 - Model training method, apparatus and device, and readable storage medium - Google Patents
Model training method, apparatus and device, and readable storage medium
- Publication number
- WO2023000574A1 (PCT application No. PCT/CN2021/134051)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- matrix
- graph
- loss value
- convolutional neural
- chebyshev
- Prior art date
Links
- 238000012549 training Methods 0.000 title claims abstract description 131
- 238000000034 method Methods 0.000 title claims abstract description 58
- 239000011159 matrix material Substances 0.000 claims abstract description 212
- 238000013527 convolutional neural network Methods 0.000 claims abstract description 147
- 230000009977 dual effect Effects 0.000 claims abstract description 26
- 238000013145 classification model Methods 0.000 claims abstract description 22
- 230000006870 function Effects 0.000 claims description 30
- 230000009466 transformation Effects 0.000 claims description 28
- 238000005295 random walk Methods 0.000 claims description 21
- 238000005070 sampling Methods 0.000 claims description 17
- 238000004364 calculation method Methods 0.000 claims description 14
- 238000004590 computer program Methods 0.000 claims description 12
- 230000004913 activation Effects 0.000 claims description 6
- 230000017105 transposition Effects 0.000 claims description 5
- 230000008569 process Effects 0.000 abstract description 12
- 238000013528 artificial neural network Methods 0.000 description 8
- 239000013598 vector Substances 0.000 description 6
- 238000010586 diagram Methods 0.000 description 5
- 230000003595 spectral effect Effects 0.000 description 5
- 238000013461 design Methods 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 102000004169 proteins and genes Human genes 0.000 description 2
- 108090000623 proteins and genes Proteins 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 208000035473 Communicable disease Diseases 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000001364 causal effect Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000008094 contradictory effect Effects 0.000 description 1
- 238000003066 decision tree Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000010295 mobile communication Methods 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Definitions
- the present application relates to the field of computer technology, in particular to a model training method, device, equipment and readable storage medium.
- a graph neural network, simply put, is a deep learning architecture for graph-structured data; it combines end-to-end learning with inductive reasoning and is expected to resolve a series of bottlenecks, such as causal reasoning and interpretability, that traditional deep learning architectures cannot handle.
- graph convolutional neural networks can be divided into two types: those based on spatial methods and those based on spectral methods.
- the former uses an explicit information propagation mechanism on the graph and lacks interpretability;
- the latter uses the Laplacian matrix of the graph as a tool, has a good theoretical basis, and is the mainstream direction of graph convolutional neural network research.
- the current graph convolutional neural networks based on spectral methods do not perform well when applied to graph vertex classification tasks; that is, the existing vertex classification models based on graph convolutional neural networks perform poorly.
- the purpose of the present application is to provide a model training method, apparatus, device and readable storage medium to improve the performance of the vertex classification model.
- the specific plan is as follows:
- the present application provides a model training method, including:
- the random walk and sampling are performed based on the adjacency matrix to obtain a positive point-wise mutual information matrix, including:
- a random walk of a preset length is performed on each vertex in the graph data set to obtain a context path of each vertex;
- the co-occurrence probability of each vertex and context and the corresponding marginal probability are calculated, and each element in the positive point-wise mutual information matrix is determined.
- the calculating the first loss value between the first training result and the label matrix includes:
- the degree of difference in probability distribution between the first training result and the label matrix is used as the first loss value.
- said calculating a second loss value between said second training result and said first training result includes:
- the determining the target loss value based on the first loss value and the second loss value includes:
- if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are iteratively trained until the target loss value meets the preset convergence condition;
- the updating of the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value includes:
- the new network parameters are shared with the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
- both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network include L graph convolutional layers, and the L graph convolutional layers are used to perform feature transformation and graph convolution operations on the input data;
- the feature transformation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: Q_l = H_l Θ_l^T;
- the graph convolution operation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: H_{l+1} = σ( Σ_{k=0}^{K} θ_k T_k(L̃) Q_l ), where L̃ = 2L/λ_max − I_n, L is the Laplacian matrix of the graph, λ_max is its largest eigenvalue, and I_n is an n*n-dimensional identity matrix;
- Q_l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation;
- H_l is the input data of the l-th graph convolutional layer, and H_{l+1} is the output data of the l-th graph convolutional layer;
- Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer;
- σ is a nonlinear activation function;
- K (K ≪ n) is the order of the polynomial, and n is the number of vertices in the graph data set;
- θ_k is the coefficient of the polynomial;
- T_k(x) = 2x T_{k-1}(x) − T_{k-2}(x), with T_0(x) = 1 and T_1(x) = x.
- model training device including:
- the obtaining module is used to obtain the vertex feature matrix, adjacency matrix and label matrix constructed based on the graph data set;
- a sampling module configured to perform random walk and sampling based on the adjacency matrix to obtain a positive point-by-point mutual information matrix
- the first training module is used to input the vertex feature matrix and the adjacency matrix into the first Chebyshev graph convolutional neural network to output the first training result;
- the second training module is used to input the vertex feature matrix and the positive point-wise mutual information matrix into the second Chebyshev graph convolutional neural network to output the second training result;
- a first calculation module configured to calculate a first loss value between the first training result and the label matrix
- a second calculation module configured to calculate a second loss value between the second training result and the first training result
- a determining module configured to determine a target loss value based on the first loss value and the second loss value
- a combination module configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets a preset convergence condition .
- the present application provides a model training device, including:
- a processor is configured to execute the computer program to implement the model training method disclosed above.
- the present application provides a readable storage medium for storing a computer program, wherein when the computer program is executed by a processor, the aforementioned disclosed model training method is implemented.
- the present application provides a model training method, including: obtaining a vertex feature matrix, an adjacency matrix and a label matrix constructed based on a graph data set; performing random walk and sampling based on the adjacency matrix to obtain a positive point-wise mutual information matrix; inputting the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result; inputting the vertex feature matrix and the positive point-wise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result; calculating a first loss value between the first training result and the label matrix; calculating a second loss value between the second training result and the first training result; determining a target loss value based on the first loss value and the second loss value; and, if the target loss value meets a preset convergence condition, combining the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.
- this application designs two Chebyshev graph convolutional neural networks: the first performs supervised training based on the vertex feature matrix, the adjacency matrix and the label matrix, while the second performs unsupervised training based on the vertex feature matrix, the positive point-wise mutual information matrix and the output of the first network during the training process; when the target loss value determined based on the loss values of the two networks meets the preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, so that a vertex classification model with better performance is obtained by training.
- This scheme can give full play to the respective advantages of supervised training and unsupervised training, and improve the performance of the vertex classification model.
- the model training apparatus, device and readable storage medium provided by the present application also have the above-mentioned technical effects.
- Fig. 1 is a schematic structural diagram of a graph convolutional neural network disclosed in the present application
- Fig. 2 is a flow chart of a model training method disclosed in the present application
- Fig. 3 is a schematic diagram of the data trend of a dual Chebyshev graph convolutional neural network disclosed in the present application;
- FIG. 4 is a schematic diagram of a dual Chebyshev graph convolutional neural network disclosed in the present application.
- FIG. 5 is a flow chart of a model construction and training method disclosed in the present application.
- FIG. 6 is a schematic diagram of a model training device disclosed in the present application.
- FIG. 7 is a schematic diagram of a model training device disclosed in the present application.
- V represents the set of vertices
- E represents the set of connecting edges
- V L is a subset of V
- the vertices in V L have assigned labels.
- the graph vertex classification problem is: how to infer the label of each vertex in the set V \ V_L of the remaining vertices.
- a graph neural network usually consists of an input layer, one or more graph convolutional layers, and an output layer.
- graph neural networks can be divided into graph convolutional neural networks, graph recurrent neural networks, graph autoencoders, graph generative networks, and spatiotemporal graph neural networks.
- the graph convolutional neural network has attracted the attention of many researchers due to the great success of the traditional convolutional neural network in the fields of image processing and natural language understanding.
- Figure 1 shows the structure of a typical graph convolutional neural network, which consists of an input layer (Input layer), two graph convolution layers (Gconv layer), and an output layer (Output layer).
- the input layer reads the n*d-dimensional vertex attribute matrix X;
- the graph convolution layer performs feature extraction on X and passes the result to the next graph convolution layer after a nonlinear activation function such as ReLU;
- the output layer is the task layer, which completes specific tasks such as vertex classification or clustering; the figure shows a vertex classification task layer, which outputs the category label Y of each vertex.
- the present application provides a model training solution that can combine supervised and unsupervised learning to effectively improve the accuracy of classification, effectively reduce the computational complexity of the network, and improve classification efficiency.
- model training method including:
- each vertex v of G has d features, and the features of all vertices constitute the n*d-dimensional vertex feature matrix X.
- the adjacency matrix of G is denoted as A, and the element A ij represents the weight of the connection edge between vertices i and j.
- an n*C-dimensional label matrix Y is constructed.
- n indicates the number of all vertices in the graph
- C indicates the number of label categories of all vertices
- for a labeled vertex, the element in the column corresponding to its label is set to 1, and the elements of the other columns in that row are set to 0.
- the Pubmed dataset contains 19,717 scientific publications in 3 categories with 44,338 citation links between publications. The publications and the links between them form a citation network, and each publication in the network is described by a term frequency-inverse document frequency (Term Frequency-Inverse Document Frequency, TF-IDF) feature vector drawn from a dictionary of 500 unique terms.
- the feature vectors of all documents form the feature matrix X.
- the goal is to classify each document: 20 instances of each category are randomly sampled as labeled data, 1000 instances are used as test data, and the rest are used as unlabeled data; a vertex label matrix Y is constructed accordingly, and the adjacency matrix A is constructed from the citation relationships between papers.
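- As a purely illustrative aside (not part of the patent text), the following Python/NumPy sketch shows one way the adjacency matrix A and the one-hot label matrix Y described above could be assembled from a citation list; the function name, argument names and the toy input are assumptions.

```python
import numpy as np

def build_graph_inputs(n, citations, labels, num_classes):
    """Build the adjacency matrix A and the one-hot label matrix Y.

    n           -- number of vertices (documents)
    citations   -- iterable of (i, j) index pairs, one per citation link
    labels      -- dict {vertex_index: class_index} for the labeled subset
    num_classes -- C, the number of label categories
    """
    A = np.zeros((n, n), dtype=np.float32)
    for i, j in citations:            # undirected citation edge with weight 1
        A[i, j] = 1.0
        A[j, i] = 1.0

    Y = np.zeros((n, num_classes), dtype=np.float32)
    for v, c in labels.items():       # rows of unlabeled vertices stay all-zero
        Y[v, c] = 1.0
    return A, Y

# toy usage: 5 vertices, 3 classes, 2 labeled vertices
A, Y = build_graph_inputs(5, [(0, 1), (1, 2), (3, 4)], {0: 2, 3: 0}, 3)
```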
- graph datasets can also be constructed based on proteins, graph images, etc. to classify proteins, graph images, etc.
- based on the adjacency matrix A and using random walk and random sampling techniques, a positive point-wise mutual information matrix encoding the globally consistent information of the graph can be constructed.
- the adjacency matrix serves two functions in the random walk process. First, it represents the topological structure of the graph: from it, one can know which vertices are connected and walk from a vertex to its adjacent vertices. Second, it is used to determine the random walk probability (see formula (1) for details): a vertex may have multiple neighbors, and in one random walk step the walker randomly picks one among all of its neighbors.
- random walk and sampling are performed based on the adjacency matrix to obtain a positive point-wise mutual information matrix, including: based on the adjacency matrix, a random walk of a preset length is performed on each vertex in the graph data set to obtain the context path of each vertex; all context paths are randomly sampled to determine the number of co-occurrences of any two vertices, and a vertex co-occurrence matrix is constructed; based on the vertex co-occurrence matrix, the co-occurrence probability of each vertex and context and the corresponding marginal probabilities are calculated, and each element in the positive point-wise mutual information matrix is determined.
- the "co-occurrence probability of a vertex and a context” refers to: the probability pr(v i , ct j ) of a certain vertex v i appearing in a certain context ct j .
- the probability pr(v i , ct j ) of vertex v i is included in ct j .
- the marginal probability of vertex v i is equal to the sum of elements in row i in this matrix divided by the sum of all elements in this matrix.
- the marginal probability of context ct j is equal to the sum of elements in column j divided by the sum of all elements in this matrix.
- the positive point-wise mutual information matrix can be represented by P, which can encode the global consistency information of the graph, and can be determined by referring to the following content:
- the row vector p_{i,:} is the embedded representation of the vertex v_i
- the column vector p_{:,j} is the embedded representation of the context ct_j
- p_{ij} represents the probability that the vertex v_i appears in the context ct_j
- the positive point-wise mutual information matrix P can be obtained by random walks on the graph data set. Specifically, consider the context ct_j of vertex v_j as a path γ_j of length u with v_j as its root node; then p_{ij} can be obtained by calculating the frequency with which vertex v_i appears on the path γ_j.
- a random walk of length u is performed from each vertex in the graph data set, and the path γ representing the context of that vertex is obtained.
- random sampling is performed on γ to count the number of co-occurrences of any two vertices, and the vertex-context co-occurrence count matrix O (i.e. the vertex co-occurrence matrix) is obtained.
- the element o_{ij} represents the number of times that vertex v_i appears in the context ct_j, that is, on the path γ_j with vertex v_j as its root node, and is used for the subsequent calculation of p_{ij}.
- the value of each element in the positive point-wise mutual information matrix P can be determined, thereby determining the positive point-wise mutual information matrix P.
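- As an illustration only, the following NumPy sketch shows one possible implementation of the construction described above: a random walk of length u from every vertex, co-occurrence counting, and conversion of the counts into a positive point-wise mutual information matrix using the standard PPMI formula max(log(pr(v_i, ct_j) / (pr(v_i) · pr(ct_j))), 0). It counts co-occurrences directly along one walk per vertex rather than through a separate sampling pass, and all names are assumptions.

```python
import numpy as np

def ppmi_from_walks(A, walk_len=10, seed=0):
    """Random walks -> vertex-context co-occurrence counts O -> PPMI matrix P."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    O = np.zeros((n, n))                        # o_ij: times v_i appears on the path rooted at v_j

    for root in range(n):                       # one context path per vertex
        v = root
        for _ in range(walk_len):
            neighbors = np.flatnonzero(A[v])
            if neighbors.size == 0:             # dead end: stop this walk
                break
            p = A[v, neighbors] / A[v, neighbors].sum()   # transition probability from edge weights
            v = rng.choice(neighbors, p=p)
            O[v, root] += 1

    total = O.sum()
    if total == 0:
        return np.zeros_like(O)
    p_joint = O / total                         # pr(v_i, ct_j)
    p_row = p_joint.sum(axis=1, keepdims=True)  # marginal probability of vertex v_i
    p_col = p_joint.sum(axis=0, keepdims=True)  # marginal probability of context ct_j
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint / (p_row * p_col))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)                 # keep only positive point-wise mutual information
```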
- the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are identical, and both include L graph convolutional layers, which are used to perform feature transformation and graph convolution operations on the input data;
- the feature transformation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: Q_l = H_l Θ_l^T;
- the graph convolution operation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: H_{l+1} = σ( Σ_{k=0}^{K} θ_k T_k(L̃) Q_l ), where L̃ = 2L/λ_max − I_n is the scaled Laplacian matrix of the graph;
- Q_l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation;
- H_l is the input data of the l-th graph convolutional layer, and H_{l+1} is the output data of the l-th graph convolutional layer;
- Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer;
- σ is the nonlinear activation function;
- K (K ≪ n) is the order of the polynomial, and n is the number of vertices in the graph data set;
- θ_k is the coefficient of the polynomial;
- T_k(x) = 2x T_{k-1}(x) − T_{k-2}(x), with T_0(x) = 1 and T_1(x) = x;
- λ_max is the largest eigenvalue of the graph Laplacian matrix L, and I_n is an n*n-dimensional identity matrix.
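- For illustration, a minimal PyTorch sketch of a single graph convolutional layer following the two-stage scheme above (feature transformation Q_l = H_l Θ_l^T, then Chebyshev-polynomial graph convolution) is given below. The formula it implements is reconstructed from the variable definitions in this document; the choice of ReLU as the activation σ and of Xavier initialization are assumptions of this sketch, and the scaled Laplacian L̃ is expected as an input.

```python
import torch
import torch.nn as nn

class ChebyGraphConv(nn.Module):
    """One layer: Q = H @ Theta^T, then H_next = relu(sum_k theta_k * T_k(L_tilde) @ Q)."""

    def __init__(self, in_dim, out_dim, K):
        super().__init__()
        self.K = K
        self.weight = nn.Parameter(torch.empty(out_dim, in_dim))   # feature transformation matrix Theta_l
        self.theta = nn.Parameter(torch.ones(K + 1))                # K+1 polynomial coefficients theta_k
        nn.init.xavier_uniform_(self.weight)

    def forward(self, H, L_tilde):
        Q = H @ self.weight.T                    # feature transformation stage
        T_prev, T_curr = Q, L_tilde @ Q          # T_0(L~) Q and T_1(L~) Q
        out = self.theta[0] * T_prev
        if self.K >= 1:
            out = out + self.theta[1] * T_curr
        for k in range(2, self.K + 1):           # Chebyshev recurrence T_k = 2 L~ T_{k-1} - T_{k-2}
            T_next = 2 * (L_tilde @ T_curr) - T_prev
            out = out + self.theta[k] * T_next
            T_prev, T_curr = T_curr, T_next
        return torch.relu(out)                   # sigma: nonlinear activation
```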
- calculating the first loss value between the first training result and the label matrix includes: based on the cross-entropy principle, using the degree of difference of the probability distribution between the first training result and the label matrix as the first loss value (i.e. the supervised loss).
- calculating the second loss value between the second training result and the first training result includes: calculating the difference between elements with the same coordinates in the second training result and the first training result, and The sum of squares of all differences is used as the second loss value (i.e. unsupervised loss).
- the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and iterative training is performed on the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network until the target loss value meets the preset convergence condition.
- updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value includes: after updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the second Chebyshev graph convolutional neural network; or updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value and sharing the updated network parameters with the first Chebyshev graph convolutional neural network; or, after calculating new network parameters according to the target loss value, sharing the new network parameters with both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
- the first Chebyshev graph convolutional neural network performs supervised training based on the vertex feature matrix, adjacency matrix, and label matrix
- the second Chebyshev graph convolutional neural network performs unsupervised training based on the vertex feature matrix, the positive point-wise mutual information matrix and the output of the first Chebyshev graph convolutional neural network during the training process; when the target loss value determined based on the loss values of the two networks meets the preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, and a vertex classification model with better performance is thereby trained.
- This scheme can give full play to the respective advantages of supervised training and unsupervised training, and improves the performance of the vertex classification model.
- the dual vertex classification model can also be called a dual Chebyshev graph convolutional neural network (DCGCN, Dual Chebyshev Graph Convolutional Neural Network).
- the dual Chebyshev graph convolutional neural network includes two identical Chebyshev graph convolutional neural networks ChebyNet with shared parameters, and each ChebyNet consists of an input layer, L graph convolutional layers and an output layer.
- ChebyNet_A takes the adjacency matrix A, which encodes the local consistency information of the graph, and the vertex feature matrix X as input data, and outputs the vertex category label prediction matrix Z_A;
- ChebyNet_P takes the positive point-wise mutual information matrix P, which encodes the global consistency information of the graph, and the vertex feature matrix X as input data, and outputs the vertex category label prediction matrix Z_P.
- ChebyNet_A performs supervised learning based on some labeled graph vertices, and its prediction accuracy is high; under the guidance of the former (using its prediction result Z_A), ChebyNet_P uses the unlabeled graph vertices for unsupervised learning to improve the prediction accuracy and obtain a better vertex classification model.
- Z_A and Z_P are consistent or their difference is negligible, so either Z_A or Z_P can be used as the output of the dual Chebyshev graph convolutional neural network.
- Figure 4 illustrates the structure of a dual Chebyshev graph convolutional neural network.
- the convolutional layer in Figure 4 is the graph convolutional layer described below.
- the input layer is mainly responsible for reading the graph data to be classified, including the vertex feature matrix X, the adjacency matrix A representing the topology of the graph, and the positive point-by-point mutual information matrix P that encodes the global consistency information of the graph.
- the graph convolution operation formula is: H_{l+1} = σ( Σ_{k=0}^{K} θ_k T_k(L̃) Q_l ), where L̃ = 2L/λ_max − I_n is the scaled Laplacian matrix of the graph;
- Q_l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation;
- H_l is the input data of the l-th graph convolutional layer, H_{l+1} is the output data of the l-th graph convolutional layer, and Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer;
- σ is the nonlinear activation function;
- K (K ≪ n) is the order of the polynomial, and n is the number of vertices in the graph data set;
- θ_k is the coefficient of the polynomial;
- T_k(x) = 2x T_{k-1}(x) − T_{k-2}(x), with T_0(x) = 1 and T_1(x) = x;
- H_1 is the vertex feature matrix X;
- λ_max is the largest eigenvalue of the graph Laplacian matrix L (i.e. the largest element of Λ), and I_n is an n*n-dimensional identity matrix.
- U is the matrix composed of the eigenvectors obtained by eigendecomposition of the Laplacian matrix L of the graph G, U^{-1} is the inverse matrix of U, and Λ is the diagonal matrix of eigenvalues whose diagonal elements are λ_1, λ_2, ..., λ_n.
- K limits the information to propagate at most K steps from each vertex, so only K+1 parameters are required, which greatly reduces the complexity of the model training process. Because the exact calculation of the convolution kernel matrix involves the eigendecomposition of the graph Laplacian matrix, which is computationally expensive, this embodiment uses Chebyshev polynomials to design an approximate calculation scheme, approximating the convolution kernel by Σ_{k=0}^{K} θ_k T_k(L̃).
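- To make the approximation concrete, the following NumPy sketch builds the scaled Laplacian L̃ = 2L/λ_max − I_n used by the Chebyshev expansion without a full eigendecomposition; the use of the symmetric normalized Laplacian and of power iteration to estimate λ_max are assumptions of this sketch, not requirements stated in the patent.

```python
import numpy as np

def scaled_laplacian(A, power_iters=50, seed=0):
    """Return L_tilde = 2 L / lambda_max - I_n for the symmetric normalized Laplacian L."""
    n = A.shape[0]
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros(n)
    nz = d > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    L = np.eye(n) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]

    # estimate lambda_max by power iteration (L is symmetric positive semi-definite)
    x = np.random.default_rng(seed).standard_normal(n)
    for _ in range(power_iters):
        x = L @ x
        x = x / (np.linalg.norm(x) + 1e-12)
    lam_max = float(x @ (L @ x))

    return 2.0 * L / lam_max - np.eye(n)
```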
- the loss function of the dual Chebyshev graph convolutional neural network consists of two parts: the supervised learning loss ls_S for labeled vertices and the unsupervised learning loss ls_U for unlabeled vertices.
- ChebyNet_A takes the adjacency matrix A and the vertex feature matrix X as input for supervised learning, and compares the vertex label prediction result Z_A with the known vertex label matrix Y to calculate the supervised learning loss.
- ChebyNet_P takes the positive point-wise mutual information matrix and the vertex feature matrix X as input for unsupervised learning, and compares its prediction result Z_P with ChebyNet_A's prediction result Z_A to calculate the unsupervised learning loss.
- the loss function of the dual Chebyshev graph convolutional neural network can be expressed as ls = ls_S + λ · ls_U, where λ is a constant used to adjust the proportion of the unsupervised learning loss in the entire loss function.
- the supervised learning loss function calculates the degree of difference between the actual label probability distribution and the predicted label probability distribution of the vertex based on the principle of cross entropy; the unsupervised learning loss function calculates the sum of squares of the difference between the same coordinate elements of Z P and Z A.
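- The loss just described can be written down directly; the PyTorch sketch below is illustrative only, treats Z_A and Z_P as the raw outputs of the two networks, and assumes a boolean mask marking the labeled vertices.

```python
import torch
import torch.nn.functional as F

def dual_loss(Z_A, Z_P, Y, labeled_mask, lam=1.0):
    """ls = ls_S + lam * ls_U for the dual Chebyshev graph convolutional neural network.

    Z_A, Z_P     -- n*C prediction matrices of ChebyNet_A and ChebyNet_P
    Y            -- n*C one-hot label matrix
    labeled_mask -- boolean tensor marking the vertices in V_L
    lam          -- constant weighting the unsupervised loss
    """
    targets = Y[labeled_mask].argmax(dim=1)              # class index of each labeled vertex
    ls_S = F.cross_entropy(Z_A[labeled_mask], targets)   # supervised cross-entropy loss
    ls_U = ((Z_P - Z_A) ** 2).sum()                      # sum of squared differences between Z_P and Z_A
    return ls_S + lam * ls_U
```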
- the initialization strategy of network parameters can choose normal distribution random initialization, Xavier initialization or He Initialization initialization, etc.
- Network parameters include feature transformation matrix ⁇ l and convolution kernel F l .
- the network parameters can be corrected and updated according to stochastic gradient descent (Stochastic Gradient Descent, SGD), momentum gradient descent (Momentum Gradient Descent, MGD), Nesterov Momentum, AdaGrad, RMSprop, Adam (Adaptive Moment Estimation) or batch gradient descent (Batch Gradient Descent, BGD), etc., to optimize the loss function value.
- the training process of the dual Chebyshev graph convolutional neural network can be carried out with reference to Figure 5, and specifically includes: for the graph data set G, construct the vertex feature matrix X, the positive point-wise mutual information matrix P encoding the global consistency information of the graph, the adjacency matrix A encoding the local consistency information of the graph, and the vertex label matrix Y; input the vertex feature matrix X and the adjacency matrix A into ChebyNet_A, input the positive point-wise mutual information matrix P and the vertex feature matrix X into ChebyNet_P, and update the network parameters according to the above loss function to train ChebyNet_A and ChebyNet_P.
- the training ends and a dual Chebyshev graph convolutional neural network is obtained.
- for a labeled vertex v_i ∈ V_L, the class j it should belong to can be obtained according to the vertex label matrix Y.
- the output feature matrix of each layer is calculated; according to the definition of the output layer, the probability Z_j (1 ≤ j ≤ C) that a vertex belongs to each class is obtained, and the loss function value is calculated according to the loss function defined above; for an unlabeled vertex v_i ∈ V_U, the category with the highest probability is taken as the latest category of the vertex to update the vertex label matrix Y.
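- Putting the pieces together, the sketch below shows one way the training loop described above could look in PyTorch. Parameter sharing between ChebyNet_A and ChebyNet_P is realized by applying a single network to two different propagation matrices; the layer sizes, learning rate, λ value, fixed epoch count and the reuse of `ChebyGraphConv`, `dual_loss` and the prepared tensors X, Y, L_A, L_P and labeled_mask are all assumptions of this sketch.

```python
import torch

class ChebyNet(torch.nn.Module):
    """Two Chebyshev graph convolution layers followed by argmax classification."""

    def __init__(self, d, hidden, C, K):
        super().__init__()
        self.conv1 = ChebyGraphConv(d, hidden, K)
        self.conv2 = ChebyGraphConv(hidden, C, K)

    def forward(self, X, L_tilde):
        return self.conv2(self.conv1(X, L_tilde), L_tilde)

# X: n*d feature tensor, Y: n*C one-hot labels, labeled_mask: boolean tensor,
# L_A / L_P: scaled Laplacians built from the adjacency matrix A and the PPMI matrix P.
net = ChebyNet(d=X.shape[1], hidden=16, C=Y.shape[1], K=3)   # one parameter set shared by both views
opt = torch.optim.Adam(net.parameters(), lr=0.01)

for epoch in range(200):                  # iterate until the loss meets the convergence condition
    opt.zero_grad()
    Z_A = net(X, L_A)                     # supervised branch: local consistency (adjacency)
    Z_P = net(X, L_P)                     # unsupervised branch: global consistency (PPMI)
    loss = dual_loss(Z_A, Z_P, Y, labeled_mask, lam=0.1)
    loss.backward()
    opt.step()

with torch.no_grad():                     # take the highest-probability class for each vertex
    predicted_labels = net(X, L_A).argmax(dim=1)
```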
- the dual Chebyshev graph convolutional neural network is composed of two Chebyshev graph convolutional neural networks with the same structure and shared parameters.
- the two perform supervised learning and unsupervised learning respectively, which can improve the network's convergence rate and prediction accuracy at the same time; the graph convolution layer is defined based on the graph Fourier transform, and the graph convolution operation is divided into two stages, feature transformation and graph convolution, which can reduce the number of network parameters;
- the graph convolution kernel is defined as a polynomial convolution kernel, which ensures the locality of the graph convolution calculation; in order to reduce the computational complexity, the Chebyshev polynomial is used to approximate the graph convolution.
- this embodiment provides a training method for a dual Chebyshev graph convolutional neural network, which can solve the problem of vertex classification.
- graph modeling is performed on the collected data set to obtain its adjacency matrix and vertex feature matrix; based on the adjacency matrix, a random walk of a specific length is carried out on the graph for each vertex, and the resulting walk sequences are sampled to obtain a positive point-wise mutual information matrix, which represents the context information of the vertices; the convolution operation is defined according to spectral graph theory, the graph convolution layer for feature extraction and the output layer for the vertex classification task are constructed, and the Chebyshev graph convolutional neural network is built and trained; at the end of training, classification predictions for the unlabeled vertices in the graph are available.
- due to the design strategy of the dual graph convolutional neural network, this method can learn more graph topology information, including the local consistency and global consistency of each vertex, and this characteristic information greatly improves the learning ability of the model; at the same time, using the graph topology and the attribute features of vertices and combining supervised and unsupervised learning effectively improves the classification accuracy; approximating the graph convolution calculation with Chebyshev polynomials avoids the expensive matrix eigendecomposition operation, which effectively reduces the computational complexity of the network and improves its classification efficiency.
- a model training device provided in the embodiment of the present application is introduced below, and a model training device described below and a model training method described above may refer to each other.
- model training device including:
- the obtaining module 601 is used to obtain the vertex feature matrix, adjacency matrix and label matrix constructed based on the graph data set;
- the sampling module 602 is used to perform random walk and sampling based on the adjacency matrix to obtain a positive point-by-point mutual information matrix;
- the first training module 603 is used to input the vertex feature matrix and the adjacency matrix into the first Chebyshev graph convolutional neural network to output the first training result;
- the second training module 604 is used to input the vertex feature matrix and the positive point-by-point mutual information matrix into the second Chebyshev graph convolutional neural network to output the second training result;
- the first calculation module 605 is used to calculate the first loss value between the first training result and the label matrix
- a second calculation module 606, configured to calculate a second loss value between the second training result and the first training result
- a determining module 607 configured to determine a target loss value based on the first loss value and the second loss value
- the combination module 608 is configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets the preset convergence condition.
- sampling module is specifically used for:
- a random walk of preset length is performed on each vertex in the graph dataset to obtain the context path of each vertex;
- the vertex and context co-occurrence probability and the corresponding edge probability are calculated, and each element in the positive point-wise mutual information matrix is determined.
- the first calculation module is specifically used for:
- the degree of difference in probability distribution between the first training result and the label matrix is used as the first loss value.
- the second calculation module is specifically used for:
- the determination module is specifically used for:
- the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and iterative training is performed on the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network until the target loss value meets the preset convergence condition;
- the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, including:
- the updated network parameters are shared to the second Chebyshev graph convolutional neural network;
- the updated network parameters are shared to the first Chebyshev graph convolutional neural network;
- the new network parameters are shared to the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
- both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network include L graph convolutional layers, and the L graph convolutional layers are used to perform feature transformation and graph convolution operations on the input data;
- the feature transformation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: Q_l = H_l Θ_l^T;
- the graph convolution operation formula of the l-th (1 ≤ l ≤ L) graph convolution layer is: H_{l+1} = σ( Σ_{k=0}^{K} θ_k T_k(L̃) Q_l ), where L̃ = 2L/λ_max − I_n, L is the Laplacian matrix of the graph, λ_max is its largest eigenvalue, and I_n is an n*n-dimensional identity matrix;
- Q_l is the vertex feature matrix of the l-th graph convolutional layer after feature transformation;
- H_l is the input data of the l-th graph convolutional layer, and H_{l+1} is the output data of the l-th graph convolutional layer;
- Θ_l^T is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer;
- σ is the nonlinear activation function;
- K (K ≪ n) is the order of the polynomial, and n is the number of vertices in the graph data set;
- θ_k is the coefficient of the polynomial;
- T_k(x) = 2x T_{k-1}(x) − T_{k-2}(x), with T_0(x) = 1 and T_1(x) = x.
- this embodiment provides a model training device, which can give full play to the respective advantages of supervised training and unsupervised training, and improve the performance of the vertex classification model.
- a model training device provided in the embodiment of the present application is introduced below; the model training device described below and the model training method and apparatus described above may refer to each other.
- model training device including:
- the memory 701 is used to store a computer program;
- the processor 702 is configured to execute the computer program, so as to implement the method disclosed in any of the foregoing embodiments.
- a readable storage medium provided by an embodiment of the present application is introduced below.
- the readable storage medium described below and the model training method, device, and equipment described above may refer to each other.
- a readable storage medium is used to store a computer program, wherein the computer program implements the model training method disclosed in the foregoing embodiments when executed by a processor.
- the specific steps of the method reference may be made to the corresponding content disclosed in the foregoing embodiments, and details are not repeated here.
- the readable storage medium may specifically be a random access memory (RAM), a read-only memory (ROM), an electrically programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a register, a hard disk, a removable disk, a CD-ROM, or any other known form of readable storage medium.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a model training method, apparatus and device, and a readable storage medium. By means of the method, two Chebyshev graph convolutional neural networks are designed, one performing supervised training on the basis of a vertex feature matrix, an adjacency matrix and a label matrix, and the other performing unsupervised training on the basis of the vertex feature matrix, a positive point-wise mutual information matrix and an output of the former network during the training process; and when a target loss value determined on the basis of the loss values of the two Chebyshev graph convolutional neural networks meets a preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model so as to obtain, by training, a vertex classification model with better performance. By means of the method, the respective advantages of supervised training and unsupervised training can be exploited, thereby improving the performance of a vertex classification model.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110825194.9A CN113705772A (zh) | 2021-07-21 | 2021-07-21 | 一种模型训练方法、装置、设备及可读存储介质 |
CN202110825194.9 | 2021-07-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023000574A1 true WO2023000574A1 (fr) | 2023-01-26 |
Family
ID=78650163
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/134051 WO2023000574A1 (fr) | 2021-07-21 | 2021-11-29 | Procédé, appareil et dispositif d'entrainement de modèle, et support de stockage lisible |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113705772A (fr) |
WO (1) | WO2023000574A1 (fr) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112364372A (zh) * | 2020-10-27 | 2021-02-12 | 重庆大学 | 一种有监督矩阵补全的隐私保护方法 |
CN116109195A (zh) * | 2023-02-23 | 2023-05-12 | 深圳市迪博企业风险管理技术有限公司 | 一种基于图卷积神经网络的绩效评估方法及系统 |
CN116129206A (zh) * | 2023-04-14 | 2023-05-16 | 吉林大学 | 图像解耦表征学习的处理方法、装置及电子设备 |
CN116405100A (zh) * | 2023-05-29 | 2023-07-07 | 武汉能钠智能装备技术股份有限公司 | 一种基于先验知识的失真信号还原方法 |
CN117351239A (zh) * | 2023-10-11 | 2024-01-05 | 兰州交通大学 | 一种图卷积自编码器支持下的多尺度道路网相似性计算方法 |
CN117391150A (zh) * | 2023-12-07 | 2024-01-12 | 之江实验室 | 一种基于分层池化图哈希的图数据检索模型训练方法 |
CN117540828A (zh) * | 2024-01-10 | 2024-02-09 | 中国电子科技集团公司第十五研究所 | 作训科目推荐模型训练方法、装置、电子设备和存储介质 |
CN117909903A (zh) * | 2024-01-26 | 2024-04-19 | 深圳硅山技术有限公司 | 电动助力转向系统的诊断方法、装置、设备及存储介质 |
CN117971356A (zh) * | 2024-03-29 | 2024-05-03 | 苏州元脑智能科技有限公司 | 基于半监督学习的异构加速方法、装置、设备及存储介质 |
CN118035811A (zh) * | 2024-04-18 | 2024-05-14 | 中科南京信息高铁研究院 | 基于图卷积神经网络的用电设备状态感知方法、控制服务器及介质 |
CN118391723A (zh) * | 2024-07-01 | 2024-07-26 | 青岛能源设计研究院有限公司 | 一种智能空气源热泵供热系统 |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705772A (zh) * | 2021-07-21 | 2021-11-26 | 浪潮(北京)电子信息产业有限公司 | 一种模型训练方法、装置、设备及可读存储介质 |
CN114360007B (zh) * | 2021-12-22 | 2023-02-07 | 浙江大华技术股份有限公司 | 人脸识别模型训练、人脸识别方法、装置、设备及介质 |
CN114528994B (zh) * | 2022-03-17 | 2024-10-18 | 腾讯科技(深圳)有限公司 | 一种识别模型的确定方法和相关装置 |
CN114707641B (zh) * | 2022-03-23 | 2024-11-08 | 平安科技(深圳)有限公司 | 双视角图神经网络模型的训练方法、装置、设备及介质 |
CN114490950B (zh) * | 2022-04-07 | 2022-07-12 | 联通(广东)产业互联网有限公司 | 编码器模型的训练方法及存储介质、相似度预测方法及系统 |
CN114943324B (zh) * | 2022-05-26 | 2023-10-13 | 中国科学院深圳先进技术研究院 | 神经网络训练方法、人体运动识别方法及设备、存储介质 |
CN115858725B (zh) * | 2022-11-22 | 2023-07-04 | 广西壮族自治区通信产业服务有限公司技术服务分公司 | 一种基于无监督式图神经网络的文本噪声筛选方法及系统 |
CN116071635A (zh) * | 2023-03-06 | 2023-05-05 | 之江实验室 | 基于结构性知识传播的图像识别方法与装置 |
CN116089652B (zh) * | 2023-04-07 | 2023-07-18 | 中国科学院自动化研究所 | 视觉检索模型的无监督训练方法、装置和电子设备 |
CN116402554B (zh) * | 2023-06-07 | 2023-08-11 | 江西时刻互动科技股份有限公司 | 一种广告点击率预测方法、系统、计算机及可读存储介质 |
CN116431816B (zh) * | 2023-06-13 | 2023-09-19 | 浪潮电子信息产业股份有限公司 | 一种文献分类方法、装置、设备和计算机可读存储介质 |
CN118552136B (zh) * | 2024-07-26 | 2024-10-25 | 浪潮智慧供应链科技(山东)有限公司 | 基于大数据的供应链智能库存管理系统及方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200285944A1 (en) * | 2019-03-08 | 2020-09-10 | Adobe Inc. | Graph convolutional networks with motif-based attention |
CN112464057A (zh) * | 2020-11-18 | 2021-03-09 | 苏州浪潮智能科技有限公司 | 一种网络数据分类方法、装置、设备及可读存储介质 |
CN112925909A (zh) * | 2021-02-24 | 2021-06-08 | 中国科学院地理科学与资源研究所 | 一种考虑局部不变性约束的图卷积文献分类方法及系统 |
CN113705772A (zh) * | 2021-07-21 | 2021-11-26 | 浪潮(北京)电子信息产业有限公司 | 一种模型训练方法、装置、设备及可读存储介质 |
-
2021
- 2021-07-21 CN CN202110825194.9A patent/CN113705772A/zh active Pending
- 2021-11-29 WO PCT/CN2021/134051 patent/WO2023000574A1/fr active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200285944A1 (en) * | 2019-03-08 | 2020-09-10 | Adobe Inc. | Graph convolutional networks with motif-based attention |
CN112464057A (zh) * | 2020-11-18 | 2021-03-09 | 苏州浪潮智能科技有限公司 | 一种网络数据分类方法、装置、设备及可读存储介质 |
CN112925909A (zh) * | 2021-02-24 | 2021-06-08 | 中国科学院地理科学与资源研究所 | 一种考虑局部不变性约束的图卷积文献分类方法及系统 |
CN113705772A (zh) * | 2021-07-21 | 2021-11-26 | 浪潮(北京)电子信息产业有限公司 | 一种模型训练方法、装置、设备及可读存储介质 |
Non-Patent Citations (1)
Title |
---|
ZHUANG CHENYI ZHUANGCHENYI@GMAIL.COM; MA QIANG QIANG@I.KYOTO-U.AC.JP: "Dual Graph Convolutional Networks for Graph-Based Semi-Supervised Classification", THE WEB CONFERENCE 2018, INTERNATIONAL WORLD WIDE WEB CONFERENCES STEERING COMMITTEE, REPUBLIC AND CANTON OF GENEVASWITZERLAND, 23 April 2018 (2018-04-23) - 27 April 2018 (2018-04-27), Republic and Canton of GenevaSwitzerland , pages 499 - 508, XP058652837, ISBN: 978-1-4503-5640-4, DOI: 10.1145/3178876.3186116 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112364372A (zh) * | 2020-10-27 | 2021-02-12 | 重庆大学 | 一种有监督矩阵补全的隐私保护方法 |
CN116109195A (zh) * | 2023-02-23 | 2023-05-12 | 深圳市迪博企业风险管理技术有限公司 | 一种基于图卷积神经网络的绩效评估方法及系统 |
CN116129206A (zh) * | 2023-04-14 | 2023-05-16 | 吉林大学 | 图像解耦表征学习的处理方法、装置及电子设备 |
CN116405100A (zh) * | 2023-05-29 | 2023-07-07 | 武汉能钠智能装备技术股份有限公司 | 一种基于先验知识的失真信号还原方法 |
CN116405100B (zh) * | 2023-05-29 | 2023-08-22 | 武汉能钠智能装备技术股份有限公司 | 一种基于先验知识的失真信号还原方法 |
CN117351239A (zh) * | 2023-10-11 | 2024-01-05 | 兰州交通大学 | 一种图卷积自编码器支持下的多尺度道路网相似性计算方法 |
CN117391150A (zh) * | 2023-12-07 | 2024-01-12 | 之江实验室 | 一种基于分层池化图哈希的图数据检索模型训练方法 |
CN117391150B (zh) * | 2023-12-07 | 2024-03-12 | 之江实验室 | 一种基于分层池化图哈希的图数据检索模型训练方法 |
CN117540828A (zh) * | 2024-01-10 | 2024-02-09 | 中国电子科技集团公司第十五研究所 | 作训科目推荐模型训练方法、装置、电子设备和存储介质 |
CN117540828B (zh) * | 2024-01-10 | 2024-06-04 | 中国电子科技集团公司第十五研究所 | 作训科目推荐模型训练方法、装置、电子设备和存储介质 |
CN117909903A (zh) * | 2024-01-26 | 2024-04-19 | 深圳硅山技术有限公司 | 电动助力转向系统的诊断方法、装置、设备及存储介质 |
CN117971356A (zh) * | 2024-03-29 | 2024-05-03 | 苏州元脑智能科技有限公司 | 基于半监督学习的异构加速方法、装置、设备及存储介质 |
CN118035811A (zh) * | 2024-04-18 | 2024-05-14 | 中科南京信息高铁研究院 | 基于图卷积神经网络的用电设备状态感知方法、控制服务器及介质 |
CN118391723A (zh) * | 2024-07-01 | 2024-07-26 | 青岛能源设计研究院有限公司 | 一种智能空气源热泵供热系统 |
Also Published As
Publication number | Publication date |
---|---|
CN113705772A (zh) | 2021-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023000574A1 (fr) | Procédé, appareil et dispositif d'entrainement de modèle, et support de stockage lisible | |
CN114048331A (zh) | 一种基于改进型kgat模型的知识图谱推荐方法及系统 | |
Bhagat et al. | Node classification in social networks | |
CN110347932B (zh) | 一种基于深度学习的跨网络用户对齐方法 | |
Li et al. | Restricted Boltzmann machine-based approaches for link prediction in dynamic networks | |
CN110674323B (zh) | 基于虚拟标签回归的无监督跨模态哈希检索方法及系统 | |
WO2022252458A1 (fr) | Procédé et appareil de formation de modèle de classification, dispositif et support | |
Li et al. | Image sentiment prediction based on textual descriptions with adjective noun pairs | |
CN109753589A (zh) | 一种基于图卷积网络的图可视化方法 | |
Ma et al. | Joint multi-label learning and feature extraction for temporal link prediction | |
CN112925857A (zh) | 基于谓语类型预测关联的数字信息驱动的系统和方法 | |
CN112131261B (zh) | 基于社区网络的社区查询方法、装置和计算机设备 | |
Komkhao et al. | Incremental collaborative filtering based on Mahalanobis distance and fuzzy membership for recommender systems | |
CN114943017B (zh) | 一种基于相似性零样本哈希的跨模态检索方法 | |
Drakopoulos et al. | Self organizing maps for cultural content delivery | |
Zhou et al. | Unsupervised multiple network alignment with multinominal gan and variational inference | |
Wang et al. | Efficient multi-modal hypergraph learning for social image classification with complex label correlations | |
CN117349494A (zh) | 空间图卷积神经网络的图分类方法、系统、介质及设备 | |
Berton et al. | Rgcli: Robust graph that considers labeled instances for semi-supervised learning | |
Wang et al. | Link prediction in heterogeneous collaboration networks | |
CN117194771B (zh) | 一种图模型表征学习的动态知识图谱服务推荐方法 | |
CN113515519A (zh) | 图结构估计模型的训练方法、装置、设备及存储介质 | |
CN117150041A (zh) | 一种基于强化学习的小样本知识图谱补全方法 | |
CN116861923A (zh) | 多视图无监督图对比学习模型构建方法、系统、计算机、存储介质及应用 | |
Yan et al. | Unsupervised deep clustering for fashion images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21950812 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21950812 Country of ref document: EP Kind code of ref document: A1 |