To overcome the limitations of prior clustering methods, researchers have built end-to-end architectures that cluster and coarsen graphs in a learnable manner, i.e., trainable clustering.
Graph Multiset Transformer (GMT) [
142] uses a multi-head self-attention mechanism to cluster nodes into different sets according to the downstream task, and the graph-level representation is then derived from these sets.
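This mechanism can be sketched with a single attention head: a small set of learnable seed queries attends over the node embeddings, so each query softly gathers a task-dependent set of nodes. The snippet below is a minimal sketch under that reading; the single head, the seed-query matrix, and the mean readout are illustrative assumptions rather than GMT's exact multi-head design.
\begin{verbatim}
import numpy as np

def attention_pool(H, Q, Wk, Wv):
    """Cluster n nodes into k sets via attention.

    H:  (n, f) node embeddings; Q: (k, d) learnable seed
    queries, one per set; Wk, Wv: (f, d) key/value projections.
    Returns (k, d) set embeddings.
    """
    K, V = H @ Wk, H @ Wv                    # keys/values from nodes
    scores = Q @ K.T / np.sqrt(Q.shape[1])   # (k, n) attention logits
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)  # row-wise softmax over nodes
    return attn @ V                          # each set = weighted node mix

rng = np.random.default_rng(0)
n, f, d, k = 6, 8, 8, 2                      # toy sizes (assumptions)
H = rng.normal(size=(n, f))
sets = attention_pool(H, rng.normal(size=(k, d)),
                      rng.normal(size=(f, d)), rng.normal(size=(f, d)))
graph_repr = sets.mean(axis=0)               # graph-level readout
\end{verbatim}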
MinCutPool [143] assigns each node to a cluster via an MLP, which is optimized with two objectives: first, the clusters should be of similar size; second, the cluster embeddings should be separable. Finally, the graph-level representation is obtained by pooling the substructure-level embeddings.
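These two objectives can be written as differentiable regularizers on the assignment matrix; the following is a minimal sketch in that spirit, with a balance term penalizing uneven cluster sizes and an orthogonality-style term encouraging separable clusters. MinCutPool's actual losses (a relaxed normalized-cut term plus an orthogonality term) differ in detail, and the MLP shapes here are assumptions.
\begin{verbatim}
import numpy as np

def softmax_rows(X):
    E = np.exp(X - X.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def mlp_assign(H, W1, b1, W2, b2):
    """Soft cluster assignment S (n, k) from node embeddings H (n, f)."""
    return softmax_rows(np.maximum(H @ W1 + b1, 0) @ W2 + b2)

def balance_loss(S):
    """Penalize uneven cluster sizes: variance of soft cluster sizes."""
    return np.var(S.sum(axis=0))

def separation_loss(S):
    """Push S^T S toward a scaled identity so clusters barely overlap."""
    k = S.shape[1]
    StS = S.T @ S
    StS = StS / np.linalg.norm(StS)
    return np.linalg.norm(StS - np.eye(k) / np.sqrt(k))

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 8))
S = mlp_assign(H, rng.normal(size=(8, 16)), np.zeros(16),
               rng.normal(size=(16, 3)), np.zeros(3))
loss = balance_loss(S) + separation_loss(S)  # added to the task loss
\end{verbatim}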
The trainable clustering methods above follow a hard-assignment scheme. In contrast, DiffPool [144] assigns each node to multiple clusters through a trainable soft assignment matrix
\(\mathbf {S}^{(k)} \in \mathbb {R}^{n^{(k)} \times n^{(k+1)}}\), where
\(n^{(k)}\) is the number of nodes in the input graph at the
\(k\)th layer, and
\(n^{(k+1)}\) is the number of clusters in the input graph, i.e., the number of nodes in the coarsened graph. To be specific, at the
\(k\)th layer, each row of
\(\mathbf {S}^{(k)}\) corresponds to a node in the input graph, and each column of
\(\mathbf {S}^{(k)}\) corresponds to a new node in the coarsened graph (i.e., a cluster in the input graph). The assignment matrix
\(\mathbf {S}^{(k)}\) is trained by a graph convolutional layer, which is defined as
\[
\mathbf {S}^{(k)} = \text{softmax}\left( \mathbf {A}^{(k)} \mathbf {H}^{(k)} \mathbf {W}^{(k)} \right),
\]
where
\(\mathbf {A}^{(k)} \in \mathbb {R}^{n^{(k)} \times n^{(k)}}\) and
\(\mathbf {H}^{(k)} \in \mathbb {R}^{n^{(k)} \times f}\) are the adjacency matrix and the node embedding matrix of the input graph at the
\(k\)th layer, respectively.
\(\mathbf {W}^{(k)} \in \mathbb {R}^{f \times n^{(k+1)}}\) is the trainable weight matrix, and
the \(\text{softmax}\left(\cdot \right)\) function is applied row-wise.
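To make one coarsening step concrete, the following minimal NumPy sketch computes \(\mathbf {S}^{(k)}\) exactly as in the equation above. The embedding GCN producing \(\mathbf {Z}\), its ReLU activation, and the toy inputs are illustrative assumptions; the coarsening rules \(\mathbf {S}^{\top } \mathbf {Z}\) and \(\mathbf {S}^{\top } \mathbf {A} \mathbf {S}\) follow the original DiffPool formulation.
\begin{verbatim}
import numpy as np

def softmax_rows(X):
    E = np.exp(X - X.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def diffpool_step(A, H, W_assign, W_embed):
    """One DiffPool coarsening step.

    A: (n, n) adjacency; H: (n, f) node embeddings;
    W_assign: (f, m), m = number of clusters n^{(k+1)};
    W_embed:  (f, f), weights of an embedding GCN (assumption).
    """
    S = softmax_rows(A @ H @ W_assign)  # assignment, as in the equation
    Z = np.maximum(A @ H @ W_embed, 0)  # embedding GCN with ReLU
    H_next = S.T @ Z                    # (m, f) coarsened embeddings
    A_next = S.T @ A @ S                # (m, m) coarsened adjacency
    return A_next, H_next, S

rng = np.random.default_rng(0)
n, f, m = 5, 4, 2                       # toy sizes (assumptions)
A = (rng.random((n, n)) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T          # symmetric, no self-loops
H = rng.normal(size=(n, f))
A2, H2, S = diffpool_step(A, H, rng.normal(size=(f, m)),
                          rng.normal(size=(f, f)))
assert np.allclose(S.sum(axis=1), 1.0)  # each row is a distribution
\end{verbatim}
Stacking such steps produces a hierarchy of progressively smaller graphs, and the embeddings of the final coarsened graph serve as the graph-level representation.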
In addition, StructPool [145] supports both hard and soft assignment clustering, deriving the assignment matrix via the argmax and softmax functions, respectively.
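The difference between the two regimes is easy to see on a toy logit matrix; this is only an illustration of argmax versus softmax, not StructPool's actual training procedure.
\begin{verbatim}
import numpy as np

logits = np.array([[2.0, 0.1, -1.0],
                   [0.3, 0.2,  0.1]])    # (n=2 nodes, k=3 clusters)

# soft assignment: each row is a probability distribution over clusters
E = np.exp(logits - logits.max(axis=1, keepdims=True))
S_soft = E / E.sum(axis=1, keepdims=True)

# hard assignment: one-hot rows selected by argmax
S_hard = np.eye(logits.shape[1])[logits.argmax(axis=1)]
\end{verbatim}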