To allocate the edges on the graph \(\mathcal {G}_{u-i}\) to different interests under each behavior, we apply disentangled representation learning [7, 24, 41] for behavior allocation. We first partition the provided multi-behavior user-item graph \(\mathcal {G}_{u-i}\) into behavior-specific sub-graphs \(\mathcal {G}_{u-i}^{1}, \mathcal {G}_{u-i}^{2}, \ldots ,\mathcal {G}_{u-i}^{K}\), whose corresponding adjacency matrices are \(\mathbf {A}_{u-i}^{1},\mathbf {A}_{u-i}^{2}, \ldots ,\mathbf {A}_{u-i}^{K}\), formulated as
\[
\mathbf {A}_{u-i}^{k}=\begin{bmatrix} \mathbf {0} & \mathbf {Y}_{u-i}^{k} \\ \big (\mathbf {Y}_{u-i}^{k}\big)^{\top } & \mathbf {0} \end{bmatrix},
\]
where \(\mathbf {Y}_{u-i}^{k}\) is the user-item interaction matrix of behavior \(k\), \(\mathbf {A}_{u-i}^{k} \in \mathbb {R}^{(M+N)\times (M+N)}\), and \(M\) and \(N\) denote the number of users and items, respectively.
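To make the partition concrete, the following is a minimal sketch (in PyTorch, with dense tensors purely for readability; a practical implementation would use sparse matrices) that builds each \(\mathbf {A}_{u-i}^{k}\) as the bipartite block matrix over \(\mathbf {Y}_{u-i}^{k}\); all function and variable names are illustrative:

```python
import torch

def build_behavior_adjacency(Y_list):
    """Build one (M+N) x (M+N) bipartite adjacency matrix A^k per behavior k
    from the M x N user-item interaction matrices Y^k (0/1 entries)."""
    adj_list = []
    for Y in Y_list:                      # one interaction matrix per behavior
        M, N = Y.shape
        A = torch.zeros(M + N, M + N)
        A[:M, M:] = Y                     # user -> item block
        A[M:, :M] = Y.t()                 # item -> user block (undirected graph)
        adj_list.append(A)
    return adj_list

# Toy example: 3 users, 4 items, 2 behavior types (e.g., view and purchase).
Y_view = torch.tensor([[1., 0., 1., 0.],
                       [0., 1., 0., 0.],
                       [1., 1., 0., 1.]])
Y_buy = torch.tensor([[0., 0., 1., 0.],
                      [0., 1., 0., 0.],
                      [0., 0., 0., 1.]])
A_view, A_buy = build_behavior_adjacency([Y_view, Y_buy])
print(A_view.shape)   # torch.Size([7, 7])
```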
As for the processing of time, we simply follow KHGT [45], and first consider the edge
\(\mathcal {E}_{u-i}^k\) between user
\(u\) and item
\(i\) under behavior
\(k\), mapping their corresponding interaction timestamp
\(t_{u-i}^k\) into the time slot as
\(\tau (t_{u-i}^k)\), and then generate the temporal embedding \(\mathbf {t}_{u-i}^{k} \in \mathbb {R}^{1\times d^{*}}\) for each interaction. Specifically, we have:
\[
\bar{\mathbf {t}}_{u-i}^{k}[2n] = \sin \!\Big (\frac{\tau (t_{u-i}^k)}{10000^{2n/2d}}\Big),\qquad \bar{\mathbf {t}}_{u-i}^{k}[2n+1] = \cos \!\Big (\frac{\tau (t_{u-i}^k)}{10000^{2n/2d}}\Big),\qquad \mathbf {t}_{u-i}^{k} = \bar{\mathbf {t}}_{u-i}^{k}\,\mathbf {W}_{t},
\]
where the even and odd element indices in the temporal information embedding \(\bar{\mathbf {t}}_{u-i}^{k} \in \mathbb {R}^{1\times 2d}\) are denoted as \(2n\) and \(2n+1\), respectively, and \(\mathbf {W}_{t} \in \mathbb {R}^{2d\times d^{*}}\) is the transformation weight matrix corresponding to the \(k\)-th type of interaction.
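For illustration, a minimal sketch of this temporal encoding under the reading above; the slot granularity, base constant, and function names are assumptions rather than the paper's exact choices, and \(\mathbf {W}_{t}\) is a plain learnable projection here:

```python
import torch

def time_slot(timestamp, slot_size=86400):
    """Map a raw UNIX timestamp to a coarse time slot; one slot per day is an
    illustrative choice, not necessarily the paper's granularity."""
    return timestamp // slot_size

def temporal_embedding(slot, d, W_t):
    """Sinusoidal encoding of the time slot (dimension 2d), projected to d* by
    W_t: even indices 2n hold sin terms, odd indices 2n+1 hold cos terms."""
    n = torch.arange(d, dtype=torch.float32)
    phase = slot / (10000.0 ** (n / d))           # one frequency per index n
    enc = torch.zeros(2 * d)
    enc[0::2] = torch.sin(phase)                  # positions 2n
    enc[1::2] = torch.cos(phase)                  # positions 2n + 1
    return enc @ W_t                              # shape (d*,)

d, d_star = 8, 16
W_t = torch.randn(2 * d, d_star) * 0.01           # behavior-specific weights in the model
t_emb = temporal_embedding(time_slot(1_700_000_000), d, W_t)
print(t_emb.shape)                                # torch.Size([16])
```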
To better illustrate the allocation of interests at each layer, we take the \(k\)-th behavior as an example. As shown in Algorithm 1, we use \(\mathcal {E}_{u-i}^{k} = \lbrace (p,q)|\mathbf {A}_{u-i}^{k}[p,q] \ne 0 \rbrace\) to represent the set of edges on graph \(\mathcal {G}_{u-i}^{k}\). Meanwhile, we set \(\mathbf {a}_{0}\) as the initial weight of each edge on \(\mathcal {G}_{u-i}^{k}\) and initialize the embedding of each user and item. We leverage a Kronecker product \((\otimes)\) to replicate the vector \(\mathbf {t}_{u-i}^k\) \(N_*\) times along the row direction and add it to \(\mathbf {e}_{u}^{k}\) and \(\mathbf {e}_{i}^{k}\), thus obtaining \(\mathbf {f}_{u,0}^{k}\) and \(\mathbf {f}_{i,0}^{k}\) (Step 1). Here, for simplicity, we denote \(\mathbf {e}_{u}^{k}\) and \(\mathbf {e}_{i}^{k}\) as the outputs of the previous layer.
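A minimal sketch of this initialization (Step 1), assuming each node representation is stored as \(N_*\) interest-specific rows of width \(d^{*}\) so that the Kronecker-replicated time vector can be added row-wise; the uniform initial edge weight \(\mathbf {a}_{0}\) is an illustrative choice:

```python
import torch

def init_allocation(e_u, e_i, t_ui, num_interests, num_edges):
    """Step 1 (sketch): replicate the temporal embedding N_* times along the row
    direction with a Kronecker product, add it to the user/item representations,
    and set a uniform initial weight for every (edge, interest) pair."""
    ones = torch.ones(num_interests, 1)
    t_rep = torch.kron(ones, t_ui.unsqueeze(0))        # (N_*, d*): t repeated row-wise
    f_u0 = e_u + t_rep                                 # (N_*, d*)
    f_i0 = e_i + t_rep
    a_0 = torch.full((num_edges, num_interests), 1.0 / num_interests)
    return f_u0, f_i0, a_0

N_star, d_star, num_edges = 4, 16, 10
e_u = torch.randn(N_star, d_star)     # previous layer's output for user u
e_i = torch.randn(N_star, d_star)     # previous layer's output for item i
t_ui = torch.randn(d_star)            # temporal embedding of the edge (u, i)
f_u0, f_i0, a_0 = init_allocation(e_u, e_i, t_ui, N_star, num_edges)
```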
Next, we start the iterative process. In the \(t\)-th iteration, in order to obtain distributions across all interests, we use the softmax function to normalize these coefficients
(Step 2):
\[
\mathbf {a}_{t}^{k}[s] \leftarrow \frac{\exp \big (\mathbf {a}_{t}^{k}[s]/\tau \big)}{\sum _{s^{\prime }} \exp \big (\mathbf {a}_{t}^{k}[s^{\prime }]/\tau \big)},
\]
where \(\mathbf {a}_{t}^{k}\) denotes the vector of weight coefficients of each edge of graph \(\mathcal {G}_{u-i}^{k}\) in the \(t\)-th iteration, \(\tau\) is the temperature coefficient, and \(s\) indexes the \(s\)-th interest.
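For clarity, a sketch of this normalization, assuming \(\mathbf {a}_{t}^{k}\) is stored as an (edges \(\times\) interests) matrix so that the softmax runs over the interest dimension; the temperature value is illustrative:

```python
import torch

def normalize_edge_weights(a_t, tau=0.5):
    """Step 2 (sketch): temperature-scaled softmax over the interest dimension,
    so each edge's coefficients form a distribution across the N_* interests."""
    return torch.softmax(a_t / tau, dim=-1)

a_t = torch.randn(10, 4)                  # 10 edges, 4 interests
a_norm = normalize_edge_weights(a_t)
print(a_norm.sum(dim=-1))                 # each row sums to 1
```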
Furthermore, in each iteration, we assign all the edges on the graph \(\mathcal {G}_{u-i}^{k}\) to each interest of users and items on the graph (Step 3). At this step,
\(\mathbf {f}_{u,t}^{k}[s]\) and \(\mathbf {f}_{i,t}^{k}[s]\) represent the \(s\)-th interest of user \(u\) and item \(i\), respectively, after the allocation of the edge weights.
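One plausible reading of this allocation (Step 3) is sketched below: each node's \(s\)-th interest chunk is refreshed by aggregating its neighbors' \(s\)-th chunks, weighted by the normalized edge coefficients from Step 2. This is a sketch of the general mechanism, not the paper's exact update rule:

```python
import torch

def allocate_interests(edges, a_norm, f_u, f_i):
    """Step 3 (sketch): per-interest weighted message passing along the edges.
    edges : list of (u, i) index pairs on the behavior-k graph
    a_norm: (num_edges, N_*) normalized edge weights from Step 2
    f_u   : (num_users, N_*, d*) user interest chunks, f_i likewise for items."""
    f_u_new = torch.zeros_like(f_u)
    f_i_new = torch.zeros_like(f_i)
    for e, (u, i) in enumerate(edges):
        w = a_norm[e].unsqueeze(-1)        # (N_*, 1): one weight per interest
        f_u_new[u] += w * f_i[i]           # item -> user messages
        f_i_new[i] += w * f_u[u]           # user -> item messages
    return f_u_new, f_i_new

edges = [(0, 2), (0, 0), (1, 1), (2, 3)]
f_u = torch.randn(3, 4, 16)                # 3 users, 4 interests, d* = 16
f_i = torch.randn(4, 4, 16)                # 4 items
a_norm = torch.softmax(torch.randn(len(edges), 4), dim=-1)
f_u1, f_i1 = allocate_interests(edges, a_norm, f_u, f_i)
```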
Last but not least, we calculate the affinity between each pair of nodes on the graph \(\mathcal {G}_{u-i}^{k}\) to update the weight of each edge (Step 4). Here,
\(\mathbf {a}_{t}^{k}[s]\) denotes the updated weight of the edges at the \(t\)-th iteration for the \(s\)-th interest under behavior \(k\).
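Similarly, a sketch of the weight update in Step 4, assuming the affinity between the two endpoints of an edge is the per-interest inner product of their current chunks; the routing-style accumulation onto the previous weights is an assumption, not necessarily the paper's exact rule:

```python
import torch

def update_edge_weights(edges, a_t, f_u, f_i):
    """Step 4 (sketch): per-interest affinity between the endpoints of each edge,
    used to refresh the (unnormalized) edge weights for the next iteration."""
    affinity = torch.empty_like(a_t)
    for e, (u, i) in enumerate(edges):
        # (N_*, d*) x (N_*, d*) -> (N_*,): one inner product per interest
        affinity[e] = (f_u[u] * f_i[i]).sum(dim=-1)
    return a_t + affinity                  # accumulate onto previous weights (assumed)

edges = [(0, 2), (0, 0), (1, 1), (2, 3)]
f_u1 = torch.randn(3, 4, 16)
f_i1 = torch.randn(4, 4, 16)
a_t1 = update_edge_weights(edges, torch.zeros(len(edges), 4), f_u1, f_i1)
```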
After all the iterations, we take the representations generated by the last iteration as the final output and aggregate them with GCN models (Step 5), using the same aggregators as in Section 4.2.1.
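Finally, for the aggregation in Step 5, the sketch below uses a symmetrically normalized GCN-style propagation purely as a stand-in; the actual aggregators are those defined in Section 4.2.1 and may differ in detail:

```python
import torch

def gcn_aggregate(A, H):
    """Step 5 (sketch): symmetric-normalized neighborhood aggregation
    D^{-1/2} A D^{-1/2} H over the behavior-specific graph.
    A: (M+N, M+N) adjacency matrix, H: (M+N, d) node representations."""
    deg = A.sum(dim=1).clamp(min=1.0)
    d_inv_sqrt = deg.pow(-0.5)
    return (d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)) @ H

A = torch.zeros(7, 7)
A[0, 4] = A[4, 0] = 1.0                   # a single user-item edge as a toy graph
H = torch.randn(7, 16)
H_agg = gcn_aggregate(A, H)
print(H_agg.shape)                        # torch.Size([7, 16])
```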