Keywords

1 Introduction

With the development of mobile Internet, location-based social networks (LBSNs) [6], such as Yelp and Foursquare, have emerged in recent years. In LBSNs, users can share their experiences and tips for Point-of-Interests (POIs) [2], e.g., restaurants and sightseeing sites, in the form of check-ins [14]. The rapid growth of LBSNs has attracted billions of users, promoting our urban experience to a new stage [17]. To benefit LBSN users and promote location-based marketing, POI recommendation on LBSNs has become an essential task aiming to recommend new POIs to a user according to his personal preferences and to facilitate his exploration of the city [18].

Different from traditional item recommendation, e.g., movie recommendation, POI recommendation is highly context-dependent. First, locations of POIs are important factors for POI recommendations, since users prefer to visit POIs that are not far away. For example, in Gowalla and Foursquare, 90% of users’ consecutive check-ins are within the distance less than 50km [15]. Second, users’ preference over POIs exhibit salient temporal periodic patterns, e.g., restaurants are preferred in dinner time and theaters are visited more frequently in weekend than in weekdays. Finally, a user’s check-in records form a POI sequence with continuous distance and time intervals, offering various sequential patterns [12], e.g., POIs planned in a trip. In sum, users’ check-in records contain rich spatial-temporal context information that should be synthetically integrated to reflect the dynamics of the underlying check-in system.

Many methods have been proposed to exploit geographical influence [3, 11, 18, 20], temporal periodic patterns [5, 22, 25] and sequential dependency [1, 4, 12, 13, 23, 24] lying in users’ check-ins for improving the performance of POI recommendation. However, these methods only partially leverage the spatial context and temporal context, and they integrate these contexts using a global weighting scheme [9, 19, 25], assuming that the importance of each context is unchanged for all POIs. These methods fail to capture the dynamic role of context, given that the importance of each context is POI-specific. Let’s say we are going to make POI recommendation to a big foodie at 3 p.m. We may push a restaurant to him considering his long-term preference as a foodie. However, recommending a restaurant must happen at the right time, e.g., dinner time, rather than 3 p.m, at which restaurant cannot meet the user’s current needs. In this scenario where restaurant is a candidate POI, we should inhibit the importance of user’s long-term preference and let the temporal context play a more decisive role in user’s check-in choice. That is to say, each context’s importance to current recommendation should be POI-specific.

In this paper, we propose to model the spatial-temporal context for POI recommendation using an attention mechanism (Fig. 1). Temporal context is represented as a low-dimensional vector, capturing what type of POI is preferred for a specific temporal moment. For spatial context, due to the lack of location information of user, we infer the spatial context by exploiting users’ check-in records, obtaining a low-dimensional representation through an attention enhanced recurrent neural network. Finally, a POI-guided attention mechanism is adopted to learn the importance of each context for each recommendation, offering high flexibility to capture the dynamic nature of context.

We evaluate the proposed POI recommendation method by extensive experiments on two real-world datasets collected from Foursquare and Gowalla respectively. Experimental results show that the POI-specific context importance can significantly improve the performance of POI recommendation, compared with state-of-the-art methods.

2 Related Work

In this section, we make a brief discussion on related works.

Check-in is the main inferential evidence for POI recommendation [3, 20]. Due to its nature as implicit feedback, researchers utilize the weighted matrix factorization [10] or pairwise ranking methods [7, 9] to model it.

Geographical Influence has been proved to be effective in improving POI recommendation accuracy, with a parameterized distribution to model the distance influence [3, 10, 16, 20]. Moreover, Wang et al. [18] propose to model POI-specific geographical influence, which captures the asymmetry and high variation of geographical influence between POIs.

Periodic Pattern has attracted much attention from researchers. They split one day into multiple time slots and exploit the check-in pattern in each time slot in terms of temporal non-uniformness and consecutiveness [5, 22].

Sequential Dependency has been exploited in recent years. Many methods based on matrix factorization [13], Markov models. [4, 23, 24], word2vec [14] or RNN [12] have been proposed to learn the transitive patterns between POIs.

However, the above methods only partially exploit the spatial-temporal context and integrate these contexts using a global weighting scheme [9, 19, 25] given that the importance of each context is POI-specific. Considering that the attention mechanism can automatically model and select pertinent piece of information from a set of inputs and achieve good performance in many neural network-based tasks [8]. In this paper, we design a POI-guided attention mechanism to address the above issue.

3 Problem Formulation

For ease of presentation, we first introduce the notations used in this paper. We denote U and V the user set and POI set, with u and i representing a user and a POI respectively. Each POI i’s location is denoted by its longitude and latitude, i.e., \((lon_i, lat_i)\). A check-in is a triple (uit), which means user u visits POI i at time t. For each user u, a check-in profile \(D_u\) is provided, which is the set of check-ins generated by u in chronological order. We define a long-term preference vector \(\varvec{P}_u\) for each user u, a preference vector \(\varvec{S}_i\) and an influence vector \(\varvec{I}_i\) for each POI i, and a preference vector \(\varvec{E}_t\) for each time t.

POI Recommendation: given all users’ check-in profiles \(\{D_u\}\), we aim to provide a list of POIs which are not visited and potentially preferred by a target user u at a target time t.

4 Model

In this section, we describe the proposed framework for POI recommendation, which models the Spatial-Temporal context by an Attention enhanced Recurrent neural network (STAR in Fig. 1).

4.1 Model Architecture

Whether a POI will be chosen is sensitive to the spatial-temporal context, e.g., restaurant is popular at dinner time and the temporal context plays a more decisive role, compared with other factors in this scenario. Likewise, if user is on a food street, restaurant is also popular and the spatial context becomes more decisive. Therefore, we need a model that can automatically adjust the importance of the involved context to user’s current preference, and the idea comes to fruition in the proposed STAR model by using a POI-guided attention mechanism.

STAR infers a target user u’s preference to a candidate POI j at a target time t by capturing the dynamic influence of the spatial context and the temporal context, besides user u’s long-term preference. To characterize the influence of each factor, we learn the following latent vectors in STAR:

  • \(\varvec{P}_u\): a n-dimension vector characterizes user u’s long-term preference.

  • \(\varvec{S}_j\): a n-dimension vector characterizes POI j’s preference distribution.

  • \(\varvec{E}_t\): a n-dimension vector characterizes time t’s temporal periodic patterns.

Specifically, \(\varvec{E}_t\) contains two types of temporal information, including hour of a day, and day of a week, to capture which POIs are preferred at a specific time.

Figure 1(a) shows the architecture of the proposed STAR model. We input user u’s long-term preference \(\varvec{P}_u\), the temporal context \(\varvec{E}_t\) capturing temporal periodic patterns of the recommendation time t and the spatial context \(\varvec{H}_t\) capturing the geographical influence of user u’s current location. An POI-guided attention layer is applied to distinguish the importance of each context. User u’s instantaneous preference at time t, denoted as \(\varvec{Q}_u^t\), is computed by integrating the involved contexts with their importance into consideration.

Note that, it is tricky to model the spatial context because the real location of user is usually not available. Therefore, we utilize the user’s historical check-ins to simulate his current location and compute the spatial context as the geographical influence of historical check-ins. Figure 1(b) show the architecture of the generation of the spatial context, i.e., we use a recurrent neural network with user’s historical check-in sequence as input to achieve this purpose, and we also take the influence of the geographical distance and time decay into consideration.

Fig. 1.
figure 1

(a) Architecture of the proposed model which integrates the Spatial-Temporal context with an Attention-enhanced Recurrent neural network, i.e., STAR. STAR uses a POI-guided attention mechanism to distinguish the importance of user u’s long-term preference \(\varvec{P}_u\), the temporal context \(\varvec{E}_t\) and the spatial-temporal context \(\varvec{H}_t\). \(\alpha \) is the importance. (b) The spatial context refers to the geographical influence of user u’s historical check-ins, e.g., POI \(\{i\}\) at time \(t'\) (\(t'<t\)). \(\varvec{I}_i\) represents POI i’s influence vector. \(f(\cdot )\) and \(g(\cdot )\) are the geographical influence function and the time decay function.

4.2 Model Specification

In what follows, we specify the details of the STAR model.

Spatial-Temporal Context-Aware Recommendation. To make context-aware recommendation, we first use a POI-guided attention mechanism to distinguish the importance of the involved contexts, including user u’s long-term preference \(\varvec{P}_u\), the temporal context \(\varvec{E}_t\) and the spatial context \(\varvec{H}_t\). We introduce the generation of \(\varvec{H}_t\) in next part.

Specifically, we input not only \(\varvec{P}_u\), \(\varvec{E}_t\) and \(\varvec{H}_t\), but also the candidate POI j’s preference vector \(\varvec{S}_j\), and compute the attention weights for the three factors based on the interaction between them:

$$\begin{aligned} \begin{aligned}&\alpha _{p} = \sigma (W^T(\varvec{P}_u||\varvec{S}_j)), \\&\alpha _{e} = \sigma (W^T(\varvec{E}_t||\varvec{S}_j)), \\&\alpha _{h} = \sigma (W^T(\varvec{H}_t||\varvec{S}_j)), \end{aligned} \end{aligned}$$
(1)

where \(\alpha _{p}\), \(\alpha _{e}\) and \(\alpha _{h}\) are attention weights of \(\varvec{P}_u\), \(\varvec{E}_t\) and \(\varvec{H}_t\) respectively and W is parameter. || represents vectors’ concatenation operation and \(\sigma (z)=\frac{1}{1+e^{-z}}\) is the sigmoid function.

Then, a normalization on \(\alpha _{p}\), \(\alpha _{e}\) and \(\alpha _{h}\) is completed through a softmax function. Take \(\alpha _{p}\) as an example:

$$\begin{aligned} \alpha _{p} = \frac{exp(\alpha _{p})}{exp(\alpha _{p})+ exp(\alpha _{e})+ exp(\alpha _{h})}. \end{aligned}$$
(2)

In this way, we can automatically determine the importance of each context guided by the candidate POI j. For example, when t is the time to eat and the POI j is a restaurant, \(\alpha _{e}\) would be a large value to make the temporal context get more attention.

With the POI-specific attention weights pointing out the importance of different contexts, we compute user u’s instantaneous preference at target time t, i.e., \(\varvec{Q}_u^t\), as follows:

$$\begin{aligned} \varvec{Q}_u^t = \alpha _{p} \varvec{P}_u + \alpha _{e} \varvec{E}_t + \alpha _{h} \varvec{H}_t. \end{aligned}$$
(3)

Denote the preference as s(ujt) as the target user u’s preference to candidate POI j at target time t. We compute s(ujt) as the inner product between user u’s instantaneous preference vector \(\varvec{Q}_u^t\) and candidate POI j’s preference vector \(\varvec{S}_j\), like traditional matrix factorization does:

$$\begin{aligned} s(u, j, t) = {\varvec{Q}_u^t}^T \varvec{S}_j. \end{aligned}$$
(4)

Spatial Context Generation. Since we don’t know user u’s current location, we use the POIs in his check-in history to simulate the current location and model the spatial context \(\varvec{H}_t\) as the geographical influence from user u’s historical check-ins.

We use a recurrent neural network to naturally line up user u’s historical check-ins as a behavior sequence. In each recurrent unit, we input the check-in activity \((u, i, t')\), i.e., \(\varvec{I}_i\) as the influence vector of the visited POI i and \(\varvec{E}_{t'}\) as the preference vector of the check-in time \(t'\). \(\varvec{I}_i\) is a n-dimension vector for characterizing historical POI i’s influence on user’s future check-in preference.

We use \(h_{t'}\) to represent the hidden layer after visiting POI i at time \(t'\). \(h_{t'}\) is responsible for propagating past signals for future predictions, and it is computed as follows:

$$\begin{aligned} h_{t'} = a(W_i^T \varvec{I}_i + W_e^T \varvec{E}_{t'} + W_h^T h_{t''}), \end{aligned}$$
(5)

where \(h_{t''}\) denotes the hidden layer from the previous recurrent unit prior to time \(t'\). \(W_i\) and \(W_e\) denote the weight matrices for input vectors. \(W_h\) denotes the recurrent connections among consecutive recurrent steps. \(a(z)=\frac{1-e^{-2z}}{1+e^{-2z}}\) is the tanh function, which acts as non-linear transformation.

Instead of directly using \(h_t\) as the spatial context, we introduce another attention mechanism to build cross-dependency lying in user’s check-in sequence. The spatial context \(\varvec{H}_t\) is regarded as the aggregation of the influence of all POIs that are visited prior to t, with the influence of geographical distance between historical POIs and the candidate POI j, and the time decay after historical check-in time serving as weights. Specifically, we compute \(\varvec{H}_t\) as follows:

$$\begin{aligned} \varvec{H}_t = \sum _{(u,i,t'):t'<t} f(t-t') g(d_{ij}) h_{t'}, \end{aligned}$$
(6)

where \(f(t-t')\) is the time decay function of the elapsed time from \(t'\) to t, and \(g(d_{ij})\) is the geographical influence function, which is determined by the distance \(d_{ij}\) between historical POI i and candidate POI j. They are defined respectively by

$$\begin{aligned} f(t-t') = a * (t-t')^b, \end{aligned}$$
(7)
$$\begin{aligned} g(d_{ij}) = c * d_{ij}^d, \end{aligned}$$
(8)

where abcd are function parameters, controlling the initial scores and the steepness of time decay and geographical influence respectively.

Those two functions allow the influence of historical check-ins to decrease when the time interval becomes longer and the geographical distance becomes further, which fits intuitions and previous findings. Note that, existing methods have separately studied the time decay function [2] or the geographical influence function [20]. To our best knowledge, it is the first time to jointly integrate them for capturing the spatial-temporal attenuation effect.

4.3 Optimization

For each check-in activity (ujt), we randomly sample c negative POIs from POI set V with POI j being excluded and denote the set of negative samples as NEG(j). The objective function is defined in a ranking manner as follows:

$$\begin{aligned} L(u, j, t) = -\sum _{l \in NEG(j)} \sigma (s(u,j,t)-s(u,l,t)). \end{aligned}$$
(9)

In the optimization, we learn four latent vectors, i.e., \(\varvec{P}_u\), \(\varvec{E}_t\), \(\varvec{I}_i\) and \(\varvec{S}_j\), and four parameters in the time decay function and the geographical influence function, i.e., a, b, c and d.

We compute the score of each POI in V according to Eq. 4, and take top k POIs with highest scores as the final recommendation list.

5 Experiments

5.1 Datasets

We adopt two real-world datasets collected from Foursquare and Gowalla respectively for evaluation, which are also used in [15]. In the Foursquare dataset, there are 1,196,248 check-ins generated by 24,941 users over 28,593 POIs from April 3, 2012 to September 16, 2013, while in the Gowalla dataset there are 1,278,274 check-ins by 18,737 users over 32,510 POIs from February 4, 2009 to October 23, 2010. In both datasets, each POI is marked by its longitude and latitude, and each check-in is associated with a check-in time. For each user, we first sort her/his check-ins in chronological order, and then mark off the early 80% of her/his check-ins as training data, the next 10% as validation data, and the last 10% as testing data.

5.2 Evaluation Metrics

To evaluate the models, we adopt two widely-used metrics, i.e., hit@k [21] and mean reciprocal rank (MRR@k).

Specifically, we denote the set of check-in time in user u’s testing set as T(u). For each check-in time in T(u), we predict a recommendation list. Let \(G_{u,t}\) denote the ground truth POI that user u visited at time t, and \(P_{u,t}^k\) denotes the top k POIs recommended for user u at time t. Then, we calculate hit@k as follows:

$$\begin{aligned} hit@k=\frac{1}{|U|} \sum _{u=1}^{|U|} \frac{\sum _{t\in T(u)} \delta ( G_{u,t} \in P_{u,t}^k) }{|T(u)|}, \end{aligned}$$
(10)

where \(\delta (z)\) is an indicator function which equals 1 if and only if boolean variable z is True, and otherwise 0. \(|\cdot |\) denotes the cardinality of a set.

The MRR@k is calculated as follows:

$$\begin{aligned} MRR@k=\frac{1}{|U|} \sum _{u=1}^{|U|} \frac{\sum _{t\in T(u)} 1/rank(G_{u,t},P_{u,t}^k)}{|T(u)|}, \end{aligned}$$
(11)

where \(rank(G_{u,t},P_{u,t}^k)\) represents the rank of POI \(G_{u,t}\) in set \(P_{u,t}^k\). If \(G_{u,t}\) is not in \(P_{u,t}\), \(rank(G_{u,t},P_{ut})=\infty \).

We consider three values of k, i.e., 1, 5 and 10 in our experiments.

5.3 Baseline Methods

We first compare the proposed STAR model with its variants to demonstrate the effectiveness of the attention mechanism. Three variants are considered by respectively removing the attention layer in the generation of the spatial context (STAR-sta), the attention layer in the integration of different contexts (STAR-ca) and both the two attention layers (STAR-sta-ca).

Then, we compare the proposed STAR model with several baseline methods:

  • UG [20]: UG combines a user-based collaborative filtering method and geographical influence. It uses a power-law function to characterize the relation between check-in probability and distance.

  • UTG [22]: UTG improves the basic user-based collaborative filtering method to incorporate temporal periodic patterns, and then combines geographical influence.

  • FPMC-LR [4]: FPMC-LR is a sequential prediction model, which embeds the personalized Markov chain and adds region localization constraint for next check-ins.

  • ST-RNN [12]: ST-RNN extends RNN to model local temporal and spatial contexts with time-specific and distance-specific transition matrices.

  • GE [19]: GE is a graph-based embedding model, which utilizes sequential dependency, geographical influence and temporal periodic patterns to constrain POI’s representations.

  • Geo-Teaser [25]: Geo-Teaser is the combination of a temporal POI embedding model and a geographically hierarchical pair-wise preference ranking model.

5.4 Experimental Setting

In the experiments, we add a \(L_2\) regularization term to the parameters when performing optimization, and the regularization coefficient is set as 0.01. For all latent vectors, we set their dimension as \(n=128\). We set the negative count c as 10. The learning rate decreases from an initial value of 1.0 with the increase of iterations, and the decay factor is set as 0.5. Parameters learned for the time decay function and the geographical influence function, i.e., abcd, are \(0.16, -0.35, 0.82, -0.27\) on the Foursquare dataset and \(0.18, -0.24, 0.73, -0.32\) on the Gowalla dataset respectively.

Fig. 2.
figure 2

hit@k and MRR@k of STAR and three variants.

5.5 Performance of STAR and Its Variants

Figure 2 present the hit@k and MRR@k of the proposed STAR model and its variants respectively. We can observe that removing any attention would cause performance degradation on both datasets. This phenomenon indicates that a global weighting scheme cannot effectively organize multiple heterogeneous contexts, and the attention mechanism plays a critical role while integrating the spatial-temporal contexts.

5.6 Performance of Methods in Comparison

Figure 3 and Fig. 4 present the hit@k and MRR@k of the STAR and baseline methods respectively. It can be observed that the proposed STAR method achieves the best performance under different settings of k on both datasets and both metrics, which demonstrates the superiority of our method to these state-of-the-art methods.

Fig. 3.
figure 3

hit@k of all methods on the Foursquare and the Gowalla dataset.

Fig. 4.
figure 4

MRR@k of all methods on the Foursquare and the Gowalla dataset.

We take a detailed account of Fig. 3 as an example. Since UG considers only the geographical influence, it gets the worst performance. UTG further incorporates temporal periodic patterns (time slot in a day), resulting in a slightly better performance than UG. FPMF-LR and ST-RNN consider the sequential dependency and ignore the temporal periodic patterns, and its performance is better than that of UTG. This indicates that sequential dependence is crucial to POI recommendation. ST-RNN outperforms FPMC-LR in most cases except the case of \(k=1\) on the Foursquare dataset, this is due to the modeling of continuous distance and time intervals. GE and Geo-Teaser both consider all contexts, and thus perform better than the above methods. However, they just utilize these contexts to constrain POIs’ representation learning, and combine the involved contexts using a global weighting scheme. STAR uses a POI-guided attention mechanism to distinguish the importance of different contexts, guaranteeing its superior performance against baseline methods.

5.7 Case Study: Importance of Different Contexts

In what follows, we study how important each recommendation context (user’s long-term preference \(\varvec{P}_u\), the temporal context \(\varvec{E}_t\), the spatial context \(\varvec{H}_t\)) is over time. We directly compute the interactions between three types of factors and the true POI respectively, and compare the proportion of each part in the overall score. Specifically, we sample a user u from the Foursquare dataset for case study. u has 115 check-ins in total. We select her/his 10 consecutive check-ins from January 6, 2013 to April 6, 2013, and present the changes in proportions over time in Fig. 5.

Fig. 5.
figure 5

Impact of different contexts for predicting a user’s preference over time.

We have the following observations: (1) u visited the first 5 POIs in consecutive 5 days. As expected, the impact of the spatial context increases constantly, while the impact of user’s long-term preference becomes negligible. (2) When predicting the 9th POI, the impact of the spatial context becomes very weak, which results from the fact that this check-in is more than one month away from the 8th check-in. This check-in is mainly triggered by user’s long-term preference, since we find in the dataset that the 9th POI is the second most frequently visited POI among all check-ins of u. (3) The main reason for the last check-in is the temporal context, i.e., temporal periodic patterns (21:11). We guess that the last POI is a place for relax, as most check-ins at this POI occurred at non-working hours.

6 Conclusions

In this paper, we propose a new POI recommendation framework to dynamically integrate the spatial-temporal context. We use a low-dimensional vector to capture the temporal context and infer the spatial context by mining a user’s check-in history through an attention enhanced recurrent neural network. We integrate the spatial-temporal context with user’s long-term preference using another POI-guided attention, which can distinguish the importance of each context for recommendation. In this way, we can flexibly and accurately capture the dynamic nature of different contexts. We perform sufficient experiments on two real-world datasets and demonstrate that the proposed method consistently outperforms state-of-the-art methods for POI recommendation.