1. Introduction
Point-of-Interest (POI) recommendation is a critical service in location-based social networks (LBSNs) and has garnered significant attention in the field of recommender systems [1,2,3]. Unlike traditional recommendation domains such as movies or music, capturing user preferences in POI recommendation poses unique challenges. These challenges are primarily manifested in two key aspects: (1) User behavior is influenced by multiple contextual factors. In POI recommendation, a user’s check-in behavior is affected by both temporal and geographical factors. For instance, users typically check in at dining POIs such as restaurants around mealtimes and are unlikely to visit POIs that are too far away; (2) User historical records are highly sparse. POI recommendation relies on check-in records, which are generated only when a user physically visits a location. This leads to a significantly smaller volume of user activity data compared to other recommendation domains. The sparsity of user check-in data makes it harder to model user preferences accurately and to predict future check-in behaviors.
Because of these challenges, traditional techniques, such as collaborative filtering methods, are often ineffective for POI recommendation. Recently, researchers have attempted to incorporate graph-based models into POI recommendation to enhance model performance. The key idea behind these graph-based approaches [4,5] is to transform user check-in records into graph-structured data, thereby leveraging the rich relational information inherent in graphs. In this way, these methods aim to explore the relationships between users and POIs under multiple types of contextual information, such as temporal and geographical factors. This allows for more accurate modeling of user preference characteristics and enhances the overall performance of the recommendation system.
Graph-based POI recommendation models generally operate in two main phases: graph construction and graph learning.
In the graph construction phase, the primary objective is to structure the data in a way that preserves the impact of contextual features. This involves encoding the relationships between POIs within various contextual information, such as temporal and geographical factors. The constructed graph captures the interactions among users, POIs, and their associated contexts, forming a rich relational structure that serves as the foundation for subsequent learning processes. The graph learning phase focuses on building models that can effectively learn the representations of both POIs and users. These models are typically based on graph neural networks (GNNs). The modeling process involves feeding the constructed graph structural data into a GNN-based architecture to extract meaningful representations of users and POIs.
However, models based on graph neural networks (GNNs) face inherent limitations in their message-passing mechanisms, such as over-smoothing and over-squashing issues. These limitations hinder the effective extraction of POI and user representations from graph-structured data with multiple contextual features, ultimately constraining the performance of GNN-based models in POI recommendation tasks.
To address the aforementioned challenges, we introduce a novel model named Multi-view Contextual Graphs via Convolutional Neural Networks for Point-of-Interest Recommendation (MCGRec). This framework consists of three main components: (1) Super node-based PPR sampling. This component employs a personalized PageRank (PPR) method based on super nodes to sample highly correlated POIs from the context-feature-based graph-structured data. The sampled POIs are then transformed into a grid-like feature matrix, which serves as input for the subsequent convolutional neural network (CNN) processing; (2) Convolutional neural network-based backbone. A CNN-based model is utilized to extract POI representations from the grid-like feature matrix. Additionally, a weighted fusion method is employed to integrate multiple contextual information and calculate the final POI representations, ensuring that the model can capture the nuanced relationships between POIs and users; (3) User preference estimation. Based on geographical and temporal factors, we develop a novel strategy to estimate user preferences. This strategy leverages the learned POI representations to construct a more comprehensive model of user behavior, integrating both geographical and temporal factors.
The main contributions of this paper are summarized as follows:
We introduce a new method (MCGRec), which employs a super node-based PPR sampling strategy to extract information from graph-structured data. This method transforms the sampled data into a grid-like feature matrix, which is then processed by a CNN-based backbone to learn POI representations.
We propose a novel user preference estimation method that comprehensively leverages the influence of geographical and temporal factors to calculate user preferences. This method integrates these contextual factors to enhance the accuracy of user preference modeling.
We conduct extensive experiments on real-world datasets to evaluate the performance of the proposed method. The experimental results demonstrate the effectiveness of MCGRec in the POI recommendation task, showing superior performance compared to representative POI recommendation approaches.
The remainder of the article is organized as follows: In Section 2, we review related studies for POI recommendation. In Section 3, we introduce the necessary preliminaries, including commonly used notations and definitions. In Section 4, we detail the proposed method. In Section 5, we describe the experimental setup and report the experimental results. Finally, in Section 6, we summarize the entire paper.
2. Related Work
In this section, we review recent approaches to POI recommendation from two perspectives: multi-context information-based approaches and convolutional neural network-based approaches.
2.1. Multi-Context Information-Based Approaches
Multi-context information-based approaches [6,7,8,9] aim to learn POI representations and user preferences from multi-context information, such as geographical information and temporal information.
Acharya et al. [10] incorporated Temporal Recency (TR) for visit timings and Spatial Proximity for location-based recommendations. They utilize a modified Long Short-term Memory (LSTM) model to capture temporal information and geographical proximity through orthogonal mapping to represent spatial information. Li et al. [11] introduce a Spatio-Temporal Intention Learning Self-Attention Network (STILSAN) that includes a preference–intention module to capture long-term preferences and revisit intentions, and a spatial encoder module to learn POI spatial features through spatial clustering and proximity analysis. Cheng et al. [12] propose a novel method, Point-of-Interest Recommendation based on Bidirectional Self-Attention Mechanism by Fusing Spatio-Temporal Preference (BSA-ST-Rec), which leverages a bidirectional self-attention mechanism to integrate spatio-temporal preferences for enhanced POI recommendations. Xie et al. [13] developed the Spatio-Temporal context Aggregated Hierarchical Transformer (STAHT), which employs stacked hierarchical encoders to recursively encode the spatio-temporal context and identify subsequences of varying granularities, capturing the dynamic preferences from user check-ins. Wang et al. [14] propose a spatial–temporal and text representation learning approach that utilizes the Transformer architecture to extract long-term dependencies in check-in sequences, effectively capturing both spatial and temporal dynamics. Wang et al. [15] introduce the Global Spatio-Temporal Aware Graph Neural Network (GSTAGNN), a novel model designed to capture and leverage global spatio-temporal relationships by examining user trajectories from a comprehensive perspective, including both spatial and temporal dimensions. Wu et al. [16] propose the Social- and Spatial–Temporal-Aware Next Point-of-Interest (SSTANPOI) method, which employs two feature encoders based on the self-attention mechanism and gated recurrent units to hierarchically model users’ check-in sequences, considering both social and spatial–temporal contexts.
Ren et al. [17] propose a novel framework named Mining Preferences from Geographical and Interactive Correlations (MPGI) that develops a specialized layer to capture geographical distances and interactive correlations between all POI pairs. This approach aims to leverage the spatial and relational aspects of POIs to improve the accuracy of recommendations. Liu et al. [6] developed the Spatio-Temporal Heterogeneous Information Network (STORE) model, which simultaneously extracts geographical and temporal effects along with other contextual features, such as POI types and users’ social relations. By integrating these diverse features, STORE provides a comprehensive representation of the user–POI interaction landscape. Li et al. [18] propose a recommendation framework that transforms heterogeneous nodes, including users and POIs, into a unified representation space. This addresses the issues of noise and node similarity, allowing the framework to better differentiate similar behavior nodes, resulting in more accurate recommendations.
Dai et al. [7] developed a unified Spatio-Temporal Neural Network (STNN) framework, which leverages users’ check-in records and social ties to recommend personalized POIs. The proposed method jointly models user–POI relations, sequential patterns, geographical influence, and social ties in a heterogeneous graph to learn user and POI representations.
Halder et al. [19] propose a multi-task and multi-head attention transformer-based model that not only recommends the next POIs to target users but also predicts queuing times by considering user mobility behaviors. This model leverages POI descriptions to infer user personal interests, which also helps address the cold start problem for new categorical POIs. Zhou et al. [20] developed a new method that enhances POI recommendations by incorporating user relationship strength through a data-driven approach. By analyzing users’ check-in behaviors, the proposed method defines user relationships and embeds these social links into a spatiotemporal framework for POI recommendations. Cai et al. [21] propose Friends-aware Graph Collaborative Filtering (FG-CF), which integrates social information into a user–POI graph to improve recommendation accuracy. This method specifically addresses the challenge of leveraging social ties to enhance the quality of POI recommendations. Hu et al. [22] propose a translation-based knowledge graph enhanced multi-task learning framework (Trans-MKR) for POI recommendation. This framework enhances the knowledge graph embedding module of the multi-task learning framework with TransR to quantify the relationships between POIs and their attributes, providing a robust solution for capturing the complex relationships in POI recommendation. Seyedhoseinzadeh et al. [23] incorporated social, geographical, and temporal information into a matrix factorization-based method to overcome the data sparsity issue. This approach effectively leverages multiple sources of information to enhance the quality of recommendations, especially in scenarios with limited data. Chen et al. [9] propose a novel POI recommendation model that captures and learns complex sequential transitions by incorporating time and distance irregularities. Moreover, the proposed method introduces a strategy to dynamically weight the decay values during the model learning process.
2.2. Convolutional Neural Network-Based Approaches
Convolutional neural network-based approaches aim to utilize CNN-based backbones to extract complex representations of POIs and user preferences.
Xing et al. [24] utilized CNNs as the foundation of a unified POI recommendation framework, incorporating various types of content information to enhance the accuracy of recommendations. Safavi et al. [25,26] leveraged deep learning and CNNs to extract the influence of the most similar pattern of friendship rather than considering the entire user base, thereby focusing on the most relevant social connections for recommendation purposes. Xing et al. [27] propose Review Geographical Social (ReGS), which integrates CNNs with a probability matrix factorization approach for POI recommendation, effectively combining textual and geographical information. Hao et al. [28] utilized CNNs to extract and learn intrinsic representations from the textual information of POIs, providing a robust method for capturing the semantic content of POIs. Zhang et al. [29] developed a CNN-based POI intrinsic embedding model that fuses valuable cross-domain knowledge to achieve more accurate POI recommendations, leveraging the strengths of CNNs to handle complex data structures.
3. Preliminaries
In this section, we review the key notations and definitions adopted in this paper. Specifically, we first introduce several key notations used in the context of location-based social networks (LBSNs). Then, we define key concepts related to the contextual graph and the POI recommendation task. The notations and their explanations are summarized in Table 1.
3.1. Location-Based Social Network
Given a location-based social network, we define a user set with n users and a POI set with m POIs. Each POI is associated with geographical information, which is represented as longitude and latitude. A user check-in record associates a user with a POI and the corresponding check-in timestamp.
3.2. Definitions
Definition 1 (Geographical contextual graph). A geographical contextual graph is defined to capture the geographical relationships between POIs. We first calculate the distance between two POIs. If the distance is less than a threshold, then these two POIs are connected in the geographical contextual graph.
Definition 2 (Temporal contextual graph). A temporal contextual graph is designed to represent the temporal relationships between POIs. Similar to the geographical contextual graph, we connect two POIs if their two most frequently checked-in timestamps are the same.
Definition 3 (POI recommendation). Given the check-in records of a user u, the goal of POI recommendation is to generate a list of k POIs that the user u has not visited, where k is the length of the recommendation list.
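Definitions 1 and 2 can be illustrated with a short sketch. It is not the authors' implementation: the haversine distance, the hour-of-day granularity of timestamps, the reading of Definition 2 as "the two most frequent check-in hours coincide", and all function and variable names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def geographical_graph(coords, threshold_km):
    """Connect two POIs when their distance is below the threshold."""
    return {(p, q) for p, q in combinations(sorted(coords), 2)
            if haversine_km(coords[p], coords[q]) < threshold_km}


def top_slots(hours, n=2):
    """The n most frequent check-in hours of one POI (assumed granularity)."""
    return frozenset(slot for slot, _ in Counter(hours).most_common(n))


def temporal_graph(checkin_hours, n=2):
    """Connect two POIs when their n most frequent check-in hours coincide."""
    slots = {p: top_slots(h, n) for p, h in checkin_hours.items()}
    return {(p, q) for p, q in combinations(sorted(slots), 2)
            if slots[p] == slots[q]}


# Hypothetical toy data: three POIs with coordinates and check-in hours.
coords = {"cafe": (40.7580, -73.9855), "bar": (40.7590, -73.9845),
          "museum": (40.7794, -73.9632)}
hours = {"cafe": [8, 8, 8, 9, 9, 13], "bar": [9, 9, 8, 8, 8, 22],
         "museum": [14, 14, 15, 15, 15, 10]}
geo_edges = geographical_graph(coords, threshold_km=1.0)
tem_edges = temporal_graph(hours)
```

In this toy example, only the cafe and the bar are linked in either graph: they lie well within one kilometre of each other and share the same two busiest hours.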
4. Methodology
In this section, we provide a detailed introduction to the proposed method, Multi-view Contextual Graphs via Convolutional Neural Networks for Point-of-Interest Recommendation (MCGRec). First, we present a super node-based PPR sampling method to extract relevant information for each POI node from the constructed contextual graphs. Next, we introduce a CNN-based neural network module that processes contextual information using CNN layers and a feature fusion layer to generate comprehensive representations. Finally, we describe a method for estimating user preferences. The overall framework is shown in Figure 1.
4.1. PPR Sampling via Super Nodes
In the constructed contextual graphs, the geographical and temporal influences are preserved within the graph-structured data. To effectively learn POI representations from these data, we initially utilize the PPR sampling method to extract pivotal node information for each POI across diverse contextual graphs, thereby capturing interrelations among POIs across various contextual features. However, direct application of the PPR algorithm tends to sample nodes from adjacent areas, potentially limiting the exploration of pertinent POI nodes. To mitigate this, we introduce a novel super node-based PPR sampling strategy.
First, we leverage different strategies to add super nodes into geographical and temporal contextual graphs. In the geographical contextual graph, POIs are categorized into groups based on their geographical attributes, typically delineated by administrative regions within the city. Each POI is then associated with a specific super node corresponding to its geographic location. Similarly, in the temporal graph, timestamps of check-ins are segmented into 24 h intervals, each represented by a dedicated super node. A POI becomes linked to the appropriate super node whenever it has been visited during the corresponding time interval. By incorporating super nodes, we can enhance the connections between POI nodes on the graph, thereby facilitating the identification of related POI nodes.
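The super node construction described above can be sketched as follows. This is a hedged illustration only: the adjacency-dictionary representation, the `("super", label)` naming of super nodes, and the function name `add_super_nodes` are assumptions, and the region labels are hypothetical.

```python
from collections import defaultdict


def add_super_nodes(edges, poi_groups):
    """Build an undirected adjacency dict and attach one super node per group.

    edges: iterable of (poi, poi) pairs from a contextual graph.
    poi_groups: dict mapping each POI to the group labels it belongs to
    (regions for the geographical graph, visited time slots for the
    temporal graph).
    """
    adj = defaultdict(set)
    for p, q in edges:
        adj[p].add(q)
        adj[q].add(p)
    for poi, groups in poi_groups.items():
        for g in groups:
            super_node = ("super", g)
            adj[poi].add(super_node)
            adj[super_node].add(poi)
    return dict(adj)


# Hypothetical example: two regions, one original edge.
regions = {"bar": ["Midtown"], "cafe": ["Midtown"],
           "museum": ["Uptown"], "gallery": ["Uptown"]}
adj = add_super_nodes({("bar", "cafe")}, regions)
```

Here the museum and the gallery share no original edge, yet they become two hops apart through the `("super", "Uptown")` node, which is the kind of added connectivity the paragraph describes.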
Next, we utilize the PPR sampling strategy of [30] to conduct node sampling from the graph-structured data. In Equation (1), the PPR score vectors are computed from the propagation matrix and the initial vector. Using Equation (1), we employ PPR scores to sample POI nodes that exhibit high correlation with each given POI. Taking the geographical contextual graph as an example, for a given POI p, we use PPR sampling to obtain a set of highly correlated POI nodes, where nl denotes the sampling size.
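As an illustration of PPR-based sampling, the sketch below uses the common power-iteration form of personalized PageRank with a teleport probability. This form, the value `alpha=0.15`, the row-normalized propagation, and all names are assumptions; the exact formulation of Equation (1) may differ.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for PPR scores with restart at `source`.

    adj: dict node -> set of neighbours (every node must be a key).
    """
    pi = {v: 0.0 for v in adj}
    pi[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[source] = alpha  # teleport mass; pi always sums to 1
        for v, neighbours in adj.items():
            if not neighbours:
                # Dangling node: return its remaining mass to the source.
                nxt[source] += (1 - alpha) * pi[v]
                continue
            share = (1 - alpha) * pi[v] / len(neighbours)
            for w in neighbours:
                nxt[w] += share
        pi = nxt
    return pi


def sample_pois(adj, source, n_l, is_poi):
    """Return the n_l POIs with the highest PPR scores w.r.t. `source`."""
    scores = personalized_pagerank(adj, source)
    candidates = [v for v in adj if v != source and is_poi(v)]
    candidates.sort(key=lambda v: (-scores[v], str(v)))
    return candidates[:n_l], scores


# Hypothetical toy graph: "c" is reachable from "a" only via a super node.
S = ("super", "r1")
adj = {"a": {"b", S}, "b": {"a"}, "c": {S}, S: {"a", "c"}, "d": set()}
sampled, scores = sample_pois(adj, "a", n_l=2,
                              is_poi=lambda v: not isinstance(v, tuple))
```

Super nodes take part in the propagation but are filtered out of the sampled set, so only POIs are passed on to the next step.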
Inspired by the recent study [31], we further transform the sampled node set into a grid-like feature matrix, which serves as the input feature matrix for the CNN-based backbone. In Equation (2), a transformation function maps the raw feature matrix of POIs and the sampled node set to the grid-like feature matrix of POI p. In this paper, we rank all sampled POIs based on their PPR scores to generate the grid-like feature matrix. Following the same operation, we can obtain the feature matrix from the temporal contextual graph. This approach ensures that the grid-like feature matrix effectively preserves contextual information. Furthermore, leveraging such structured input data enables the application of advanced neural networks to extract comprehensive POI representations.
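The ranking-based transformation can be sketched in a few lines. Placing the target POI's own features in the first row, the tie-breaking rule, and the name `grid_feature_matrix` are assumptions made for illustration.

```python
def grid_feature_matrix(features, target, sampled, scores):
    """Stack raw feature rows: target POI first, then sampled POIs
    in descending order of PPR score (ties broken by name)."""
    ranked = sorted(sampled, key=lambda v: (-scores[v], str(v)))
    return [list(features[target])] + [list(features[v]) for v in ranked]


# Hypothetical raw features (d = 2) and PPR scores for two sampled POIs.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
grid = grid_feature_matrix(features, "a", ["b", "c"], {"b": 0.2, "c": 0.3})
```

With a sampling size of nl and d raw features, the result under these assumptions has nl + 1 rows and d columns, so every POI yields an input of the same shape regardless of its degree in the graph.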
4.2. CNN-Based Neural Network Module
Given the grid-like feature matrix extracted from the contextual graphs, we further develop a convolutional neural network-based backbone to learn representations of POIs. Taking the grid-like feature matrix from the geographical contextual graph as an example, we design a CNN based on 1D convolutional kernels to extract POI representations. This is because, unlike the feature matrices of images, which exhibit spatial correlations, each dimension of POI features exists independently. Therefore, using 2D convolutional kernels to extract POI features from a grid feature matrix is not suitable. Specifically, we develop a neural network block that contains two convolutional layers, each with its own kernel size. The first convolutional layer is utilized to extract POI representations from the grid feature matrix, whereas the second layer serves for nonlinear feature transformation, a standard configuration in CNN architectures. Note that the choice of the kernel size is flexible, and any reasonable combination of convolutional layers can be used. Through such a neural network block (Equation (3)), whose two layers are predefined CNN layers, we obtain the representation of POI p extracted from the geographical contextual graph. Following the same operation, we can obtain the representation extracted from the temporal contextual graph.
These two representations are extracted from different contexts. However, in POI recommendation scenarios, the influence of various contextual factors may vary. To effectively integrate information from different contexts and derive the final POI representation, we design the weighted fusion in Equation (4), where β represents the aggregation weight, which controls the influence of different contextual factors on the final representation of POI p.
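The two-layer 1D convolution block and the weighted fusion can be sketched in plain Python. Several choices here are assumptions rather than details from the paper: a single kernel shared across feature columns, a ReLU between the layers, kernel sizes chosen so that the output collapses to one row, and the convention that β weights the geographical representation and 1 − β the temporal one.

```python
def conv1d_over_nodes(grid, kernel):
    """Valid 1D cross-correlation along the node (row) axis,
    applied to each feature column independently."""
    n, k, d = len(grid), len(kernel), len(grid[0])
    return [[sum(kernel[t] * grid[i + t][j] for t in range(k))
             for j in range(d)]
            for i in range(n - k + 1)]


def relu(rows):
    """Element-wise ReLU on a list of rows."""
    return [[max(0.0, x) for x in row] for row in rows]


def poi_representation(grid, kernel1, kernel2):
    """Two stacked 1D conv layers; returns the first output row."""
    hidden = relu(conv1d_over_nodes(grid, kernel1))
    return conv1d_over_nodes(hidden, kernel2)[0]


def fuse(h_geo, h_tem, beta):
    """Weighted fusion of two context-specific representations
    (assumed form: beta * geographical + (1 - beta) * temporal)."""
    return [beta * g + (1 - beta) * t for g, t in zip(h_geo, h_tem)]


# Hypothetical 3 x 2 grid-like feature matrix and fixed kernels.
grid = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
h = poi_representation(grid, kernel1=[0.5, 0.5], kernel2=[1.0, -1.0])
fused = fuse([1.0, 0.0], [0.0, 1.0], beta=0.25)
```

Because the kernel slides only along the node axis, no feature column is mixed with its neighbours, which matches the stated reason for preferring 1D over 2D kernels. In a trained model the kernels would be learned parameters rather than fixed lists.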
4.3. User Preferences Estimation
In POI recommendation, user preferences are influenced by various contextual information. To enhance the accuracy of preference estimation, we propose a method that integrates geographical and temporal contexts. Specifically, for a user u, the representation of user preferences is calculated by Equation (5), in which two aggregation weights control the contributions of geographical and temporal factors, respectively.
In this paper, we employ the following strategies to calculate these two weights: (1) for the geographical weight, we compute the frequency of a user’s visits to each region and utilize this frequency as the weight. Intuitively, the higher the frequency with which a user visits a particular region, the greater the attraction of the POIs in that region to the user. (2) for the temporal weight, taking into account the dynamic nature of user preferences, we use recent visit records to calculate user preferences through Equation (6), which depends on a given timestamp and the check-in timestamp of POI p. The principle of Equation (6) is based on the assumption that POIs visited in the distant past have less influence on the user preferences, while recently visited POIs have a greater impact on the user preferences.
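A possible reading of this preference estimation is sketched below. The exact forms of Equations (5) and (6) are not reproduced here: the exponential time decay, its normalization, the time scale `tau`, the additive combination of the two weights, and all names are assumptions made only to illustrate the idea.

```python
import math
from collections import Counter


def geographical_weights(visits, region_of):
    """Weight of each visit: share of the user's check-ins in that region."""
    counts = Counter(region_of[p] for p, _ in visits)
    return [counts[region_of[p]] / len(visits) for p, _ in visits]


def temporal_weights(visits, now, tau):
    """Normalized exponential decay in elapsed time (assumed form)."""
    raw = [math.exp(-(now - t) / tau) for _, t in visits]
    total = sum(raw)
    return [r / total for r in raw]


def user_preference(visits, poi_repr, region_of, now, tau):
    """Sum of POI representations weighted by both kinds of weights."""
    geo = geographical_weights(visits, region_of)
    tem = temporal_weights(visits, now, tau)
    dim = len(next(iter(poi_repr.values())))
    out = [0.0] * dim
    for (p, _), wg, wt in zip(visits, geo, tem):
        for j in range(dim):
            out[j] += (wg + wt) * poi_repr[p][j]
    return out


# Hypothetical check-ins as (POI, timestamp) pairs.
visits = [("cafe", 0.0), ("bar", 10.0), ("museum", 20.0)]
region_of = {"cafe": "Midtown", "bar": "Midtown", "museum": "Uptown"}
poi_repr = {"cafe": [1.0, 0.0], "bar": [1.0, 0.0], "museum": [0.0, 1.0]}
geo_w = geographical_weights(visits, region_of)
tem_w = temporal_weights(visits, now=20.0, tau=10.0)
pref = user_preference(visits, poi_repr, region_of, now=20.0, tau=10.0)
```

In the toy data, the two Midtown POIs receive the larger geographical weight, while the most recent check-in (the museum) receives the largest temporal weight, so the two factors pull the preference vector in different directions.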
4.4. Model Training
In this paper, to optimize the parameters of the proposed model, we employ Bayesian Personalized Ranking (BPR) [32], a widely adopted optimization approach in the recommendation domain. The loss function of our method is given in Equation (7), which involves a nonlinear activation function, the model parameters, and a regularization coefficient. We minimize Equation (7) with the stochastic gradient descent method to learn the model parameters.
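For reference, the standard BPR objective can be sketched as follows. This uses the usual formulation (negative log-sigmoid of the score difference between a visited and an unvisited POI, plus L2 regularization); the dot-product scoring function and all names are assumptions, and the paper's Equation (7) may differ in detail.

```python
import math


def bpr_loss(triples, score, params, lam):
    """Standard BPR loss.

    triples: (user, visited_poi, unvisited_poi) tuples.
    score: function (user, poi) -> float.
    params: flat iterable of model parameters for L2 regularization.
    lam: regularization coefficient.
    """
    loss = 0.0
    for u, pos, neg in triples:
        diff = score(u, pos) - score(u, neg)
        loss += math.log1p(math.exp(-diff))  # equals -ln(sigmoid(diff))
    return loss + lam * sum(w * w for w in params)


# Hypothetical embeddings and a dot-product scorer.
user_vec = {"u": [1.0, 0.0]}
poi_vec = {"p": [1.0, 0.0], "q": [0.0, 1.0]}


def dot_score(u, p):
    return sum(a * b for a, b in zip(user_vec[u], poi_vec[p]))


good = bpr_loss([("u", "p", "q")], dot_score, params=[], lam=0.0)
bad = bpr_loss([("u", "q", "p")], dot_score, params=[], lam=0.0)
```

The loss is smaller when the visited POI is scored above the unvisited one (`good < bad`), which is the ranking behaviour that gradient descent on this objective encourages.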
5. Experiments
In this section, we detail the experimental setup, including datasets, baselines, and evaluation metrics, and present the results obtained. Additionally, we conduct a comparative analysis of our model against representative POI recommendation approaches.
5.1. Datasets
In this paper, we utilize two datasets extracted from real-world location-based social networks, NYC and Gowalla. Each dataset record includes the user ID, POI ID, latitude and longitude of the POI, and the check-in timestamp. Detailed dataset statistics are presented in Table 2. We select 80% of the data as the training set and use the remainder as the test set.
5.2. Baselines
In this paper, we adopt the following representative POI recommendation approaches as baselines:
GNN-POI [33] is a comprehensive POI recommendation framework that utilizes Graph Neural Networks for learning representations from node information and topological structures.
RELINE [34] is a unified model that jointly learns user and POI dynamics via social influence, geographical proximity, and temporal effects.
STaTRL [14] is a transformer-based approach that leverages the self-attention mechanism to preserve different contextual information.
Neu-PCM [35] employs deep neural networks to learn potential interactions between users and POIs.
CAPRI [36] introduces many context-aware models to extract information from multi-contextual factors and develops a new strategy to fuse obtained contextual information.
SCR [37] is a sequential model-based method that captures the relationships between non-consecutive POIs in complex scenarios.
BSA-ST-Rec [12] leverages bidirectional self-attention to extract user preferences from spatio-temporal fusion embeddings.
5.3. Evaluation Metrics
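To evaluate the model performance on the POI recommendation task, we adopt two mainstream evaluation metrics in the field of recommender systems, Precision@$k$ and Recall@$k$:
$$\mathrm{Precision@}k = \frac{|R_u \cap T_u|}{k}, \qquad \mathrm{Recall@}k = \frac{|R_u \cap T_u|}{|T_u|},$$
where $R_u$ and $T_u$ are the POI list generated by the model and the visited POI list in the test set, respectively, and $k$ is the length of the recommendation list.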
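The two metrics can be computed per user as in the following sketch, which follows the usual definitions of Precision@k and Recall@k. The function name and the handling of an empty test list are assumptions.

```python
def precision_recall_at_k(recommended, visited, k):
    """Precision@k and Recall@k for one user.

    recommended: ranked POI list generated by the model.
    visited: POIs the user visited in the test set.
    """
    hits = len(set(recommended[:k]) & set(visited))
    precision = hits / k
    recall = hits / len(visited) if visited else 0.0
    return precision, recall


# Hypothetical example: 2 of the top-5 recommendations were visited.
p5, r5 = precision_recall_at_k(["a", "b", "c", "d", "e"], ["b", "e", "z"], k=5)
```

Dataset-level figures are typically obtained by averaging these per-user values over all test users.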
5.4. Performance Comparison
To evaluate the performance of each model on the POI recommendation task, we run each model ten times on each dataset and report the average results. The results are reported in Table 3, Table 4, Table 5 and Table 6.
From the experimental results, we can observe that our proposed method outperforms the baseline methods on all datasets, demonstrating its effectiveness in the POI recommendation task. Notably, the results on the Gowalla dataset are superior to those on the NYC dataset. This can be attributed to the higher density of the Gowalla dataset, which contains more user check-in records, thus better facilitating the model’s ability to capture nuanced user preference features. Additionally, we observe that sequence-based models, such as SCR, do not achieve the best performance. This is likely due to the sparsity of user check-in records in the POI scenario, which hinders the effective training of sequence models.
5.5. Parameter Analysis
In Equation (4), β is a crucial hyperparameter that controls the contribution of different contextual features to the final POI representation. These POI representations, in turn, influence the modeling of user preferences and ultimately impact the performance of the model in POI recommendation. To investigate the effect of β on model performance, we vary β in {0, 0.2, 0.4, 0.6, 0.8, 1.0} and observe the changes in model performance. The experiments are conducted on the NYC dataset with the length of the recommendation list fixed at 10. The experimental results are presented in Figure 2 and Figure 3.
When β is set to 0 or 1, it implies that MCGRec discards either the geographical context or the temporal context, considering only a single type of contextual feature when modeling POI representations. The experimental results show that in both cases, the model’s performance is suboptimal, indicating that modeling POI representations based on a single context is insufficiently accurate. Furthermore, the results demonstrate that as the value of β changes, the model’s performance first increases and then decreases, with the optimal β value not being 0.5. This suggests that the impact of different contexts on the final POI representation is not uniform, indicating that one context may be more influential than the other in certain scenarios when modeling POI representations.
5.6. Ablation Study
In Equation (5), we model user preferences from both geographical and temporal perspectives. To validate the effectiveness of these two modules, we propose two variant models of MCGRec: MCGRec-G and MCGRec-T. In MCGRec-G, we consider only geographical factors for modeling user preferences, while in MCGRec-T, we focus solely on temporal factors. We then evaluate the performance of these two variants on the NYC dataset. The results are presented in Table 7 and Table 8.
The experimental results indicate that MCGRec outperforms both variant models, MCGRec-G and MCGRec-T, demonstrating that considering the combined effects of geographical and temporal factors on user preferences can significantly enhance model performance. Additionally, we observe that MCGRec-T performs better than MCGRec-G, suggesting that temporal factors are more influential than geographical factors when modeling user preferences.
6. Conclusions
In this paper, we propose a novel POI recommendation model, MCGRec, which extracts POI representations from constructed contextual graphs. Unlike existing GNN-based methods, MCGRec leverages CNNs to extract POI features, offering greater expressive power compared to shallow GNN models. To apply the CNN model to graph-structured data, MCGRec introduces a super node-based PPR sampling method, facilitating the sampling of POI nodes related to the target POI from the contextual graphs. Additionally, MCGRec incorporates a flexible CNN module, ensuring the model’s adaptability by allowing any reasonable combination of CNN layers. Finally, MCGRec develops a user preference estimation method that models user preferences using both geographical and temporal contextual features. We conduct extensive experiments on two real-world datasets, NYC and Gowalla. On the NYC dataset, MCGRec outperforms the second-best method by 8.5%, 7.5%, and 3.8% at Precision@5, Precision@10, and Precision@15, respectively. Additionally, MCGRec surpasses the second-best method by 5.7%, 3.4%, and 1.7% at Recall@5, Recall@10, and Recall@15, respectively. On the Gowalla dataset, MCGRec outperforms the second-best method by 15.3%, 9.3%, and 14.1% at Precision@5, Precision@10, and Precision@15, respectively. Furthermore, MCGRec surpasses the second-best method by 1.7%, 2.8%, and 2.1% at Recall@5, Recall@10, and Recall@15, respectively. These empirical results demonstrate the effectiveness of MCGRec on the POI recommendation task.
A potential limitation of MCGRec lies in its naive use of the CNN architecture. Specifically, MCGRec employs only the basic modules of CNNs, overlooking more advanced and powerful architectures that could enhance the expressiveness of learned representations. Therefore, a promising future research direction is to integrate modern and powerful CNN architectures into MCGRec to further improve the model’s performance.