
POI Recommendation Scheme Based on User Activity Patterns and Category Similarity

Jongtae Lim, Seoheui Lee ², He Li ³, Kyoungsoo Bok ⁴,* and Jaesoo Yoo ¹,*

¹ School of Information & Communication Engineering, Chungbuk National University, Cheongju 28644, Republic of Korea

² Department of Big Data, Chungbuk National University, Cheongju 28644, Republic of Korea

³ School of Computer Science and Technology, Xidian University, Zunyi 563000, China

⁴ Department of Artificial Intelligence Convergence, Wonkwang University, Iksan 54538, Republic of Korea

* Authors to whom correspondence should be addressed.

Appl. Sci. 2024, 14(23), 10997; https://doi.org/10.3390/app142310997

Submission received: 23 October 2024 / Revised: 24 November 2024 / Accepted: 25 November 2024 / Published: 26 November 2024


Figure 1. Relationship of the considered factors of the existing schemes and the proposed scheme.
Figure 2. Virtual tracking pattern employing the Voronoi diagram. The Voronoi diagram marked in red represents the diagram corresponding to the time period when the user moved along the path marked in red.
Figure 3. Overall system configuration for the proposed scheme. SVD: singular value decomposition; POI: point of interest.
Figure 4. Clustered user groups.
Figure 5. Extraction of top-category pattern by using highly active users.
Figure 6. Extraction of top-category pattern for all users.
Figure 7. Performance evaluation for high-activity groups.
Figure 8. Performance evaluation for low-activity groups.
Figure 9. Performance evaluation for high-activity groups.
Figure 10. Performance evaluation for low-activity groups.


Abstract

Location-based social networks have been studied extensively in recent years as a basis for point-of-interest (POI) recommendation services. Previous studies examined various factors that can enhance the precision of POI recommendations. However, user-specific factors, including location and time, were not considered when personalizing the recommendations. In this paper, we propose a POI recommendation scheme that considers user activity patterns and category similarity. The proposed scheme groups users by activity level and takes into account the characteristics of both the user and the location. Furthermore, it provides personalized recommendations by considering the category similarity, time, and location data collected from users. We evaluated the performance of the proposed scheme and compared it with that of an existing scheme. The proposed scheme exhibits precision that is approximately 16% greater than that of the existing scheme.

Keywords: recommendation; point of interest; user activity pattern; category similarity

1. Introduction

Location-based social networks (LBSNs) have been the subject of extensive research in recent years [1,2]. People use LBSNs to communicate and interact with each other by sharing their current location, frequently visited locations, events, and activities. Through LBSNs, users can also receive information regarding nearby stores, restaurants, and attractions, including reviews and ratings. Additionally, LBSNs facilitate interactions associated with events and gatherings that take place in close proximity to users. Recommendation services that build on these characteristics of LBSNs offer users personalized recommendations for locations or events, thereby increasing user engagement and promoting social network activities. This helps users explore new locations or connect with others. Customized recommendations that consider each user’s unique preferences and interests are becoming essential in LBSN-based recommendation services.

A recommendation system for points of interest (POIs) based on LBSNs recommends POIs that are appropriate for each user as per the information the user has entered. The fundamental factors to consider are time and spatial constraints and geographical characteristics. Information regarding a user’s current location and the time period they are visiting is extracted and utilized. However, existing schemes only use time and location to exclude candidate places from the recommendation list. For example, they exclude places that are not open at the current time or those that are too far away. Existing schemes do not perform analysis based on time periods and location to provide personalized recommendations.

Research has been conducted to evaluate various factors that can enhance the precision of POI recommendations. Various situational factors that influence users have been considered, including the utilization of a Voronoi diagram to create a virtual tracking pattern [1], prediction of the user’s preference in the results through the retention rate [2], and recommendations of appropriate locations for each user group by segmenting users [3]. A representative study describes a methodology in which an algorithm with high adaptability to the user group is employed by segmenting users [4]. However, the predictability and accuracy of the algorithm are significantly lower than those of the collaborative filtering methodology when assessed using actual data. Recent studies have been conducted on POI recommendations using machine learning [5,6]. These studies introduce recommendation models that achieve high accuracy using deep learning. However, there are limitations; most users do not have enough data for personalized recommendations, and machine learning consumes a significant amount of computing resources.

In this paper, we propose a POI recommendation scheme that considers user activity patterns and category similarity. Figure 1 shows the relationships between the factors considered by the proposed scheme and those considered by existing schemes. The proposed scheme groups users by activity level and considers the characteristics of both the user and the location. By also taking into account the category similarity, time, and location data collected from users, it provides more accurate personalized recommendations. We evaluate the performance of the proposed scheme and compare it with that of an existing scheme to demonstrate its effectiveness.

The remainder of this paper is structured as follows. Section 2 describes the limitations and characteristics of previous studies on POI recommendations. Section 3 introduces the proposed POI recommendation scheme. Section 4 presents a comparison between the proposed scheme and an existing scheme. Section 5 summarizes the conclusion and directions for future research.

2. Related Research

2.1. POI Recommendation Methodology Based on Collaborative Filtering

Collaborative filtering is an algorithm that is widely employed in various recommendation systems. It is categorized into user-based and item-based collaborative filtering [7]. User-based collaborative filtering is used to search for users who are similar to the target user of the recommendation and make recommendations based on the item utilization history of comparable users. Table 1 shows the user information matrix for the purpose of analyzing users who exhibit similar characteristics. The user information matrix consists of rows of users and columns of items. Each cell stores a value that denotes the item used by each user or their rating. Based on these values, the similarity is computed. The similarities are combined to provide personalized recommendations.
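The user-based procedure described above can be illustrated with a minimal sketch. The toy matrix, the function names, and the choice of cosine similarity with a similarity-weighted sum are assumptions made for illustration; they are not taken from the schemes cited here.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend_user_based(matrix, target, k=2):
    """Rank items the target user has not used by similarity-weighted ratings.

    matrix maps each user to a list of ratings; 0 means "not used".
    """
    sims = {u: cosine(matrix[target], row)
            for u, row in matrix.items() if u != target}
    scores = {}
    for item, own_rating in enumerate(matrix[target]):
        if own_rating != 0:
            continue  # skip items the target user already used
        scores[item] = sum(sims[u] * matrix[u][item] for u in sims)
    # Highest score first; ties are broken by item index.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:k]

# Hypothetical user information matrix: rows are users, columns are items.
ratings = {
    "u1": [5, 4, 0, 0],
    "u2": [5, 5, 3, 0],
    "u3": [0, 0, 4, 5],
}
print(recommend_user_based(ratings, "u1", k=1))
```

In this toy case, u2 is the only user similar to u1, so the item that u2 rated and u1 has not used is ranked first.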

In LBSN-based recommendation, the similarity between users is determined based on social relationships and location information. In contrast, the similarity between items is determined based on the attributes and location information of the items. LBSNs, for example, define similar users as those who frequently visit similar locations or have similar social relationships [8]. This scheme calculates the degree of similarity between items to search for users who share similar characteristics. The final collaborative filtering algorithm is applied by calculating the similarity between users based on the similarity between items. However, a limitation of this research is that similarity is assessed using only location data; social relationships, other types of attributes, and user behavior patterns are not considered [8].

2.2. POI Recommendation Methodology Employing the Voronoi Diagram Tracking Pattern

The Voronoi diagram is a spatial partitioning structure used in location-based services. It divides the plane into cells whose boundaries are the perpendicular bisectors between neighboring points, so the point to which a particular location is closest can be determined rapidly. Voronoi diagrams have been employed to generate personalized recommendations [1]. Figure 2 shows the process of developing a virtual tracking pattern using a Voronoi diagram. A tracking pattern is established by representing a location and the user’s trajectory as a sequence of cells in the Voronoi diagram. This tracking pattern is used in POI recommendations to generate personalized recommendations. In a Voronoi diagram, each cell is created for a specific POI. By analyzing users’ movement patterns at the cell level, the scheme identifies the next POIs frequently visited by users and recommends them accordingly. However, this scheme is also restrictive because its recommendations consider only the user’s tracking pattern and ignore time and spatial characteristics.
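As a rough illustration of a cell-level tracking pattern, the sketch below uses the fact that a location lies in the Voronoi cell of the site it is closest to, so cell lookup reduces to a nearest-neighbor search. The coordinates are hypothetical planar values and the function names are illustrative; real latitude and longitude data would need a geodesic distance, and the cited scheme [1] may construct its pattern differently.

```python
def nearest_poi(point, pois):
    """Return the id of the POI whose Voronoi cell contains `point`."""
    px, py = point
    return min(pois, key=lambda pid: (pois[pid][0] - px) ** 2
                                     + (pois[pid][1] - py) ** 2)

def tracking_pattern(trajectory, pois):
    """Map a trajectory to its sequence of Voronoi cells, merging repeats."""
    pattern = []
    for point in trajectory:
        cell = nearest_poi(point, pois)
        if not pattern or pattern[-1] != cell:
            pattern.append(cell)
    return pattern

# Hypothetical POI sites and a user trajectory in planar coordinates.
pois = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (4.0, 4.0)}
trajectory = [(0.5, 0.2), (1.5, 0.1), (2.5, 0.3), (3.8, 1.0), (3.9, 3.0)]
print(tracking_pattern(trajectory, pois))
```

Counting how often one cell follows another across many such patterns would give the "next POI" frequencies that the scheme uses for recommendation.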

2.3. POI Recommendation Methodology Based on User Activity

In the field of information search, a user’s information is crucial for conducting personalized searches or recommendations. If no data can be obtained, customized recommendations are infeasible because comprehending the preferences of the target user becomes impossible. This is referred to as the cold start problem. Schemes for resolving this problem have been explored [4]. Users are classified as either active or inactive based on the amount of check-in data they possess. As an active user can provide a sufficient amount of data, this scheme determines the maximum distance at which an active user can access a POI and uses this distance for recommendations. Conversely, as inactive users cannot provide sufficient data, the maximum distance at which they can access a POI cannot be determined. Therefore, for inactive users, the average data of all users is utilized to make recommendations regarding the access distance. However, this scheme is restricted because it fails to account for various situational factors that may affect the user. A scheme for supplementing the current time and location is required, as these are currently the only situational factors considered.
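The active/inactive distinction described above might be sketched as follows. The check-in threshold `min_checkins`, the data layout, and the reading of "average data of all users" as the mean of all recorded check-in distances are assumptions for illustration; the cited scheme [4] does not specify them here.

```python
def access_distance(user, checkin_distances, min_checkins=3):
    """Return the access distance used to filter POIs for `user`.

    checkin_distances maps a user id to the distances (e.g., in km)
    between that user and the POIs they checked in at.
    """
    own = checkin_distances.get(user, [])
    if len(own) >= min_checkins:
        # Active user: enough data to use the user's own maximum distance.
        return max(own)
    # Inactive or unknown user: fall back to the average over all users.
    everything = [d for ds in checkin_distances.values() for d in ds]
    if not everything:
        return 0.0
    return sum(everything) / len(everything)

# Hypothetical check-in distances in kilometers.
distances = {"a": [1.0, 2.0, 6.0], "b": [3.0]}
print(access_distance("a", distances), access_distance("b", distances))
```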

2.4. Problems of Existing Schemes

Table 2 shows situational factors that are considered in the existing scheme. Several limitations are associated with this scheme. The first problem is that the scheme fails to account for the user’s current situational factors, which include various location and time factors. For instance, in addition to the current time and location, numerous factors can be considered when recommending POIs, including the user’s distance preference as indicated in their visit history, geographical proximity, frequent patterns, preference as per time periods, and category similarity. The second is the substantial volume of data processing and the high workload. User analysis for collaborative filtering, item analysis, and user and item similarity analyses are computations that require significant computational resources. The recommendation system must be divided into two phases: offline (analysis) and online (recommendation) phases. This will ensure that the user receives the recommendation results within a reasonable response time.

3. The Proposed POI Recommendation Scheme

The proposed POI recommendation scheme takes into account user activity patterns and the similarity of categories. It groups users by activity level and considers the characteristics of both the user and the location. Furthermore, it provides personalized recommendations by taking into account the time, location, and other data that are collected from users.

3.1. System Configuration

Figure 3 shows the overall system configuration for the proposed scheme. The proposed scheme consists of the collection and preprocessing phase, the analysis phase, and the recommendation phase. The collection phase involves acquiring information regarding users and visited locations from location-based websites. The preprocessing phase involves generating the necessary data from the collected information by extracting each attribute. The analysis phase involves utilizing the extracted attributes to compute the frequencies of user activities and visits. The activity index derived in this manner is used to divide the user group by applying the fuzzy c-means (FCM) clustering algorithm [9]. The database stores the obtained time-based activity patterns. The preprocessed database table is utilized in the recommendation phase. When the user’s time, latitude, and longitude data are entered, the proposed system analyzes the user’s activity level, generates a POI candidate group that is appropriate for each user, and recommends a POI that is suitable for the user from among the candidate groups.

3.2. Data Collection and Preprocessing Phase

The collection and preprocessing phase collects the data from social networks and refines the raw data for analysis. In this paper, data were collected from social networks [10]. This dataset contains check-ins in NYC and Tokyo collected for about 10 months (from 12 April 2012 to 16 February 2013). It contains 227,428 check-ins in New York City and 573,703 check-ins in Tokyo [11]. Collected data included various attributes based on users and locations. Table 3 shows the attributes and examples of the collected data. Attributes include the user ID, category ID, latitude and longitude information of the location, and check-in time.

The proposed scheme extracts the necessary attributes from raw data and uses them after preprocessing. During preprocessing following data collection, the system requires only the attributes necessary for extracting user patterns and attribute information rather than using all data. The preprocessing of location-based data carried out by the proposed scheme is as follows. First, the FCM algorithm is implemented to divide users into groups within the system based on their user IDs. The FCM algorithm employs attribute information that is used to infer the user’s activity. Rather than location-based data, such as the latitude and longitude of the location, a variety of information that can be used to infer the user’s activity is used when analyzing the user’s activity. Second, the attributes of the location are examined in order to explore the user’s activity pattern. Attributes such as the category ID and name of the location are necessary to extract the pattern. Based on the user ID, the proposed scheme analyzes the user’s activity by utilizing the user’s check-in data, the number of visits to a specific location, and the primary activity time period. Using these data, the proposed scheme analyzes the user’s activity and extracts the pattern. Furthermore, the proposed scheme takes into account the location attribute by analyzing the information on the location category.

3.3. Analysis Phase

The analysis phase analyzes the user and POI data to construct analysis databases, such as the activity patterns of users and the category similarity of POIs. In this phase, the scheme determines whether a particular user is highly active. Additionally, the user’s time information is used to conduct a similarity comparison, and user analysis is performed to suggest appropriate locations for the user. The category database extracted from the pattern of locations based on time periods is used in the user analysis.

3.3.1. Activity Pattern Analysis

After the collection of actual data, the proposed scheme derives activity indexes to organize user groups. In the dataset, the user ID, number of check-ins, number of visits to specific locations, and user’s primary-activity time period are attributes that can be analyzed to determine user activity. A database is constructed for each user to preprocess the data. Subsequently, the data are used as an index to evaluate user-ID-based activity.

Table 4 shows the frequency of user check-ins. The initial activity index is generated by grouping the users’ visits into hourly intervals based on the UTC timestamp of each visit, which is logged along with the user’s visit to the location. The activity index also includes the frequency of visits to specific locations, computed after excluding locations that users merely pass through in their daily lives. Such locations include subway stations and banks, which are not suitable for recommendation as POIs to users from other regions or travelers. A user-specific location-frequency activity index is produced after excluding these locations.
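The two frequency indexes described above can be sketched with the standard library. The record layout `(user_id, category, timestamp)`, the timestamp format, and the category names in `PASS_THROUGH` are assumptions for illustration; the raw dataset uses its own field names and time format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical names for categories that users merely pass through.
PASS_THROUGH = {"Subway", "Bank"}

def hourly_checkins(checkins):
    """Count check-ins per (user, hour of day) from UTC timestamps."""
    counts = Counter()
    for user, _category, timestamp in checkins:
        hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
        counts[(user, hour)] += 1
    return counts

def location_frequency(checkins):
    """Count visits per (user, category), skipping pass-through places."""
    counts = Counter()
    for user, category, _timestamp in checkins:
        if category in PASS_THROUGH:
            continue
        counts[(user, category)] += 1
    return counts

# Hypothetical check-in records.
checkins = [
    ("u1", "Cafe", "2012-04-03 18:00:09"),
    ("u1", "Subway", "2012-04-03 18:30:00"),
    ("u1", "Cafe", "2012-04-04 09:15:00"),
    ("u2", "Bar", "2012-04-04 23:05:00"),
]
print(hourly_checkins(checkins))
print(location_frequency(checkins))
```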

Table 5 shows the primary-activity time period obtained by extracting the user ID and UTC timestamp from the data attributes. By examining the UTC timestamps, we can observe the user’s history of visiting a particular location across multiple time periods. Using the time density function, we detect the primary-activity time period of a specific user. Because the raw timestamp specifies the year, month, day, hour, minute, and second, only the hour (h) is extracted and preprocessed for the primary-activity time period. After the three activity indicators are extracted, the attributes are formatted per user, and users are clustered based on the formatted activity table.
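A simple stand-in for detecting the primary-activity time period is to take the most frequent check-in hour of a user. This histogram mode is an assumption used for illustration; the time density function referred to above is not specified in detail and may be smoother than this.

```python
from collections import Counter

def primary_activity_hour(hours):
    """Return the most frequent hour of day, or None if there is no data.

    Ties are resolved in favor of the earliest hour.
    """
    if not hours:
        return None
    counts = Counter(hours)
    return min(counts, key=lambda h: (-counts[h], h))

# Hypothetical check-in hours (0-23) of one user.
print(primary_activity_hour([18, 18, 9, 23, 18, 9]))
```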

The proposed scheme assesses user activity by utilizing the preprocessed results. It clusters users based on the FCM algorithm and activity indicators. The FCM algorithm is a clustering algorithm in which feature vector data and similarity matrix data are typically employed as the input [9].

A feature vector dataset comprises the features of each data point, including the “number of visits to locations”, “frequency of visits to specific locations”, and “primary-activity time period”. The similarity matrix denotes the degree of similarity between each data point, which can be calculated using either Euclidean distance or cosine similarity [12]. In this paper, we use cosine similarity. The matrix is created by calculating the similarity between the feature vectors. These vectors, representing the number of visits, frequency of visits, and primary-activity time period, are then used in the FCM algorithm to generate clustered results. Figure 4 shows the results of clustering users. The feature vector is reduced to two dimensions for visualization purposes. As a result of clustering, all users are grouped into either highly active (cluster 1) or inactive (cluster 0) categories. The grouped results are then utilized to apply the recommendation algorithm.
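The clustering step can be sketched with a small fuzzy c-means implementation. This sketch uses the standard Euclidean formulation with fuzzifier m = 2 rather than the cosine similarity matrix described above, and the feature values and initial centers are hypothetical; in practice the three features would be normalized to comparable scales first.

```python
from math import dist

def memberships(points, centers, m=2.0, eps=1e-9):
    """Fuzzy membership of every point in every cluster (rows sum to 1)."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for x in points:
        # Clamp distances so a point lying on a center cannot divide by zero.
        d = [max(dist(x, c), eps) for c in centers]
        table.append([1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d])
    return table

def fuzzy_c_means(points, init_centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = [tuple(c) for c in init_centers]
    dims = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * x[d] for wi, x in zip(w, points)) / total
                for d in range(dims)))
        centers = new_centers
    return centers, memberships(points, centers, m)

def hard_labels(u):
    """Assign each point to the cluster with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]

# Hypothetical activity vectors: (check-ins, location frequency, primary hour).
points = [(1, 1, 9), (2, 1, 10), (1, 2, 9), (20, 9, 20), (22, 10, 21), (21, 11, 19)]
# Cluster 0 starts at a low-activity user and cluster 1 at a high-activity user.
centers, u = fuzzy_c_means(points, [points[0], points[-1]])
labels = hard_labels(u)
print(labels)
```

Seeding cluster 1 with a high-activity user keeps the label convention used in the text (1 for highly active, 0 for inactive); with random initialization the labels would have to be mapped after clustering.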

Table 6 shows the labeling results with respect to user activity. Cluster label = 1 is allocated to groups with high activity, whereas cluster label = 0 is assigned to groups with low activity, as determined by using the user activity index. Users are labeled and categorized into two groups. Preprocessing is conducted by incorporating the category name of the visited location, detailed latitude/longitude information, and check-in time based on the user ID. Specific patterns are extracted based on the time period, and category similarity is compared using the integrated data.

3.3.2. Category Similarity Analysis

To determine the prevalent category based on the time period, the proposed scheme first extracts time period information. Figure 5 shows the partial outcomes of the extraction of the top three categories with the highest frequency of active users. The top categories with the highest frequency for each user can be determined, as abundant data are available to analyze the locations that active users most frequently visit based on the time period.

Figure 6 shows the partial outcomes of the extraction of the top three categories with the highest frequency among all users. For instance, bars are locations with a high frequency of visits from 12 a.m. to 9 a.m. and from 9 p.m. to midnight, cafes from 11 a.m. to 3 p.m., and food and drink shops from 3 p.m. to 9 p.m. The proposed scheme employs this information and considers prominent categories with respect to time to suggest nearby POIs for inactive users with limited data available for inference. Additionally, the accuracy of POIs can be enhanced by combining this information with the frequently visited locations of highly active users. In this case, the detailed category similarity is used by adjusting weights.
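Extracting the most frequent categories per time period amounts to a grouped count. The sketch below assumes check-ins have already been reduced to `(hour, category)` pairs and uses hypothetical category names; ties are broken alphabetically, which is an arbitrary choice.

```python
from collections import Counter, defaultdict

def top_categories_by_hour(checkins, n=3):
    """Return the n most frequent categories for each hour of day."""
    per_hour = defaultdict(Counter)
    for hour, category in checkins:
        per_hour[hour][category] += 1
    result = {}
    for hour, counts in per_hour.items():
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[hour] = [category for category, _ in ranked[:n]]
    return result

# Hypothetical (hour, category) pairs pooled over all users.
pairs = [
    (22, "Bar"), (22, "Bar"), (22, "Cafe"), (22, "Food"), (22, "Gym"),
    (12, "Cafe"), (12, "Cafe"), (12, "Food"),
]
print(top_categories_by_hour(pairs))
```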

3.4. Recommendation Phase

The recommendation phase extracts candidate POIs based on the analysis databases and produces results through top-k ranking. In this phase, the POI candidate group is recommended by employing an algorithm that is appropriate for each group based on the user classification during the analysis phase. In the case of a user group with high activity, multiple clusters are formed within the group. A POI candidate group is established through collaborative filtering by utilizing similar users. In the case of a user group with low activity, the category similarity is calculated by considering the activity pattern, and the POI is recommended by taking into account the distance and check-in frequency.

In the recommendation phase, the user’s input information is utilized to generate an activity index, which is subsequently used to ascertain whether the user is a member of a group with high or low activity. If user data already exist, these are utilized; otherwise, the user is classified to be in a low-activity group. The groups are classified through labeling: label 1 denotes a group with high activity and label 0 denotes a group with low activity. After the classification of the groups, each algorithm recommends the most suitable POIs.

3.4.1. SVD CF for Highly Active User Groups

The cluster is subdivided through collaborative filtering based on the singular value decomposition (SVD) model [7]. The FCM algorithm is applied to determine whether a specific user is highly active, and the activity index of that user is calculated in the proposed scheme [13]. Model-based collaborative filtering is implemented for users with high activity levels.

The following is the procedure for recommending POIs using the SVD collaborative filtering algorithm. First, an interaction matrix is created. This matrix expresses the interaction information between users and visited locations and is used to identify the user’s preferences and interests. The columns represent visited locations, whereas the rows represent users. Second, matrix decomposition is performed. The interaction matrix is decomposed into three matrices: the user–feature matrix (U), the feature–feature matrix (Σ), and the visited-place–feature matrix (V^T). These matrices comprise vectors that denote the attributes of each user and a diagonal matrix that indicates the correlation between the attributes. The diagonal elements represent the significance of the features and are listed in descending order of significance. Third, dimension reduction is performed. By selecting only the most significant components of the feature–feature matrix, the proposed scheme reduces dimensionality. The values of the diagonal elements are used to determine the significant elements, and only the top k elements are selected. This reduces dimensionality while limiting the loss of information. After this task is completed, POIs that align with the user’s attributes are recommended through SVD collaborative filtering [14].
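The three steps above (interaction matrix, decomposition, and truncation to the top k singular values) can be sketched with NumPy. The toy matrix and the function names are assumptions for illustration, and this plain truncated SVD is a simplification; it is not necessarily the SVD variant that the proposed scheme trains.

```python
import numpy as np

def low_rank_scores(interactions, k):
    """Approximate the interaction matrix using the top-k singular values."""
    u, s, vt = np.linalg.svd(interactions, full_matrices=False)
    # Singular values come back sorted in descending order, so slicing
    # keeps the k most significant features.
    return (u[:, :k] * s[:k]) @ vt[:k, :]

def recommend_svd(interactions, user, k=2, top_n=2):
    """Return indices of unvisited locations with the highest scores."""
    interactions = np.asarray(interactions, dtype=float)
    scores = low_rank_scores(interactions, k)[user].copy()
    scores[interactions[user] > 0] = -np.inf  # mask visited locations
    order = np.argsort(-scores)[:top_n]
    return [int(i) for i in order]

# Hypothetical interaction matrix: rows are users, columns are locations.
visits = np.array([
    [5, 3, 0, 0, 0],
    [4, 0, 0, 1, 0],
    [0, 0, 5, 4, 0],
    [0, 1, 4, 0, 3],
], dtype=float)
print(recommend_svd(visits, user=0))
```

Choosing k trades off compression against information loss: with k equal to the full rank the matrix is reproduced exactly, so no new preferences are inferred, whereas a small k forces the model to generalize across similar users.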

3.4.2. Rule-Based Score Calculation for Inactive User Groups

Ascertaining the preferences and interests of users in the low-activity group solely based on their data is difficult. Thus, in the proposed scheme, general users are assumed to prefer POIs that are selected by the majority in the user’s movement pattern. The proposed scheme suggests POIs for a specific user if the user is determined to exhibit low activity based on various specific patterns that are extracted during the analysis phase. The recommendation score is calculated as per Equation (1). In Equation (1), α, β, and γ are adjustment constants, and w₁, w₂, and w₃ are the weights of each attribute, which can be adjusted depending on the application. Cate_i denotes the category similarity of point i, Dist_i denotes the distance score of point i, and Chck_i denotes the visit count of point i.

\[ \mathrm{Score}_{i} = \alpha \times w_{1} \times \mathrm{Cate}_{i} + \beta \times w_{2} \times \mathrm{Dist}_{i} + \gamma \times w_{3} \times \mathrm{Chck}_{i} \qquad (1) \]
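Equation (1) is a weighted sum and can be written directly as a function. The default constants and weights below are placeholders, and the example assumes that the three inputs have been normalized to [0, 1] with a larger distance score meaning a closer POI; the paper does not state these values or the normalization.

```python
def recommendation_score(cate, dist_score, chck,
                         weights=(1.0, 1.0, 1.0),
                         constants=(1.0, 1.0, 1.0)):
    """Evaluate Equation (1) for one candidate POI.

    cate: category similarity, dist_score: distance score,
    chck: visit count (assumed to be normalized beforehand).
    """
    w1, w2, w3 = weights
    alpha, beta, gamma = constants
    return alpha * w1 * cate + beta * w2 * dist_score + gamma * w3 * chck

# Hypothetical weights that emphasize category similarity.
print(recommendation_score(0.8, 0.5, 0.2, weights=(0.5, 0.3, 0.2)))
```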

3.4.3. POI Candidate Extraction and Recommendation

The activity patterns of all users, category similarity of the visited locations, and check-in frequency are used to determine the recommendation score of the low-activity group. The proposed scheme prioritizes the category of the places that the user frequently visits during the specified time period and evaluates the activity patterns of all users once the user’s time and location information are entered. A higher weight is assigned when the recommendation score of the category is higher. Category similarity is determined using a pre-constructed ontology. Each category in the ontology is assigned a score based on the number of visits by users across different time periods. The locations are compared with the ontology’s categories, and the final category similarity score is calculated based on relationships such as an exact match, a parent inclusion, or a child inclusion of the location’s category. Furthermore, visited locations with a high frequency of check-ins are assigned a high weight during the calculation of the recommendation score. Thus, the proposed scheme recommends k POIs with the maximum recommendation score among the candidates [15].
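The ontology lookup and the final top-k selection might look like the sketch below. The parent map, the similarity values (1.0 for an exact match and 0.5 for a parent or child relationship), and the function names are hypothetical; the paper's ontology and its per-period scores are not reproduced here.

```python
# Hypothetical fragment of a category ontology: child -> parent.
PARENT = {
    "Sushi Restaurant": "Restaurant",
    "Ramen Restaurant": "Restaurant",
    "Restaurant": "Food",
    "Bar": "Nightlife",
}

def category_similarity(poi_category, preferred, parent=PARENT):
    """Score how closely a POI's category matches a preferred category."""
    if poi_category == preferred:
        return 1.0  # exact match
    if parent.get(poi_category) == preferred:
        return 0.5  # the preferred category is the parent of the POI's category
    if parent.get(preferred) == poi_category:
        return 0.5  # the preferred category is a child of the POI's category
    return 0.0

def top_k_pois(scores, k):
    """Return the ids of the k POIs with the highest recommendation score."""
    return sorted(scores, key=lambda pid: (-scores[pid], pid))[:k]

# Hypothetical recommendation scores for three candidate POIs.
print(top_k_pois({"p1": 0.9, "p2": 0.4, "p3": 0.7}, k=2))
```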

4. Performance Evaluation

4.1. Environment for Performance Evaluation

A comparative performance evaluation was conducted to demonstrate the effectiveness of the proposed POI recommendation system over the existing POI system based on user activities and that based on collaborative filtering. Table 7 shows the performance evaluation environment. The system that was used to conduct the performance evaluation was equipped with 32.0 GB of memory and an Intel(R) Core(TM) i7-9700K CPU @ 3.60 GHz processor. The proposed scheme was implemented in Python 3.8.8 in the Anaconda [16] environment, using the sklearn [17], keras [18], tensorflow [19], and matplotlib [20] libraries for machine learning and data visualization.

The performance of the proposed recommendation system was evaluated and compared with those of an existing activity-based recommendation system and a fundamental recommendation system with collaborative filtering. The precision, recall, and F1-score were calculated to evaluate the accuracy of the proposed scheme.

\[ \mathrm{Recall} = \frac{\mathrm{RelevantE} \cap \mathrm{RetrievedE}}{\mathrm{RelevantE}} \qquad (2) \]

\[ \mathrm{Precision} = \frac{\mathrm{RelevantE} \cap \mathrm{RetrievedE}}{\mathrm{RetrievedE}} \qquad (3) \]

\[ \mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \qquad (4) \]
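Equations (2)–(4) can be computed from two sets, reading each set in the formulas as its number of elements. The function name and the sample POI ids are illustrative.

```python
def precision_recall_f1(relevant, retrieved):
    """Compute precision, recall, and F1-score for one recommendation list."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1

# Hypothetical case: 4 relevant POIs, 5 recommended, 2 in common.
print(precision_recall_f1({"a", "b", "c", "d"}, ["a", "b", "x", "y", "z"]))
```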

4.2. Performance Evaluation of the Proposed Scheme

We compared the established collaborative-filtering-based recommendation scheme with the proposed scheme. First, we divided user groups into high-activity and low-activity groups for the proposed scheme. Subsequently, we implemented an adaptive algorithm that was appropriate for each group. POIs were recommended for the high-activity category based on the user similarity and the use of SVD collaborative filtering. To recommend POIs for the low-activity group, activity patterns and category similarities had to be examined based on time periods and close distances were prioritized. Actual data were collected from LBSN sites for performance evaluation. A user database was created and employed to ascertain user activities by analyzing the number of visits, frequency of visiting specific locations, and primary activity time, among other actual data attributes, in a manner similar to the activity index. Table 8 shows the attribute information utilized.

The precision, recall, and F1-score of the proposed scheme were compared with those of the existing scheme to assess the effectiveness of the proposed scheme in terms of POI recommendation accuracy.

The dataset clustered by applying the FCM algorithm is available in Table 6. Active users are labeled as 1, whereas inactive users are labeled as 0. Table 9 shows the center point of each cluster.

4.2.1. Performance Evaluation for Active User Group

The active user group consists of users with more check-in data than those in the inactive group, which makes them more suitable for analysis. A model-based collaborative filtering scheme was employed to aggregate similar users and recommend POIs [19].

We imported and sampled the data using the pandas package. Subsequently, the data were transformed into a Surprise dataset by using the Reader class included in the Surprise package. The SVD algorithm, which uses a matrix factorization scheme, was employed for model training. After the model was trained, its performance was assessed on the test dataset. The test_size argument of the train_test_split function was adjusted to set the test data ratio; the test data were 20% of the total data. In general, collaborative filtering performs well when the dataset is dense and large, but it may perform poorly when the dataset is sparse or small. Figure 7 shows the results of the performance evaluation for a group with a high level of activity. The results confirm that the performance is higher when the number of recommendations is lower.

4.2.2. Performance Evaluation for Inactive User Group

The performance evaluation for the low-activity group was conducted by calculating the recommendation score using the information that could be gathered from the users. The attributes necessary for calculating the recommendation score were the user ID and the latitude and longitude, which indicate the location. When recommendations were made to the low-activity group, POIs were filtered by utilizing the universal access distance, and the recommendation scores for the remaining POIs were calculated to determine the final POI. The top k POIs were selected using the user-visited place matrix; the user ID and the filtered venue ID (referred to as the POI ID) were the attributes necessary to determine them. Figure 8 shows the outcomes of the performance evaluation for a group with a low level of activity. The results reveal that the performance is higher when the number of recommendations is lower, even in the case of a low-activity group.
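Filtering candidates by an access distance requires a distance between latitude/longitude pairs. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km; the function names, the coordinates, and the 5 km threshold are assumptions, since the paper does not state which distance formula or threshold was used.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def within_access_distance(user_pos, pois, max_km):
    """Keep the ids of POIs that lie within max_km of the user."""
    lat, lon = user_pos
    return [pid for pid, (plat, plon) in pois.items()
            if haversine_km(lat, lon, plat, plon) <= max_km]

# Hypothetical user position and candidate POIs (latitude, longitude).
candidates = {"near": (40.7130, -74.0060), "far": (35.6762, 139.6503)}
print(within_access_distance((40.7128, -74.0060), candidates, max_km=5.0))
```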

4.3. Comparative Performance Evaluation

The proposed scheme recommends items based on user ratings of items, which is one of the many recommendation approaches in collaborative filtering. However, implicit datasets that lack user-provided ratings or scores require preprocessing. Table 10 shows the methods for allocating scores to implicit data. Among these methods, the proposed scheme adopts popularity-based rating assignment to generate ratings for visited locations: the most frequently visited places are assigned high ratings, whereas the least frequently visited places are assigned low ratings.

Table 11 shows the results of assigning ratings. The proposed scheme assigns ratings based on the visit history data contained in the dataset.

Because the dataset contained information on approximately 60,000 users, utilizing all the data for learning would be inefficient in terms of time and resources. Therefore, the data were sampled before the performance evaluation was conducted. Table 12 outlines the candidate sampling schemes; the data must be sampled in a manner that minimizes the risk of information loss. The random sampling scheme was implemented.
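Random sampling can be sketched as follows. The sketch assumes that sampling is performed at the user level, so that every check-in of a selected user is kept; the text does not state whether users or individual records were sampled. The function name, the seed, and the toy records are hypothetical.

```python
import random

def sample_users(checkins, n_users, seed=0):
    """Randomly pick n_users users and keep all of their check-in records."""
    users = sorted({record[0] for record in checkins})
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = set(rng.sample(users, min(n_users, len(users))))
    return [record for record in checkins if record[0] in chosen]

# Toy check-ins (hypothetical): 10 users with 2 visited venues each.
checkins = [(user, "venue_%d" % n) for user in range(10) for n in range(2)]
sampled = sample_users(checkins, n_users=3, seed=7)
print(len(sampled), sorted({user for user, _ in sampled}))
```

Keeping complete histories for the selected users preserves each user's visit pattern, which is one way to limit the information loss that sampling introduces.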

Figure 9 shows the outcomes of the comparative performance evaluation for the high-activity group. The measurement indicators are the precision, recall, and F1-score. The existing scheme is memory-based collaborative filtering, which yields a precision and recall of 20%. The proposed scheme increases the precision and recall by 12%. The F1-score, which is the harmonic mean of precision and recall, improved by 16%. The 95% confidence interval for this improvement ranged from 14% to 18%, suggesting that the proposed scheme consistently outperforms the existing scheme. The reason behind these results is that memory-based collaborative filtering has limited scalability and high computational complexity for large datasets. In the proposed scheme, the SVD model, a model-based collaborative filtering scheme, is employed for the high-activity group. This model decomposes the user-visited location matrix into low-dimensional latent factors through matrix factorization and subsequently provides recommendations based on these factors. Therefore, it can make more accurate predictions. Furthermore, the proposed scheme can achieve a high average performance owing to its ability to automatically learn the characteristics of the data.
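The three measurement indicators can be computed for a single user's top-k list as sketched below. The function name and the example lists are hypothetical; how the per-user values were aggregated in the evaluation is not specified here, so the sketch covers only the per-list computation.

```python
def precision_recall_f1(recommended, relevant):
    """Return (precision, recall, F1) for one recommendation list.

    recommended: ordered list of recommended POI IDs (the top-k list)
    relevant:    collection of POI IDs the user actually visited in the test data
    """
    relevant = set(relevant)
    hits = sum(1 for poi in recommended if poi in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical example: 5 recommendations, 4 relevant POIs, 2 hits.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "x", "y"}
p, r, f1 = precision_recall_f1(recommended, relevant)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Because the F1-score is a harmonic mean, it always lies between the precision and the recall and is pulled toward the smaller of the two.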

Figure 10 shows the outcomes of the comparative performance evaluation for the low-activity group. Compared with the existing scheme, the proposed scheme increases the precision, recall, and F1-score by 13%, 12.7%, and 16%, respectively. The 95% confidence interval for this improvement ranged from 14% to 18%, suggesting that the proposed scheme consistently outperforms the existing scheme. The existing scheme utilizes similarity measurements to determine the degree of similarity between users and subsequently generates recommendations. By contrast, the proposed scheme generates recommendations by distinguishing comparable groups and evaluating both the similarity of the users' activity patterns and the category similarity of the visited locations. The outcomes of the performance evaluation confirm that traditional collaborative filtering is less effective when insufficient data are available and that, in this case, the similarity of the activity patterns and the category similarity of the visited places are more helpful.

5. Conclusions

In this paper, we proposed a POI recommendation scheme in which user activity patterns and the similarity of categories are considered. The proposed scheme organizes users based on their activity level and takes into account the characteristics of both users and locations. Furthermore, it provides more personalized recommendations by considering the category similarity, time, and location data collected from users. We evaluated the performance of the proposed scheme and compared it with that of an existing one. The performance evaluation results revealed that the proposed scheme outperforms the existing scheme. Unlike the existing scheme, the proposed scheme considers user activity patterns and category similarity. As a result, the proposed scheme enhanced the precision of personalized recommendation results by approximately 16%. Expected applications of the proposed scheme include personalized recommendation services that utilize social media data and applications that employ recommendation schemes.

In this paper, user patterns based on time periods were utilized. However, the accuracy of the recommendation may be influenced by the manner in which the time periods are divided, as the patterns of users are highly diverse. Consequently, in future research, we intend to employ sliding windows to analyze time period patterns and to explore which time periods are most suitable for providing recommendations. Additionally, we intend to implement the proposed scheme in actual recommendation systems, taking user feedback into account.

Author Contributions

Conceptualization, J.L., S.L., H.L., K.B. and J.Y.; methodology, J.L., S.L., H.L., K.B. and J.Y.; software, S.L.; validation, J.L., S.L., H.L., K.B. and J.Y.; formal analysis, J.L., S.L., H.L., K.B. and J.Y.; data curation, J.L. and S.L.; writing—original draft preparation, J.L. and S.L.; writing—review and editing, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A2B5B02002456/33%, RS-2023-00245650/34%) and by the Innovative Human Resource Development for Local Intellectualization program through the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2024-2020-0-01462, 33%).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in Foursquare Dataset at https://sites.google.com/site/yangdingqi/home/foursquare-dataset (accessed on 19 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Li, M.; Zheng, W.; Xiao, Y.; Zhu, K.; Huang, W. Exploring temporal and spatial features for following POI recommendation in LBSNs. IEEE Access 2021, 9, 35997–36007.
Zhang, H.; Gan, M.; Sun, X. Incorporating memory-based preferences and point-of-interest stickiness into recommendations in location-based social networks. ISPRS Int. J. Geo-Inf. 2021, 10, 36.
Liu, X.; Liu, Y.; Aberer, K.; Miao, C. Personalized point-of-interest recommendation by mining users’ preference transition. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 27 October–1 November 2013.
Elangovan, R.; Vairavasundaram, S.; Varadarajan, V.; Ravi, L. Location-based social network recommendations with computational intelligence-based similarity computation and user check-in behavior. Concurr. Computat. Pract. Exper. 2021, 33, e6106.
Qin, Y.; Wu, H.; Ju, W.; Luo, X.; Zhang, M. A diffusion model for POI recommendation. ACM Trans. Inf. Syst. 2023, 42, 1–27.
Chang, W.; Sun, D.; Du, Q. Intelligent sensors for POI recommendation model using deep learning in location-based social network big data. Sensors 2023, 23, 850.
Liu, Y.; Zhang, J.; Dou, R.; Zhou, X.; Xu, X.; Wang, S.; Qi, L. Vehicle check-in data-driven POI recommendation based on improved SVD and graph convolutional network. In Proceedings of the 2022 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta), Haikou, China, 15–18 December 2022.
Shen, L.; Stopher, P.R. Review of GPS travel survey and GPS data-processing methods. Transp. Rev. 2014, 34, 316–334.
Chen, Y.; Garcia, E.K.; Gupta, M.R.; Rahimi, A.; Cazzanti, L. Similarity-based classification: Concepts and algorithms. J. Mach. Learn. Res. 2009, 10, 747–776.
Narayanan, M.; Cherukuri, A.K. A study and analysis of recommendation systems for location-based social network (LBSN) with big data. IIMB Manag. Rev. 2016, 28, 25–30.
Foursquare Dataset. Available online: https://sites.google.com/site/yangdingqi/home/foursquare-dataset (accessed on 19 November 2024).
Mohd, W.R.W.; Abdullah, L. Similarity measures of Pythagorean fuzzy sets based on combination of cosine similarity measure and Euclidean distance measure. AIP Conf. Proc. 2018, 1974, 020001.
Noia, T.D.; Mirizzi, R.; Ostuni, V.C.; Romito, D. Exploiting the web of data in model-based recommender systems. In Proceedings of the Sixth ACM Conference on Recommender Systems, Dublin, Ireland, 9–13 September 2012.
Yin, H.; Wang, W.; Wang, H.; Chen, L.; Zhou, X. Spatial-aware hierarchical collaborative deep learning for POI recommendation. IEEE Trans. Knowl. Data Eng. 2017, 29, 2537–2551.
Li, D.; Jin, R.; Gao, J.; Liu, Z. On sampling top-k recommendation evaluation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020.
Anaconda. Available online: https://www.anaconda.com/products/distribution (accessed on 19 November 2024).
Scikit-Learn. Available online: https://scikit-learn.org/stable/ (accessed on 19 November 2024).
Keras. Available online: https://keras.io/ (accessed on 19 November 2024).
TensorFlow. Available online: https://www.tensorflow.org/ (accessed on 19 November 2024).
Matplotlib: Visualization with Python. Available online: https://matplotlib.org/ (accessed on 19 November 2024).

Figure 1. Relationship of the considered factors of the existing schemes and the proposed scheme.

Figure 2. Virtual tracking pattern employing the Voronoi diagram. The Voronoi diagram marked in red represents the diagram corresponding to the time period when the user moved along the path marked in red.

Figure 3. Overall system configuration for the proposed scheme. SVD: singular value decomposition; POI: point of interest.

Figure 4. Clustered user groups.

Figure 5. Extraction of top-category pattern by using highly active users.

Figure 6. Extraction of top-category pattern for all users.

Figure 7. Performance evaluation for high-activity groups.

Figure 8. Performance evaluation for low-activity groups.

Figure 9. Comparative performance evaluation for high-activity groups.

Figure 10. Comparative performance evaluation for low-activity groups.

Table 1. User information matrix.

User	Item_1	…	Item_j	…	Item_m
User_1	r_11	…	r_1j	…	r_1m
…	…	…	…	…	…
User_i	r_i1	…	r_ij	…	r_im
…	…	…	…	…	…
User_n	r_n1	…	r_nj	…	r_nm

Table 2. Comparison between existing and proposed schemes.

Factors	Existing Scheme	Proposed Scheme
1. User activity	Considered	Considered
2. Time	Considered	Considered
3. Space	Considered	Considered
4. Activity pattern	Not considered	Considered
5. Category similarity	Not considered	Considered

Table 3. Example of collected attributes.

Attribute	Example 1	Example 2
User ID	470	1048
Venue ID	49bbd6c0f964a520f4531fe3	4a5a18d8f964a520b8b91fe3
Venue category ID	4bf58dd8d48988d127951735	4bf58dd8d48988d1f5941735
Venue category name	Arts and Crafts Store	Food and Drink Shop
Latitude	40.71981038	40.69369648
Longitude	−74.00258103	−73.99020256
Time zone offset in minutes	−240	−240
Coordinated universal time (UTC)	Tuesday 3 April 18:00:09 +0000 2012	Saturday 28 April 15:33:25 +0000 2012

Table 4. Frequency of user check-ins.

User ID	Frequency of Check-Ins
354	1843
293	1769
185	1530
84	1187
315	1144
…	…
671	12
959	9

Table 5. Example of collected attributes.

User ID	Check-In Frequency	Location Frequency	Primary-Activity Time
1	101	46	15
2	116	33	0
3	111	25	1
4	127	26	2
5	226	83	12
6	82	6	9
7	82	63	0
…	…	…	…

Table 6. Labeled user ID.

Label	Labeled User ID
0	[2, 3, 4, 8, 9, 10, 14, 15, 16, 17, …]
1	[1, 7, 11, 12, 19, 30, 43, 45, 71, 73, …]

Table 7. Environment for performance evaluation.

Division	Content
Processor	Intel(R) Core(TM) i7-9700K CPU @ 3.60 GHz
Memory	32.0 GB
Operating System	Windows 10 Education
Language	Python 3.8.8
Platform	Python Anaconda custom

Table 8. Attribute information utilized.

Attribute	Description
USER_ID	Identifier for user distinction and anonymity
VENUE_ID	Identifier for location distinction
TIMESTAMP	Information extracted from the UTC of the collected information (h)

Table 9. Center point of the cluster.

Cluster	Center Point of the Cluster
0	[2.70908350 × 10² 1.437667821 × 10¹ 2.45000000 × 10² 2.85947047 × 10⁰ 3.66598778 × 10²]
1	[8.15460722 × 10² 1.16242038 × 10¹ 7.26000000 × 10² 3.40127389 × 10⁰ 6.93889390 × 10⁻¹⁸]

Table 10. Scheme of user scoring.

Scheme of Scoring	Explanation
Assignment of a random rating	In this scheme, each user assigns the same random rating to all locations. The process is straightforward to implement; however, it fails to accurately represent the user’s preferences.
Assignment of ratings depending on popularity	In this scheme, ratings are designated to locations based on their popularity. For example, high ratings are assigned to the most frequently visited locations, whereas low ratings are assigned to the least frequently visited locations. This scheme is generally effective because it considers the popularity of the location, even though the preferences of each user are not considered.
Assignment of ratings depending on the content	In this scheme, the attributes of each location are evaluated to assign a rating that corresponds to the user’s preferences. For example, this scheme assigns high ratings to locations that share attributes similar to those that the user has previously favored.

Table 11. Results of assigning ratings.

User ID	Venue Category	Rating
470	Arts	27
979	Bridge	47
69	Medical Center	54
395	Food Truck	1
642	Coffee Shop	5
…	…	…

Table 12. Sampling scheme.

Sampling Type	Sampling Description
Random sampling	This scheme involves arbitrarily selecting samples from a dataset and utilizing these samples for learning. This scheme is useful when the dataset is small; however, an appropriate sampling scheme must be employed for large datasets.
Stratified sampling	This scheme involves extracting samples from each stratum by dividing the dataset into strata. This approach is useful when a particular stratum plays an essential role within the dataset.
Cluster sampling	This scheme involves partitioning the dataset into multiple clusters and subsequently selecting a random subset of the clusters for sampling. This scheme can save the time and resources necessary for sampling when the dataset is large.
Systematic sampling	This scheme involves extracting samples from the dataset at regular intervals. This scheme can be helpful when the dataset is not large.


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
