
CN111259268A - POI recommendation model construction method and system - Google Patents


Info

Publication number
CN111259268A
Authority
CN
China
Prior art keywords
poi
check
address
determining
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811454774.6A
Other languages
Chinese (zh)
Inventor
王新珩
伊雷·内择瑞安汗择
陈涛
玛德塞·伊克巴尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chigoo Shanghai Interactive Technology Co ltd
Original Assignee
Chigoo Shanghai Interactive Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chigoo Shanghai Interactive Technology Co ltd
Priority to CN201811454774.6A
Publication of CN111259268A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a method for constructing a POI recommendation model. The method comprises the following steps: acquiring historical check-in data sets of a plurality of users in a social network; determining a POI feature set of a user according to the check-in time and the check-in address information in the historical check-in data set; extracting at least one POI feature from the POI feature set, and determining a plurality of sub-model training sets; training a corresponding POI probability estimate on each of the sub-model training sets through a supervised model, iteratively updating the POI probability estimate according to a self-decision tree, and determining a plurality of POI recommendation sub-models when the POI probability estimate reaches or exceeds a preset threshold; and aggregating the POI recommendation sub-models by applying an additive method to construct a POI recommendation model. The embodiment of the invention also provides a system for constructing the POI recommendation model. The embodiment considers several POI features rather than a single feature or all of them; compared with methods based on a single POI feature or on all POI features, this avoids overfitting and data insufficiency, so the recommendation accuracy is higher.

Description

POI recommendation model construction method and system
Technical Field
The invention relates to the technical field of information processing, in particular to a method and a system for constructing a POI recommendation model.
Background
With the development of location-aware devices and wireless communications, users increasingly use LBSNs (location-based social networks) such as Foursquare, Gowalla, and Facebook. In an LBSN, users share their experiences of the locations they have visited, also known as POIs (points of interest), such as restaurants, shops, and museums, by checking in. The check-in records show the user's preferences for these locations. User preferences are learned from the user's check-in data and used to make POI suggestions to the user. This helps the user explore new places and makes the LBSN more attractive to the user.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:
Existing methods generally model the relationship between the probability of a user visiting a POI and each characteristic of a place (popularity, geographic location, time, etc.) independently, for example separately modeling user preference against visit probability and geographic influence against visit probability, and then obtain a final result by simply fusing these independent models. However, this approach rests on the assumption that the features are independent, and because it considers features only one at a time, its results do not match the characteristics of real check-in data. For example, suppose the candidate locations are hot pot restaurants, barbecue shops, and porridge shops, and a recommendation is made at 7 a.m. If only popularity is considered (hot pot restaurants being the most popular), a hot pot restaurant will very likely be recommended, although given people's daily routines users rarely eat hot pot or barbecue at that hour. Likewise, if geographic location is not considered, porridge shop a, which is 10 km from the user, may be recommended instead of porridge shop b, which is relatively close to the user.
Alternatively, a conventional full joint model considers the relationships among all features (e.g., popularity, geographic location, time) when estimating the probability that a user visits a POI, and uses a supervised learning model to predict the top-k POIs (the k POIs with the highest probability) that the user is likely to visit next, which are then recommended. However, this approach requires the user to have sufficient historical check-in data. When the user has few check-ins in the LBSN, applying the full model may overfit, so that the accuracy of the POI suggestions is instead reduced.
Disclosure of Invention
The invention aims to solve two problems in the prior art: models that consider only independent features ignore the relationships among features when recommending POIs, so recommendations are inaccurate; and a full joint model that considers all features overfits when check-in data are scarce, so POI recommendation accuracy is low. Applicants have unexpectedly discovered that users' destinations in an LBSN are driven by multiple features acting simultaneously, and that different users may be affected by different features. Supervised learning is therefore performed on combinations of the features that influence a user: a personalized partial model is built from the subset of features that influence the user, instead of a complex model containing all features, thereby solving these problems.
In a first aspect, an embodiment of the present invention provides a method for constructing a POI recommendation model, including:
obtaining a historical check-in dataset of a plurality of users in a social network, wherein the historical check-in dataset at least comprises: check-in time and check-in address information;
determining a POI characteristic set of the user according to the check-in time and the check-in address information in the historical check-in data set;
extracting at least one POI feature in the POI feature set, and determining a plurality of sub-model training sets;
training a corresponding POI probability estimate on each of the sub-model training sets through a supervised model, iteratively updating the POI probability estimate according to a self-decision tree, and determining a plurality of POI recommendation sub-models when the POI probability estimate reaches or exceeds a preset threshold;
and aggregating the POI recommendation sub-models by applying an additive method to construct a POI recommendation model.
In a second aspect, an embodiment of the present invention provides a system for constructing a POI recommendation model, including:
a historical check-in dataset obtaining program module, configured to obtain historical check-in datasets of a plurality of users in a social network, where the historical check-in dataset at least includes: check-in time and check-in address information;
a POI feature set determining program module, configured to determine the POI feature set of the user according to the check-in time and the check-in address information in the historical check-in dataset;
a sub-model training set determining program module, configured to extract at least one POI feature from the POI feature set and determine a plurality of sub-model training sets;
a POI recommendation sub-model determining program module, configured to train a corresponding POI probability estimate on each of the sub-model training sets through a supervised model, iteratively update the POI probability estimate according to a self-decision tree, and determine a plurality of POI recommendation sub-models when the POI probability estimate reaches or exceeds a preset threshold;
and a POI recommendation model determining program module, configured to aggregate the POI recommendation sub-models by applying an additive method so as to construct the POI recommendation model.
In a third aspect, an electronic device is provided, comprising: the POI recommendation model building system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the steps of the POI recommendation model building method according to any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the method for constructing a POI recommendation model according to any embodiment of the present invention.
The embodiment of the invention has the following beneficial effects: a plurality of POI features are determined from the historical check-in data set of a user; instead of building a complex model from all features, partial models are determined from the several POI features that have influence, so that a binary decision classifier is learned from a subset of the features; a plurality of personalized partial models are then combined by applying an additive method; and finally the resulting model is applied to calculate the probability that the user visits each POI, so that the top-K POIs with the highest probability are recommended to the user. Because several POI features are considered rather than a single one, recommendation accuracy is higher than with a single POI feature. Because not all POI features are used, the data insufficiency and overfitting that arise when a model over all POI features is fitted to a limited historical data set are avoided, which also raises accuracy. Meanwhile, a larger set of candidate POI features is available for reference, so the accuracy of the trained POI recommendation model is higher.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for constructing a POI recommendation model according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a personalized partial model of a method for constructing a POI recommendation model according to an embodiment of the present invention;
fig. 3 is a data diagram of recommendation performance regarding a top-k value of a method for constructing a POI recommendation model according to an embodiment of the present invention;
fig. 4 is a data diagram of recommendation performance regarding top-k values of a method for constructing a POI recommendation model according to another embodiment of the present invention;
fig. 5 is a schematic structural diagram of a system for constructing a POI recommendation model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a method for constructing a POI recommendation model according to an embodiment of the present invention, which includes the following steps:
S11: obtaining a historical check-in dataset of a plurality of users in a social network, wherein the historical check-in dataset at least comprises: check-in time and check-in address information;
S12: determining a POI feature set of the user according to the check-in time and the check-in address information in the historical check-in dataset;
S13: extracting at least one POI feature from the POI feature set, and determining a plurality of sub-model training sets;
S14: training a corresponding POI probability estimate on each of the sub-model training sets through a supervised model, iteratively updating the POI probability estimate according to a self-decision tree, and determining a plurality of POI recommendation sub-models when the POI probability estimate reaches or exceeds a preset threshold;
S15: aggregating the POI recommendation sub-models by applying an additive method to construct a POI recommendation model.
In this embodiment, POI recommendation requires the check-in data of a certain number of users as a basis; users check in on the LBSN to share their experiences of visited locations with each other.
For step S11, historical check-in datasets of a plurality of users in the social network are obtained. This embodiment takes the historical check-in data of user 1, user 2, and user 3 as an example (for ease of understanding, the three users form a group that travels together and checks in at the same time). The historical check-in datasets are as follows:
User 1:
| Check-in number | Site ID | Site category | Latitude | Longitude | Time-hour | Time-week |
|---|---|---|---|---|---|---|
| 1 | Shop A | Shop | x1 | y1 | 13 | 1 |
| 2 | Restaurant A | Restaurant (food service) | x2 | y2 | 17 | 1 |
| 3 | Bar A | Bar | x3 | y3 | 14 | 2 |
| 4 | Shop A | Shop | x1 | y1 | 13 | 3 |
| 5 | Restaurant A | Restaurant (food service) | x2 | y2 | 17 | 1 |
| 6 | Shop B | Shop | x4 | y4 | 20 | 1 |
| 7 | Restaurant B | Restaurant (food service) | x5 | y5 | 17 | 4 |
User 2:
| Check-in number | Site ID | Site category | Latitude | Longitude | Time-hour | Time-week |
|---|---|---|---|---|---|---|
| 1 | Shop A | Shop | x1 | y1 | 13 | 1 |
| 2 | Restaurant A | Restaurant (food service) | x2 | y2 | 17 | 1 |
| 3 | Bar A | Bar | x3 | y3 | 14 | 2 |
| 4 | Shop A | Shop | x1 | y1 | 13 | 3 |
| 5 | Restaurant A | Restaurant (food service) | x2 | y2 | 17 | 1 |
| 6 | Shop B | Shop | x4 | y4 | 20 | 1 |
| 7 | Restaurant B | Restaurant (food service) | x5 | y5 | 17 | 4 |
User 3:
| Check-in number | Site ID | Site category | Latitude | Longitude | Time-hour | Time-week |
|---|---|---|---|---|---|---|
| 1 | Shop A | Shop | x1 | y1 | 13 | 1 |
| 2 | Restaurant A | Restaurant (food service) | x2 | y2 | 17 | 1 |
| 3 | Bar A | Bar | x3 | y3 | 14 | 2 |
| 4 | Shop A | Shop | x1 | y1 | 13 | 3 |
| 5 | Restaurant A | Restaurant (food service) | x2 | y2 | 17 | 1 |
| 6 | Shop B | Shop | x4 | y4 | 20 | 1 |
| 7 | Restaurant B | Restaurant (food service) | x5 | y5 | 17 | 4 |
For step S12, the POI feature set of the user is determined according to the check-in time and check-in address information in the historical check-in dataset. The check-in time comprises the hour of the day and the day of the week. The check-in address information comprises the site ID (that is, the name of the check-in address), the category of the check-in site, and the latitude and longitude. The POI feature set of the user is determined from this information. For example, the POI feature set determined from this information contains 12 POI features.
For step S13, at least one POI feature is extracted from the POI feature set to determine a plurality of sub-model training sets. For example, the POI feature set includes: X1: POI preference, X2: type preference, X3: POI popularity, X4: geographic distance, X5: POI conversion preference, X6: type conversion preference, X7: POI conversion popularity, X8: type conversion popularity, X9: POI time-aware popularity-hour, X10: POI time-aware popularity-week, X11: type time-aware popularity-hour, X12: type time-aware popularity-week.
When determining a plurality of training sets of submodels, the features may be combined to determine a training set for each submodel, for example: [ X1], [ X2], [ X3], …, [ X1, X2], [ X1, X3], …, [ X1, X12], [ X1, X2, X3], [ X1, X2, X4], …, [ X1, X2, X3, X4], …, [ X2, X3], [ X2, X4], …, [ X2, X3, X4, X5], …, …, [ X9, X10], ….
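The enumeration of feature combinations can be sketched as follows. This is only an illustration: the function name feature_subsets and the max_size cap are assumptions introduced here, since the description does not state whether every combination or only a bounded number of them is used.

```python
from itertools import combinations

# Feature names X1..X12 as listed in the description.
FEATURES = [f"X{i}" for i in range(1, 13)]

def feature_subsets(features, max_size):
    """Yield every non-empty feature combination with at most max_size members.

    max_size is a hypothetical cap added for this sketch.
    """
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            yield list(combo)

subsets = list(feature_subsets(FEATURES, max_size=2))
print(len(subsets))   # 12 single features + 66 pairs
print(subsets[:3])
```

Each subset would then select the columns of one sub-model training set. Without a cap, 12 features give 4095 non-empty subsets, so some bound or selection rule is likely needed in practice.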
For step S14: a corresponding POI probability estimate is trained on each of the sub-model training sets through a supervised model, the POI probability estimate is iteratively updated according to a self-decision tree, and a plurality of POI recommendation sub-models are determined when the POI probability estimate reaches or exceeds a preset threshold. A PRM (personalized partial model) is determined for each of the sub-model training sets determined in step S13. Each personalized partial model uses only a part of the functional features rather than all functional features (formula image BDA0001887476990000061 in the original publication). A personalized partial model is a decision tree binary classifier learned from a dataset containing a part of the functional features.
When the personalized partial models are determined, learning is based on a boosting ("enhancement") algorithm: during learning, a PRM is selected iteratively at each layer m, and a prediction model with m layers of PRMs is established when the m iterations are finished, so that the POI recommendation sub-model is determined.
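The layer-by-layer selection can be illustrated with an AdaBoost-style sketch. This is an assumption: the description names only an "enhancement" algorithm, so the weighting rule, the use of depth-1 trees (stumps), and all function names and toy data below are hypothetical.

```python
import math

def stump_predict(x, feat, thr):
    """Depth-1 decision tree: +1 if the feature value exceeds thr, else -1."""
    return 1 if x[feat] > thr else -1

def fit_stump(X, y, w, feats):
    """Best (weighted error, feature, threshold) using only the features in feats."""
    best = None
    for f in feats:
        for thr in sorted({x[f] for x in X}):
            err = sum(wi for x, yi, wi in zip(X, y, w)
                      if stump_predict(x, f, thr) != yi)
            if best is None or err < best[0]:
                best = (err, f, thr)
    return best

def boost(X, y, subsets, rounds):
    """At each layer, keep the partial model with the lowest weighted error."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, f, thr = min((fit_stump(X, y, w, s) for s in subsets),
                          key=lambda t: t[0])
        err = min(max(err, 1e-10), 1 - 1e-10)      # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)    # AdaBoost layer weight
        model.append((alpha, f, thr))
        w = [wi * math.exp(-alpha * yi * stump_predict(x, f, thr))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    """Additive score of all layers; positive means 'user visits the POI'."""
    return sum(a * stump_predict(x, f, thr) for a, f, thr in model)

# Toy data (hypothetical): X1 = prior visits, X4 = distance; y = +1 visited, -1 not.
X = [{"X1": 2, "X4": 1.0}, {"X1": 0, "X4": 8.0},
     {"X1": 1, "X4": 2.0}, {"X1": 0, "X4": 9.0}]
y = [1, -1, 1, -1]
model = boost(X, y, subsets=[["X1"], ["X4"], ["X1", "X4"]], rounds=2)
print([score(model, x) > 0 for x in X])
```

Each candidate here is restricted to one feature subset, which mirrors the idea of a partial model; a fuller implementation would grow deeper trees per subset and stop once the probability estimate reaches the preset threshold.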
For step S15, the POI recommendation sub-models are aggregated by applying an additive method to construct the POI recommendation model. After the POI recommendation sub-models are determined, they can be combined by addition. The POI recommendation model for a target user is determined by combining the POI recommendation sub-models associated with that user.
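A minimal sketch of additive aggregation, assuming each sub-model is a function from a feature dictionary to a score. The sub-models, weights, and thresholds below are hypothetical; the description does not specify how the terms of the sum are weighted.

```python
def aggregate(sub_models, weights=None):
    """Combine sub-models into one scoring function by weighted addition."""
    if weights is None:
        weights = [1.0] * len(sub_models)

    def combined(features):
        return sum(w * m(features) for w, m in zip(weights, sub_models))

    return combined

# Hypothetical sub-models, each voting 1.0 or 0.0 from a subset of features.
def by_preference(f):
    return 1.0 if f["X1"] > 0 else 0.0

def by_distance(f):
    return 1.0 if f["X4"] < 5.0 else 0.0

model = aggregate([by_preference, by_distance], weights=[0.5, 0.5])
print(model({"X1": 2, "X4": 1.2}))
```

With equal weights the combined score is simply the share of sub-models voting for the POI; a boosting procedure would instead supply one learned weight per layer.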
After the POI recommendation model is determined, POI recommendations may be made to the user. A feature vector is generated for each candidate POI from the user's historical check-in data set, and a score is calculated from these features. For example, X1 (POI preference) and X6 (type conversion preference) of the target user may be considered together: using the user's preferences, the user's current location, and the preferences of all users for the next activity area of the same category, a score is calculated for each candidate POI. Predictions may also be made from other features to determine the corresponding scores. Finally, the POIs with relatively high scores are recommended to the user.
Through this implementation, a plurality of POI features are determined from the historical check-in data set of a user. Instead of building a complex model from all features, partial models are determined from the several POI features that have influence, a binary decision classifier is learned from a subset of the features, a plurality of personalized partial models are combined by applying an additive method, and finally the resulting model is applied to calculate the probability that the user visits each POI, so that the top-K POIs with the highest probability are recommended to the user. Because several POI features are considered rather than a single one, recommendation accuracy is higher than with a single POI feature. Because not all POI features are used, the data insufficiency and overfitting that arise when a model over all POI features is fitted to a limited historical data set are avoided, which also raises accuracy.
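Selecting the highest-scoring POIs can be sketched as below. The scores are made-up placeholder values, not results from the patent, and top_k is a hypothetical helper name.

```python
import heapq

def top_k(scores, k):
    """Return the k POI names with the highest scores, best first."""
    return heapq.nlargest(k, scores, key=scores.get)

# Hypothetical model scores for four candidate POIs.
scores = {"Shop A": 0.42, "Restaurant A": 0.91, "Bar A": 0.15, "Shop B": 0.60}
print(top_k(scores, 2))  # ['Restaurant A', 'Shop B']
```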
As an implementation manner, in this embodiment, the check-in address information includes: the check-in address name, the check-in address category, and the check-in address coordinates;
the POI feature set includes:
the POI preference and category preference of a single user, determined from the number of check-ins of that user at each check-in address name and in each check-in address category;
the POI popularity of each check-in address, determined from the number of check-ins of the plurality of users at each check-in address name and in each check-in address category;
the POI conversion preference and category conversion preference of a single user, determined from the information of that user's adjacent check-in addresses within a preset check-in time;
the POI conversion popularity and category conversion popularity of each check-in address, determined from the information of the adjacent check-in addresses of the plurality of users within the preset check-in time, wherein the preset check-in time is further used, in units of hours and weeks, to determine the POI time-aware popularity and type time-aware popularity of each check-in address;
and the geographic distance between check-in addresses, determined from the check-in address coordinates.
In this embodiment, each POI feature is determined.
POI preference X1: the number p of previous visits by the user to the POI. This feature measures how likely the user's check-in is to occur at a place the user visited in the past. For example, if the place where the user checks in this time is Shop A, the number of times the user visited Shop A in the past is counted (the current check-in is not counted): when the user visits Shop A for the first time, the POI preference is 0, and when the user visits Shop A for the second time, the POI preference is 1.
Category preference X2: to determine the importance of different categories of POIs (movie theaters, cafes, restaurants, etc.) to a given user, the number of check-ins the user has performed in a specific category is considered. It is calculated in the same way as X1 (the current check-in is not counted; the POI features below are calculated in the same way, which is not repeated). For example, when the user visits Shop A for the first time, the preference for the "Shop" category is 0, and when the user next visits Shop B, the preference for the "Shop" category is 1.
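The counting rule for X1 and X2 (prior visits only, excluding the current check-in) can be sketched as follows, using user 1's seven check-ins from the historical check-in table earlier in this description. The function name is a hypothetical one chosen for the sketch.

```python
from collections import Counter

# (site ID, site category) for user 1's seven check-ins, in order.
CHECKINS = [
    ("Shop A", "Shop"), ("Restaurant A", "Restaurant"), ("Bar A", "Bar"),
    ("Shop A", "Shop"), ("Restaurant A", "Restaurant"),
    ("Shop B", "Shop"), ("Restaurant B", "Restaurant"),
]

def preference_features(checkins):
    """Return one (X1, X2) pair per check-in, counting earlier visits only."""
    poi_seen, cat_seen = Counter(), Counter()
    rows = []
    for poi, cat in checkins:
        rows.append((poi_seen[poi], cat_seen[cat]))  # state before this check-in
        poi_seen[poi] += 1
        cat_seen[cat] += 1
    return rows

rows = preference_features(CHECKINS)
print([x1 for x1, _ in rows])  # [0, 0, 0, 1, 1, 0, 0]
print([x2 for _, x2 in rows])  # [0, 0, 0, 1, 1, 2, 2]
```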
POI popularity X3: the total number of check-ins performed at the POI by all users in the data set. For example, if user 1, user 2, and user 3 have each been to Shop A once, the POI popularity of Shop A is 3.
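A sketch of X3 under one assumption: because the three example users check in simultaneously, "past" is taken here to mean check-ins by any user with a smaller check-in number. The description does not spell out this ordering rule, and the helper name is hypothetical.

```python
# Site IDs of the seven check-ins shared by users 1, 2 and 3 in the example.
SITES = ["Shop A", "Restaurant A", "Bar A", "Shop A",
         "Restaurant A", "Shop B", "Restaurant B"]

# (user, check-in number, site ID) for all three users.
ALL_CHECKINS = [(u, n, site) for u in (1, 2, 3)
                for n, site in enumerate(SITES, start=1)]

def poi_popularity(all_checkins, number, poi):
    """Count check-ins at poi by any user with a check-in number below `number`."""
    return sum(1 for _, n, p in all_checkins if p == poi and n < number)

x3 = [poi_popularity(ALL_CHECKINS, n, site)
      for n, site in enumerate(SITES, start=1)]
print(x3)  # [0, 0, 0, 3, 3, 0, 0]
```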
Geographic distance X4: the distance between the previous check-in location and the current one. For example, if the location where the user checked in the first time is "Shop A", with coordinates (x1, y1), and the location where the user checked in the second time is "Restaurant A", with coordinates (x2, y2), the distance between the two locations is d((x2, y2), (x1, y1)).
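The description does not define the distance function d. One common choice for latitude and longitude pairs is the haversine great-circle distance, sketched below as an assumption; the coordinates are hypothetical stand-ins for (x1, y1) and (x2, y2).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

shop_a = (31.2304, 121.4737)        # hypothetical coordinates for (x1, y1)
restaurant_a = (31.2397, 121.4998)  # hypothetical coordinates for (x2, y2)
x4 = haversine_km(restaurant_a, shop_a)
print(round(x4, 2))
```

For POIs within one city, a plain Euclidean distance on projected coordinates would usually serve equally well.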
POI conversion preference X5: the user's transitions between POIs are not random and also carry information. For example, if the place where the user checked in the nth time is "Shop A" and the place where the user checked in the (n+1)th time is "Restaurant A", then after this check-in the POI conversion preference for "Shop A" → "Restaurant A" is 1.
Type conversion preference X6: the preference of a given user for transitions between the category of the current location and the category of the target POI. For example, if the place where the user checked in the nth time is "Shop A" and the place where the user checked in the (n+1)th time is "Restaurant A", then after this check-in the type conversion preference for type "Shop" → type "Restaurant" is 1. If, on another day, the place where the user checked in the (n+m)th time is "Shop B" and the place where the user checked in the (n+m+1)th time is "Restaurant B", then after that check-in the type conversion preference for type "Shop" → type "Restaurant" is 2.
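The transition counts can be sketched as follows. As with X1 and X2, the sketch assumes each feature value counts only transitions made before the current check-in, which is the convention that reproduces the X5 and X6 columns of the feature table for user 1; the function name is hypothetical.

```python
from collections import Counter

# (site ID, site category) for user 1's seven check-ins, in order.
CHECKINS = [
    ("Shop A", "Shop"), ("Restaurant A", "Restaurant"), ("Bar A", "Bar"),
    ("Shop A", "Shop"), ("Restaurant A", "Restaurant"),
    ("Shop B", "Shop"), ("Restaurant B", "Restaurant"),
]

def transition_features(checkins):
    """Return one (X5, X6) pair per check-in: prior counts of the same transition."""
    poi_moves, cat_moves = Counter(), Counter()
    rows = [(0, 0)] if checkins else []  # the first check-in has no predecessor
    for (p0, c0), (p1, c1) in zip(checkins, checkins[1:]):
        rows.append((poi_moves[(p0, p1)], cat_moves[(c0, c1)]))
        poi_moves[(p0, p1)] += 1
        cat_moves[(c0, c1)] += 1
    return rows

rows = transition_features(CHECKINS)
print([x5 for x5, _ in rows])  # [0, 0, 0, 0, 1, 0, 0]
print([x6 for _, x6 in rows])  # [0, 0, 0, 0, 1, 0, 2]
```

Summing the same counters over all users instead of one user would give the popularity counterparts X7 and X8.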
POI conversion popularity X7: the total number of transitions that all users have completed between the current location and the target POI. It is similar to the POI conversion preference X5, except that X5 refers to an individual user and X7 refers to all users.
Type conversion popularity X8: the total number of transitions, over all users, from POIs in the same category as the current location to POIs in the same category as the target location. It is similar to the type conversion preference X6, except that X6 refers to an individual user and X8 refers to all users.
POI time-aware popularity (hour: X9; week: X10): the method also treats temporal patterns of access to a POI as influential features, defining POI hour popularity and POI day popularity as the sum of past check-ins at the POI in a given hour h of the day and on a given day d of the week, respectively. For example, the number of times all users went from Restaurant A to Shop A at 19:00.
Category time-aware popularity (hour: X11; week: X12): captures the time pattern of visits to a particular category in different hours and on different days of the week, e.g., the number of times all users have gone from type "Restaurant" to type "Shop" on a Monday.
According to the implementation method, the feature set extracted through the historical check-in data covers various types of features, all interest points of the user are fully considered, the extracted POI features are more in variety, more POI features are used as references, and the accuracy of the trained POI recommendation model is higher.
As an implementation manner, in this embodiment, the extracting at least one POI feature in the POI feature set and determining a plurality of sub-model training sets includes:
clustering each check-in address according to the address coordinate of each check-in address through a pre-designated cluster quantity threshold;
and determining a plurality of POI characteristics of each cluster category according to the check-in times of each check-in address in each cluster category, thereby determining a plurality of sub-model training sets.
In the present embodiment, a feature set is determined from the historical check-in data of user 1, user 2, and user 3, for example as follows:
User 1:
| Check-in number | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | d((x2,y2),(x1,y1)) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | d((x3,y3),(x2,y2)) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 1 | 3 | d((x1,y1),(x3,y3)) | 0 | 0 | 0 | 0 | 3 | 0 | 3 | 0 |
| 5 | 1 | 1 | 3 | d((x2,y2),(x1,y1)) | 1 | 1 | 3 | 3 | 3 | 3 | 3 | 3 |
| 6 | 0 | 2 | 0 | d((x4,y4),(x2,y2)) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 2 | 0 | d((x5,y5),(x4,y4)) | 0 | 2 | 0 | 6 | 0 | 0 | 6 | 0 |
The feature set of user 1 is determined according to the methods for features X1 to X12 in the above embodiment. For check-in number 1, since this is the first check-in, POI preference X1 and category preference X2 are 0. POI popularity X3 depends on the check-in data of all users; it is also 0, since no user has checked in at Shop A before. Geographic distance X4 measures the distance between the previous check-in POI and the current one; since this is user 1's first check-in, it is 0. POI conversion preference X5 is also 0, since there is no previous check-in. Type conversion preference X6, POI conversion popularity X7, type conversion popularity X8, POI time-aware popularity (hour: X9; week: X10), and category time-aware popularity (hour: X11; week: X12) are likewise all 0, which is not described again here.
When the check-in number is 5:
POI preference X1 has a value of 1 because the same user has visited Restaurant A once before.
The category preference X2 has a value of 1 because the same user has visited a POI in the "Restaurant" category once before.
POI popularity X3 has a value of 3 because, before this check-in, user 1, user 2, and user 3 have each checked in at Restaurant A once.
The geographic distance X4 is the distance between check-in number 4 and check-in number 5, i.e., d((x2, y2), (x1, y1)).
The POI conversion preference X5 is 1. The POI transition from check-in number 4 to check-in number 5 is "Shop A" → "Restaurant A", and before this check-in the same user has made the transition "Shop A" → "Restaurant A" once (check-in number 1 to check-in number 2).
The type conversion preference X6 is 1 because the user has previously moved once from a POI in the "Shop" category to a POI in the "Restaurant" category (in the same way as for the POI conversion preference X5, which is not described in detail here).
The POI conversion popularity X7 is 3 because all users together have previously moved 3 times from Shop A to Restaurant A (based on the historical check-in information of user 1, user 2, and user 3 before check-in number 5).
The type conversion popularity X8 is 3 because all users together have previously moved three times from POIs in the "Shop" category to POIs in the "Restaurant" category.
POI time-aware popularity (hour: X9; week: X10): X9 is 3 because all users have previously checked in 3 times from Shop A to Restaurant A at hour 17 (17:00), and X10 is 3 because all users have previously checked in 3 times from Shop A to Restaurant A on a Monday.
Category time-aware popularity (hour: X11; week: X12): X11 is 3 because 3 check-ins from POIs in the "Shop" category to POIs in the "Restaurant" category have previously been performed at hour 17 (17:00), and X12 is 3 because 3 such check-ins have previously been performed on a Monday.
The check-in numbers 2, 3, 4, 6 and 7 can be determined according to the method, and are not described herein again.
For example, the characteristics related to the check-in number 7 are X2, X6, X8 and X11, and the sub-model training set is determined according to the characteristics X2, X6, X8 and X11.
Clustering can also be performed based on the location of each check-in address to identify the geographical area that each user frequently visits, and the user's activities in the lbs n often present strong preferences in the areas they frequently visit. By learning the decision tree in each cluster, applying this method, the influential features of each cluster will be automatically identified to determine the sub-model training set, since the decision tree will select the most influential feature.
Clusters are identified by specifying the number of clusters in advance and applying k-means clustering. The number of clusters is typically chosen according to the amount of available training data, which equals the number of check-ins the user has performed. If more data is available per user, a higher number of clusters can be set. As the number of clusters increases and the size of each cluster decreases, the identified features become more influential and the trained POI recommendation model becomes more accurate. At the same time, the extracted features are more precise and concise, which reduces the training time.
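The clustering step can be illustrated with a minimal k-means sketch over check-in coordinates. This is an illustration only, not the patented implementation: the function name `kmeans`, the random initialization, the fixed iteration count and the use of plain Euclidean distance on raw coordinate pairs are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on (lat, lon) pairs; returns (centroids, labels).

    Euclidean distance on raw coordinates is a simplification that is
    only reasonable within a single city.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct check-ins as initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each check-in to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels
```

Each resulting label identifies one frequently visited area; the check-ins sharing a label would form the data from which one sub-model is learned.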
As an implementation manner, in the present embodiment, the historical check-in data set includes a training data set and a verification data set, where the training data set is used to determine the POI feature set of the user.
In this embodiment, when the historical check-in dataset contains a large amount of data, it may be divided into a training dataset and a verification dataset. The POI features of the user are determined from the training dataset. The verification dataset is used for verification, and the determined POI recommendation model is modified and adjusted according to the verification result.
According to this implementation, dividing the historical dataset into a training dataset and a verification dataset allows the trained POI recommendation model to be verified and corrected, which further improves the accuracy of the POI recommendation model.
As an implementation manner, in this embodiment, after the POI feature set of the user is determined according to the training data set, the method further includes:
determining a plurality of POI recommendation sub-models according to the POI feature set;
verifying the POI recommendation sub-models through a verification data set, and determining part of effective POI recommendation sub-models in the POI recommendation sub-models;
pruning the POI recommendation sub-models according to the part of the effective POI recommendation sub-models to reduce overfitting.
In this embodiment, the POI recommendation sub-models are verified on the verification dataset, and the effective POI recommendation sub-models among them are then determined. The aim is to output a subset of the partial models: because this subset performs as well as the whole set, pruning reduces possible overfitting and improves performance. Parts of the model that do not affect the recommendation performance can be pruned away, for example by a greedy approach using Reduce-Error Pruning.
According to the implementation method, the multiple POI recommendation sub-models are verified through the verification data set, the part of models which do not influence the recommendation performance is trimmed, overfitting is reduced, and therefore the accuracy of the POI recommendation models is improved.
The scheme is described in general terms below. The problem of POI (Point of Interest) recommendation can be represented by a function $f: X \to Y$, where $X$ is the input feature space and $Y$ is the output space. In this method the input space is a set of features of interest and the output space is a set of POIs. Let $D = \{(x(u,t), y(u,t))\}$, $u = 1, \ldots, N$, be a data set of $N$ users, where $y(u,t)$ is the POI visited by user $u$ at time $t$ and $x(u,t)$ is the feature value encoding the visited POI. To calculate the output $y(u,t)$ for the features $x(u,t)$, the function $f$ can only be learned from the data $\{(x(u,t'), y(u,t'))\}$, $t' = 1, \ldots, t-1$.
The method takes a probabilistic approach in which the inputs and outputs are modeled as random variables. To do this, the distribution $P(Y \mid X)$ is learned from the training set, where $X = \{X_1, X_2, \ldots, X_F\}$ denotes the set of $F$ features and $Y$ denotes the POI. To obtain the distribution, the method learns a supervised model over the feature space to calculate the probability of the user visiting a POI. To train the model, all POIs visited before the prediction time $t$ are considered positive examples. Negative examples are then obtained by randomly sampling other places in the city. In this way the model is trained with feedback in the form of user preferences.
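As a rough illustration of how such a training set could be assembled, the sketch below labels visited POIs as positive (+1) and draws negatives (-1) uniformly at random from the unvisited POIs of the city. The function name `build_training_examples`, the `negatives_per_positive` ratio and the uniform sampling are assumptions for the example; the text does not specify how many negatives are drawn or how they are weighted.

```python
import random

def build_training_examples(visited_pois, all_pois, negatives_per_positive=1, seed=0):
    """Return a list of (poi, label) pairs with label +1 (visited) or -1 (sampled)."""
    rng = random.Random(seed)
    visited = set(visited_pois)
    # Candidate negatives: POIs in the city the user has not visited so far.
    unvisited = sorted(set(all_pois) - visited)
    positives = [(p, 1) for p in sorted(visited)]
    n_neg = min(len(unvisited), negatives_per_positive * len(positives))
    negatives = [(p, -1) for p in rng.sample(unvisited, n_neg)]
    return positives + negatives
```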
The method takes the interaction among the POI features into account, which makes the recommendation more accurate. The POI recommendation problem is therefore stated as follows:
Given: a data set containing users' historical check-in data, from which training and test cases are extracted.
Find: a recommendation model based on the training examples.
Objective: improve recommendation performance, evaluated by testing the model on the test cases.
The data set of user historical check-in data is acquired from the check-ins that users of an LBSN (Location-Based Social Network) perform when sharing their locations with each other. The locations they share are also known as POIs, such as restaurants, shops and museums.
The method aims to construct a recommendation model, denoted H, that takes the feature vector x of a POI as input and outputs H(x) as the probability of the user visiting that POI. The top-K POIs with the highest probability are then returned as recommendations. When the user visits a recommended POI (including checking in at the location of the recommended POI), the model is considered to have recommended the POI correctly.
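The top-K selection can be sketched as follows, assuming the scores H(x) have already been computed and are held in a dictionary keyed by POI identifier. The function names and the dictionary layout are hypothetical.

```python
import heapq

def recommend_top_k(scores, k):
    """Return the k POI ids with the highest visit probability, best first."""
    return heapq.nlargest(k, scores, key=scores.get)

def is_correct(recommended, visited_poi):
    """A recommendation counts as correct if the user visits a recommended POI."""
    return visited_poi in recommended
```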
It can be seen that the more accurate the feature space, the more accurate the recommendations. In other words, combining more relevant features can produce a more accurate model. For example, considering two POI features together, such as popularity and distance, may be more effective than using either alone. However, as the feature space X grows, more data is needed to estimate the distribution P(Y|X) accurately. A major challenge is therefore to build a model that can make efficient use of the features available in the training examples.
The method addresses this by splitting the model into a plurality of partial models, each taking a subset of X as input features. As shown in the schematic diagram of a partial personalized model in fig. 2, a decision tree binary classifier R1 is learned from the input features C1. It is called a partial personalized model (PRM) because it builds the model from a subset of the features rather than from all of them. For each user, the model learns the importance of the different features in distinguishing POIs that interest the user from POIs that do not. The PRM is then defined as follows:
Let
Figure BDA0001887476990000131
and let the training set comprising the feature values in $C_l$ be denoted $D_l$. A PRM $R_l$ is a decision tree binary classifier learned from $D_l$. Considering the feature subset $C_l$, $R_l^{+}$ is used to calculate the probability $\mathrm{Prob}(1 \mid C_l)$ of visiting a POI, and $R_l^{-}$ the probability $\mathrm{Prob}(-1 \mid C_l)$ of not visiting the POI.
The method learns PRMs from all possible subsets of the feature space and then combines them to build the recommendation model. With a partition-based approach, PRMs are learned in an efficient manner. To construct the PRMs and then use them together, an additive learning process based on a boosting algorithm is adopted. The process iteratively selects one PRM at each layer $m$ and, at the end of $M$ iterations, has built a prediction model with $M$ layers of PRMs. For each $R_m$ a hypothesis $h_m$ is learned, where $h_m(x) \in \mathbb{R}$, $x$ is the feature vector of an example POI and $\mathbb{R}$ is the set of real numbers. Thus $h_m^{+}$ is learned from $R_m^{+}$ and $h_m^{-}$ from $R_m^{-}$, and at the end of $M$ iterations the final hypothesis is modeled as:
Figure BDA0001887476990000141
where $H$ itself can be split into $H^{+}(x)$ and $H^{-}(x)$, obtained from the corresponding $h^{+}(x)$ and $h^{-}(x)$. When the probability of visiting a POI is sought, $H(x)$ denotes $H^{+}(x)$ unless otherwise stated. This additive learning approach applies essentially the same principle as boosting with real-valued confidence, which allows the probability estimates from the decision tree $R_m$ to be used to update the additive model. The goal is to find, in each iteration, the best $h_m$, i.e. the one that produces the smallest prediction error on the training instances. For this purpose, the output of each training instance is first re-encoded with a two-dimensional vector $V = (V_1, V_2)$: $V = (1, -1)$ if $x$ is a negative instance and $V = (-1, 1)$ if $x$ is a positive instance. The generalization of the exponential loss function for iteration $m$ is then as follows:
Figure BDA0001887476990000142
For simplicity of notation, $h(x)$ is used below to denote $h_m(x)$, so the minimization problem is expressed as:
Figure BDA0001887476990000143
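subject to $h^{-}(x) + h^{+}(x) = 0$.
The Lagrangian of this constrained optimization problem can be written as:
$\exp(-h^{-}(x))\,\mathrm{Prob}(-1 \mid x) + \exp(-h^{+}(x))\,\mathrm{Prob}(1 \mid x) - \lambda\,(h^{-}(x) + h^{+}(x))$
where $\lambda$ is the Lagrange multiplier. Setting the derivatives with respect to $h^{-}(x)$, $h^{+}(x)$ and $\lambda$ to zero gives:
$-\exp(-h^{-}(x))\,\mathrm{Prob}(-1 \mid x) - \lambda = 0$,
$-\exp(-h^{+}(x))\,\mathrm{Prob}(1 \mid x) - \lambda = 0$,
$h^{-}(x) + h^{+}(x) = 0$.
Solving this set of equations yields $h(x)$: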
Figure BDA0001887476990000151
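The equation image itself is not reproduced in this text, so the following worked derivation is offered only as a check of what the three stationarity conditions imply. It is derived from those conditions and is not a transcription of the original figure.

```latex
\begin{aligned}
e^{-h^{-}(x)}\,\mathrm{Prob}(-1\mid x) &= e^{-h^{+}(x)}\,\mathrm{Prob}(1\mid x)
  && \text{(both sides equal } -\lambda\text{)}\\
e^{h^{+}(x)}\,\mathrm{Prob}(-1\mid x) &= e^{-h^{+}(x)}\,\mathrm{Prob}(1\mid x)
  && \text{(using } h^{-}(x) = -h^{+}(x)\text{)}\\
e^{2h^{+}(x)} &= \frac{\mathrm{Prob}(1\mid x)}{\mathrm{Prob}(-1\mid x)}\\
h^{+}(x) &= \tfrac{1}{2}\,\ln\frac{\mathrm{Prob}(1\mid x)}{\mathrm{Prob}(-1\mid x)},
\qquad h^{-}(x) = -h^{+}(x).
\end{aligned}
```

Under this reading, $h^{+}(x)$ is positive exactly when the estimated probability of visiting the POI exceeds that of not visiting it.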
Unless otherwise indicated, $h(x)$ denotes $h^{+}(x)$, and $h^{-}(x)$ is the negative of $h^{+}(x)$. The probability estimate of the decision tree $R$ can be used as an approximation of the conditional expectation:
Figure BDA0001887476990000152
Based on the above analysis, a BuildAPPR function is first proposed to construct the APPR (Additive Personalized Point-of-interest Recommendation) model, and a PruneAPPR function is proposed to re-evaluate the resulting model, both to prevent overfitting of the final hypothesis H and to reduce the size of the PRM set. To this end, the data set $D_u$ of check-in records of user $u$ is divided into two subsets, a growing data set (GrowSet) and a verification data set (PruneSet). The former is used by the BuildAPPR function and the latter by the PruneAPPR function.
For the BuildAPPR function, the weights of the training examples are first normalized at each round m to become the probability distribution
Figure BDA0001887476990000153
where $W_m(i)$ is the weight of instance $i$ at iteration $m$. The set of PRMs is then learned from the training data, and one PRM $R$ is randomly selected and added to RList. $R^{+}$ and $R^{-}$ are derived from the probability estimates calculated by the selected PRM, and $h_m(x)$ is obtained from the formula above. So that the data instances not correctly identified by $h_m$ receive more attention in the next iteration, the weights of the instances correctly identified by $h_m$ are decreased exponentially and the weights of the misidentified instances are increased. The weight update for an example is as follows:
Figure BDA0001887476990000154
Finally, the result is a list of PRMs, RList = $\{R_1, R_2, \ldots, R_M\}$, together with their corresponding hypotheses $\{h_1, h_2, \ldots, h_M\}$. The final hypothesis is obtained from the formula above. To convert $H(x)$ into a probability distribution, it is set to $H(x) = \exp(H(x))$ and then normalized.
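The loop described for BuildAPPR can be sketched roughly as below. Several parts are assumptions rather than the patented procedure: each PRM is represented by a hypothetical callable returning an estimate of Prob(1|x) instead of a learned decision tree, the candidate PRMs are fixed rather than re-learned from the weighted data in each round, the hypothesis uses the half-log-odds form implied by the constrained minimization, the weight update is the standard exponential re-weighting `w * exp(-y * h(x))` (the original update equation is an image not reproduced here), and the final score is taken as the sum of the per-round hypotheses.

```python
import math
import random

def hypothesis(prob_pos, eps=1e-6):
    """h^+(x) = 0.5 * ln(Prob(1|x) / Prob(-1|x)), clipped to avoid log(0)."""
    p = min(max(prob_pos, eps), 1.0 - eps)
    return 0.5 * math.log(p / (1.0 - p))

def build_appr(examples, candidate_prms, rounds, seed=0):
    """examples: list of (x, y) with y in {+1, -1}.
    candidate_prms: callables x -> estimated Prob(1|x) (stand-ins for PRMs).
    Returns (r_list, weights)."""
    rng = random.Random(seed)
    weights = [1.0] * len(examples)
    r_list = []
    for _ in range(rounds):
        total = sum(weights)
        weights = [w / total for w in weights]  # normalize into a distribution
        prm = rng.choice(candidate_prms)        # randomly select one PRM
        r_list.append(prm)
        # ASSUMPTION: exponential re-weighting; correctly identified
        # examples shrink, misidentified examples grow.
        weights = [
            w * math.exp(-y * hypothesis(prm(x)))
            for w, (x, y) in zip(weights, examples)
        ]
    return r_list, weights

def final_score(r_list, x):
    """ASSUMPTION: H^+(x) as the sum of the per-round hypotheses."""
    return sum(hypothesis(prm(x)) for prm in r_list)
```

A positive `final_score` corresponds to a predicted visit; applying `exp` to the scores of all candidate POIs and normalizing would give the probability distribution mentioned in the text.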
The PruneAPPR function improves the performance of the model. The PRM set returned by BuildAPPR is re-evaluated by calculating its error rate on PruneSet, to address possible overfitting. The goal is to obtain a subset of the PRMs that achieves the best performance on PruneSet. $H^{+}(x) > 0$ is taken as a positive prediction and $H^{-}(x) > 0$ as a negative prediction. The error is then calculated as the number of instances that the model cannot correctly identify. The function repeatedly deletes a PRM $R$ from RList until the minimum error value is reached or only one PRM remains in RList, and finally returns the RList with the minimum error.
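A greedy pruning loop of this kind might look like the sketch below. It is an illustration under assumptions: PRMs are again hypothetical callables returning an estimate of Prob(1|x), the score is the sum of half-log-odds hypotheses, and at each step the single deletion that gives the lowest PruneSet error is applied as long as the error does not increase. The text does not say which PRM is deleted at each step, so that selection rule is a choice made for the example.

```python
import math

def hypothesis(prob_pos, eps=1e-6):
    """Half log-odds of the estimated visit probability, clipped to avoid log(0)."""
    p = min(max(prob_pos, eps), 1.0 - eps)
    return 0.5 * math.log(p / (1.0 - p))

def error_count(r_list, prune_set):
    """Number of PruneSet examples whose label the summed hypothesis gets wrong."""
    errors = 0
    for x, y in prune_set:
        score = sum(hypothesis(prm(x)) for prm in r_list)
        predicted = 1 if score > 0 else -1
        errors += predicted != y
    return errors

def prune_appr(r_list, prune_set):
    """Greedily delete PRMs while the PruneSet error does not increase."""
    best = list(r_list)
    best_err = error_count(best, prune_set)
    while len(best) > 1:
        # Try deleting each PRM in turn and keep the best resulting list.
        candidates = [best[:i] + best[i + 1:] for i in range(len(best))]
        trial = min(candidates, key=lambda rl: error_count(rl, prune_set))
        trial_err = error_count(trial, prune_set)
        if trial_err > best_err:
            break  # any further deletion would only increase the error
        best, best_err = trial, trial_err
    return best, best_err
```

Accepting deletions that leave the error unchanged keeps the PRM list as small as possible, in line with the stated aim of reducing the size of the PRM set.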
So far, the method constructs PRMs (partial personalized models) iteratively and combines them through an additive method. However, this process of constructing PRMs is inefficient because they are identified from all possible subsets of the feature space. The partial models are therefore learned through a partition-based method, which improves POI recommendation performance and reduces training time.
User activity in an LBSN often shows a strong preference bias toward the areas the user frequently visits. In addition, similar activities are driven by similar influential features. A user performs only a few types of activity (i.e. visits only a few categories of POI) in each frequented area. For example, the area where a person goes shopping differs from the area where he or she visits a museum or gallery. Observation of geographical preferences at the user level indicates that different users each have areas they frequent, and examination of a user's activity in those areas shows that it is usually limited to a few kinds of activity in the most frequently visited areas. Check-ins from the same frequented area therefore show strong similarity in their influential features.
To this end, the method designs a partition-based approach to identify hidden patterns of partitions obtained from geo-aggregated check-in data.
Each partition contains a pattern, and a binary decision tree classifier is learned from the partition to represent that underlying pattern. The classifier is called a PPRM (Partition-based Partial Personalized Model). A feature selection process is thus embedded in partition-based PRM discovery, selecting the most appropriate features to represent the underlying PPRM. The PPRM is defined as follows:
Let $P_l$ denote a partition of the check-in data. A PPRM $R_l$ is a decision tree binary classifier learned from $P_l$ that involves the feature subset $C_l$. Thus, considering the feature subset $C_l$, $R_l^{+}$ is used to calculate the probability $\mathrm{Prob}(1 \mid C_l)$ of visiting a POI, and $R_l^{-}$ the probability $\mathrm{Prob}(-1 \mid C_l)$ of not visiting the POI.
To capture PPRMs from partitions at different levels of granularity, embeddings with different degrees of similarity are clustered by varying the number of clusters. It should be noted that the extracted partitions are not mutually exclusive, from which features can be learned more efficiently. Applying a partition-based approach to identifying PPRMs takes into account the spatial characteristics of the check-in data, which results in more accurate recommendations. Furthermore, it reduces training time compared to previous approaches that considered all possible subsets of the feature space.
When determining the user features, the method defines a feature set $X = \{X_1, X_2, \ldots, X_F\}$ that covers different aspects of the user's movement in the LBSN: features that capture individual user preferences, such as historical visits, and features derived from knowledge of the overall system, such as the popularity of places, geographic distance and user transitions between places. The method also defines a set of features that make use of the time information of the user's movements. Let $t'$ and $y'$ be the current time and place, where $C(y')$ gives the category of POI $y'$, $tod(t')$ returns the hour of the day and $dow(t')$ returns the day of the week at time $t'$. The values of all features are calculated given that the current time is $t'$. Note that the features used depend on the task, and different types of features may be defined according to the application. (The features of each POI have been described in the embodiments above; the specific formulas follow.)
POI preference: the number of times the user has visited POI p. This feature measures how likely the user's next check-in is to occur at a place the user has visited in the past:
$X_1(p) = |\{(y,t) \in D_u : t < t' \land y = p\}|$
Category preference: to determine the importance of different categories of POIs (movie theaters, cafes, restaurants, etc.) for a given user, consider the number of check-ins the user has performed in a particular category:
$X_2(p) = |\{(y,t) \in D_u : t < t' \land C(y) = C(p)\}|$
where $C(p)$ is the category of POI $p$.
POI popularity: the total number of check-ins performed at POI $p$ by all users $U$ in the data set:
$X_3(p) = |\{(y,t) \in D : t < t' \land y = p\}|$
Geographic distance: with $y'$ as the current location of user $u$, the distance between POI $p$ and the current location is measured as an influential feature:
$X_4(p) = dist(y', p)$
POI conversion preference: the user's transitions between POIs are not random.
With $T_u$ as the set of tuples of POIs involved in consecutive transitions of user $u$ before the current time $t'$, the following feature is defined:
$X_5(p) = |\{(v_1, v_2) \in T_u : v_1 = y' \land v_2 = p\}|$
Category conversion preference: the preference of a given user for transitions between the category of the current location and the category of the target POI:
$X_6(p) = |\{(v_1, v_2) \in T_u : C(v_1) = C(y') \land C(v_2) = C(p)\}|$
POI conversion popularity: the total number of transitions made by all users between the current location and the target POI $p$:
$X_7(p) = |\{(v_1, v_2) \in T : v_1 = y' \land v_2 = p\}|$
Category conversion popularity: the total number of transitions made by all users from POIs with the same category as the current location to POIs with the same category as the target location:
$X_8(p) = |\{(v_1, v_2) \in T : C(v_1) = C(y') \land C(v_2) = C(p)\}|$
POI time-aware popularity: the method also takes temporal patterns of visits to a POI into account as influential features, defining POI hour popularity and POI day popularity as the number of past check-ins at the POI in a given hour h of the day and on a given day d of the week, respectively.
$X_9(p) = |\{(y,t) \in D : t < t' \land y = p \land tod(t) = tod(t')\}|$
$X_{10}(p) = |\{(y,t) \in D : t < t' \land y = p \land dow(t) = dow(t')\}|$
where $tod(t)$ returns the hour of the day and $dow(t)$ returns the day of the week.
Category time-aware popularity: the temporal pattern of visits to a particular category in different hours of the day and on different days of the week:
$X_{11}(p) = |\{(y,t) \in D : t < t' \land C(y) = C(p) \land tod(t) = tod(t')\}|$
$X_{12}(p) = |\{(y,t) \in D : t < t' \land C(y) = C(p) \land dow(t) = dow(t')\}|$
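Several of these counts can be computed directly from a list of check-in records, as the sketch below illustrates for X1, X2, X3, X5, X6, X9 and X10. The `CheckIn` record layout, the function names and the treatment of a transition as any two consecutive check-ins of the same user (with no limit on the time gap between them) are assumptions made for the example.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical record layout for one check-in.
CheckIn = namedtuple("CheckIn", "user poi category time")

def history(checkins, now, user=None):
    """Check-ins strictly before `now`, optionally restricted to one user."""
    return [c for c in checkins
            if c.time < now and (user is None or c.user == user)]

def transitions(checkins):
    """Consecutive (from, to) pairs of one user's time-ordered check-ins."""
    ordered = sorted(checkins, key=lambda c: c.time)
    return list(zip(ordered, ordered[1:]))

def features(checkins, user, current, target_poi, target_cat, now):
    """Count-based features of a target POI for `user` at time `now`.

    `current` is the user's current check-in (the place y' in the text).
    """
    own = history(checkins, now, user)   # D_u restricted to t < t'
    everyone = history(checkins, now)    # D restricted to t < t'
    own_moves = transitions(own)         # T_u
    return {
        "X1": sum(c.poi == target_poi for c in own),
        "X2": sum(c.category == target_cat for c in own),
        "X3": sum(c.poi == target_poi for c in everyone),
        "X5": sum(a.poi == current.poi and b.poi == target_poi
                  for a, b in own_moves),
        "X6": sum(a.category == current.category and b.category == target_cat
                  for a, b in own_moves),
        "X9": sum(c.poi == target_poi and c.time.hour == now.hour
                  for c in everyone),
        "X10": sum(c.poi == target_poi and c.time.weekday() == now.weekday()
                   for c in everyone),
    }
```

The popularity variants X7, X8, X11 and X12 follow the same pattern, pooling the transitions or check-ins of all users instead of a single user.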
For a specific user u, the feature set X is extracted for each visited POI, and the recommendation model of user u is obtained by following the recommendation model training steps. For each candidate POI, the feature set X is extracted and the probability of the user visiting that POI is calculated. Finally, the recommendation list PList of the top-k POIs with the highest probability is returned.
To demonstrate the effectiveness of the method, the experimental setup used to evaluate the proposed POI recommendation model against other POI recommendation techniques is described here.
Two published real-world check-in data sets, collected from an LBSN, are used. They contain the check-ins of active users in two major cities, New York and Tokyo.
The baseline recommendation techniques implemented in the experiments were divided into two categories based on the POI recommendation measures employed.
Naive Bayes method (base1): this method characterizes the dependency between the probability of visiting a POI and each influencing feature Xi separately. To implement it, the POIs are ranked according to each feature, and the final ranking is the product of the individual rankings.
Full joint model (base2): this method characterizes the dependency between the probability of visiting a POI and all the influencing features X jointly. It applies the M5 decision tree to predict the user's next POI.
To evaluate the quality of POI recommendations, three standard indicators are used: accuracy, precision and recall. The quality of recommending the next location is evaluated first, for which accuracy is defined:
The accuracy is 1 if the next POI is found among the top-k POIs in PList, and 0 otherwise. The average accuracy is then calculated as the proportion of successful instances over the total number of recommendation tasks.
To evaluate the quality of location recommendation, it is also necessary to determine which of the locations visited by the target user in the test data set are identified by the recommendation method. For this purpose, precision and recall are defined.
Precision is defined as the ratio of the number of found POIs to the k recommended POIs:
Figure BDA0001887476990000191
Recall is defined as the ratio of the number of found POIs to the number of positive POIs visited by the target user in the test set:
Figure BDA0001887476990000192
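The three indicators can be written out from their verbal definitions as in the sketch below. Since the equation images are not reproduced in this text, the exact form of the original formulas is an assumption; the function names are hypothetical, `plist` stands for the k recommended POIs and `visited` for the POIs the target user actually visits in the test set.

```python
def accuracy_at_k(plist, next_poi):
    """1 if the user's next POI is among the recommended POIs, else 0."""
    return 1 if next_poi in plist else 0

def precision_at_k(plist, visited):
    """Found POIs divided by the number of recommended POIs."""
    hits = len(set(plist) & set(visited))
    return hits / len(plist) if plist else 0.0

def recall_at_k(plist, visited):
    """Found POIs divided by the number of POIs the user actually visited."""
    hits = len(set(plist) & set(visited))
    return hits / len(visited) if visited else 0.0
```

Averaging `accuracy_at_k` over all recommendation tasks gives the average accuracy described above.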
Since only past check-in data can be used to predict future check-ins, each data set is divided into a training set and a test set by check-in time. Check-ins from the first eight months are used as the training data set and check-ins from the last two months as the test data set. The training set is used to learn a recommendation model that predicts the test data. In the experiments, the accuracy, precision and recall of the evaluated recommendation techniques are examined for top-k values ranging from 5 to 100. The number of iterations M is set to 50, and the number of clusters g used to evaluate P-APPR is set to 4.
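A chronological split of this kind can be sketched as follows, assuming each check-in is a `(poi, timestamp)` pair and that the boundary between the training and test periods is passed in as `cutoff`. Both the record layout and the function name are hypothetical.

```python
from datetime import datetime

def split_by_time(checkins, cutoff):
    """Chronological split: train strictly before `cutoff`, test from `cutoff` on."""
    ordered = sorted(checkins, key=lambda c: c[1])
    train = [c for c in ordered if c[1] < cutoff]
    test = [c for c in ordered if c[1] >= cutoff]
    return train, test
```

Splitting by time rather than at random ensures that no test check-in precedes a training check-in, which matches the constraint that only past data may be used for prediction.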
Finally, through experimental comparison:
Naive Bayes method: this approach simply assumes that the features are independent of one another. It models the probability of visiting a POI against each feature Xi separately and then combines the results by multiplication. It therefore cannot exploit the interaction between features in POI recommendation. Given that user behavior in an LBSN is influenced by multiple features simultaneously, base1 returns less accurate POIs than APPR in terms of accuracy (fig. 3-a and 4-a) and precision (fig. 3-b and 4-b), and it retrieves fewer of the POIs visited by the target user in terms of recall (fig. 3-c and 4-c). This approach also ignores the fact that different users may be influenced by different features, and treats all features as equally important.
Full joint model: to overcome the limitations of the naive Bayes approach, this model characterizes the dependency between the probability of visiting a POI and all the features Xi jointly. It employs a supervised learning strategy to model POI recommendation. It therefore takes the interaction of features into account and also distinguishes the importance of different features for a user's behavior. However, it returns the least accurate POIs in terms of accuracy and precision, and in terms of recall it misses most of the POIs that the target user actually visits. The reason is that user check-in data in an LBSN is typically sparse, so the full joint model overfits and its probability estimates perform poorly.
Additive Personalized POI Recommendation (APPR): in contrast, APPR inherits the advantages of both the naive Bayes method and the full joint model, including combining individual models, considering feature interactions, and differentiating the importance of features for different users. It therefore shows substantially better performance for the various top-k values on both data sets.
Partition-based Additive Personalized POI Recommendation (P-APPR): compared with APPR, the partition-based approach significantly improves the accuracy of recommendations on both data sets (fig. 3-a and 3-b). However, P-APPR is slightly below APPR in recall. Because partition-based APPR involves only the most suitable features rather than all features, it generates more accurate recommendations than APPR. In APPR, on the other hand, all possible subsets of the feature space are involved in constructing the model, which lets the model identify more of the POIs that the user wants to visit and so yields a higher recall.
Fig. 3 and 4 show that base1, which applies the naive Bayes strategy, outperforms base2, which applies the full joint model, on both data sets. The main reason is that base1 handles the data sparsity problem better because it models each feature separately. base2, on the other hand, overfits because of the limited training set available, which results in poor performance. The advantages of both strategies can nevertheless be integrated into a unified recommendation framework through APPR.
Regarding recommendation performance for various top-k values, fig. 3 and 4 show that recall gradually increases with k while precision steadily decreases on both data sets. As more POIs are returned to the user, more of the previously unknown POIs that the user wants to visit can be identified, giving a higher recall. On the other hand, because the recommendation technique returns the k POIs with the highest probabilities (scores), the user is less likely to visit the additional recommended POIs, whose visit probabilities are lower, and this results in lower precision. Accuracy increases for the same reason as recall: returning more locations makes it more likely that the user's next visited location is among them. The advantage of P-APPR in terms of precision becomes more apparent as the number k of recommended POIs increases, whereas the recall of P-APPR is lower than that of APPR. This shows that as k increases, APPR finds more positive POIs but also returns more false positives, which gives it a higher recall and a lower precision than P-APPR.
The effect of the number of iterations is tested in terms of the number of iterations required to reach a given recommendation performance. In this test, k (the number of recommended POIs) is set to 20. For both data sets, the performance of APPR and P-APPR reaches its upper limit at around M = 50, and setting M > 50 does not necessarily improve the performance of the recommendation model.
The number of clusters also affects the performance of P-APPR. The number of clusters g is typically determined by the amount of available training data, which equals the number of check-ins that have been performed. In this experiment the other two parameters are fixed at M = 50 (number of iterations) and K = 20 (number of recommended POIs), and g is evaluated on the two data sets, New York and Tokyo. When the entire data set forms the only cluster, there is effectively no clustering. It is observed that better overall accuracy, precision and recall are obtained with clustering. This is because clustering successfully uncovers hidden patterns in the areas users frequent, which the PRMs can then capture. The achievable level of clustering depends on the amount of available training data: the more check-in data is available, the more clusters can be extracted.
To construct an APPR, the BuildAPPR function must identify PRMs from all possible subsets of the feature space. Assuming there are $F$ features, $2^{F}$ PRMs must be identified in each iteration, so this step is computationally expensive. With P-APPR, the BuildAPPR function only needs to identify PRMs from the extracted clusters. Given $g$ clusters, only $g$ PRMs need to be identified, and $g < 2^{F}$. This also improves the accuracy of the recommendations, particularly when many features are involved.
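To give a sense of scale, the comparison can be worked through with the values used in this document. Taking the twelve features $X_1$ to $X_{12}$ as $F = 12$ is an assumption about how $F$ is counted; $g = 4$ and $M = 50$ are the experimental settings stated above, and $2^{F}$ follows the text in counting every subset.

```latex
\begin{aligned}
\text{APPR, per iteration:} \quad & 2^{F} = 2^{12} = 4096 \ \text{candidate PRMs}\\
\text{P-APPR, per iteration:} \quad & g = 4 \ \text{candidate PRMs}\\
\text{over } M = 50 \text{ iterations:} \quad & 50 \times 4096 = 204800 \quad \text{versus} \quad 50 \times 4 = 200
\end{aligned}
```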
Fig. 5 is a schematic structural diagram of a system for constructing a POI recommendation model according to an embodiment of the present invention, which can execute the method for constructing a POI recommendation model according to any of the embodiments described above and is configured in a terminal.
The system for constructing a POI recommendation model provided by this embodiment comprises: a historical check-in dataset acquisition program module 11, a POI feature set determination program module 12, a sub-model training set determination program module 13, a POI recommendation sub-model determination program module 14 and a POI recommendation model determination program module 15.
The historical check-in data set obtaining program module 11 is configured to obtain historical check-in data sets of a plurality of users in a social network, where the historical check-in data sets at least include: information of the check-in time and the check-in address; the POI feature set determination program module 12 is configured to determine a POI feature set of the user according to the check-in time and the check-in address information in the historical check-in data set; the sub-model training set determining program module 13 is configured to extract at least one POI feature in the POI feature set, and determine a plurality of sub-model training sets; the POI recommendation sub-model determining program module 14 is configured to train corresponding POI probability estimates on the multiple sub-model training sets through a supervised model, iteratively update the POI probability estimates according to a self-decision tree, and determine multiple POI recommendation sub-models when the POI probability estimates reach or exceed a preset threshold; the POI recommendation model determination program module 15 is configured to aggregate the plurality of POI recommendation sub-models according to an application addition to construct the POI recommendation model.
Further, the information of the check-in address comprises: the check-in address name, the check-in address category and the check-in address coordinate;
the POI feature set includes:
according to the check-in times of a single user in each check-in address name and check-in address category, POI (point of interest) preference and category preference of the single user are determined;
according to the names of the check-in addresses of the users and the check-in times of the check-in address categories, determining the POI popularity of each check-in address;
according to the information of the adjacent check-in addresses of the single user within the preset check-in time, POI conversion preference and category conversion preference of the single user are determined;
according to the information of the check-in addresses adjacent to the multiple users in the preset check-in time, the POI conversion popularity and the category conversion popularity of each check-in address are determined, wherein the preset check-in time further comprises the following steps: determining POI time perception popularity and type time perception popularity of each check-in address in hours and weeks;
and determining the geographic distance between the check-in addresses according to the check-in address coordinates.
Further, the sub-model training set determination program module is configured to:
clustering each check-in address according to the address coordinate of each check-in address through a pre-designated cluster quantity threshold;
and determining a plurality of POI characteristics of each cluster category according to the check-in times of each check-in address in each cluster category, thereby determining a plurality of sub-model training sets.
Further, the historical check-in dataset includes a training dataset and a verification dataset, wherein the training dataset is used to determine a POI feature set of the user.
Further, after the POI feature set determination program module, the system is further configured to:
determining a plurality of POI recommendation sub-models according to the POI feature set;
verifying the POI recommendation sub-models through a verification data set, and determining part of effective POI recommendation sub-models in the POI recommendation sub-models;
pruning the POI recommendation sub-models according to the part of the effective POI recommendation sub-models to reduce overfitting.
An embodiment of the invention also provides a non-volatile computer storage medium, wherein the computer storage medium stores computer-executable instructions which can execute the construction method of the POI recommendation model in any of the method embodiments.
As one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
obtaining a historical check-in dataset of a plurality of users in a social network, wherein the historical check-in dataset at least comprises: check-in time and check-in address information;
determining a POI feature set of the user according to the check-in time and the check-in address information in the historical check-in dataset;
extracting at least one POI feature in the POI feature set, and determining a plurality of sub-model training sets;
respectively training corresponding POI probability estimation on the multiple sub-model training sets through a supervised model, iteratively updating the POI probability estimation according to a self-decision tree, and determining multiple POI recommendation sub-models when the POI probability estimation reaches or exceeds a preset threshold value;
and aggregating the POI recommendation sub-models according to the application addition to construct a POI recommendation model.
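The aggregation step is described only briefly. The sketch below assumes one possible reading, namely that the probability estimates produced by the sub-models are combined additively per candidate POI and the highest totals are recommended. This reading, the optional `weights` argument, and the names `aggregate_scores` and `recommend` are assumptions for illustration, not the method confirmed by the text.

```python
from collections import defaultdict


def aggregate_scores(submodel_outputs, weights=None):
    """Sum per-POI probability estimates across sub-models (optionally weighted)."""
    totals = defaultdict(float)
    for index, output in enumerate(submodel_outputs):
        weight = 1.0 if weights is None else weights[index]
        for poi, probability in output.items():
            totals[poi] += weight * probability
    return dict(totals)


def recommend(submodel_outputs, top_n=3):
    """Return the top_n POIs by aggregated score, breaking ties by name."""
    totals = aggregate_scores(submodel_outputs)
    return sorted(totals, key=lambda poi: (-totals[poi], poi))[:top_n]
```

For example, with three sub-models scoring "cafe", "gym", and "park", the POI with the largest summed estimate is listed first.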
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods of testing software in embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the method of constructing a POI recommendation model in any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of a device of test software, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the means for testing software over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the steps of the POI recommendation model construction method according to any embodiment of the invention.
The client of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capabilities and are primarily intended to provide voice and data communication. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A POI recommendation model construction method, comprising:
obtaining a historical check-in dataset of a plurality of users in a social network, wherein the historical check-in dataset at least comprises: check-in time and check-in address information;
determining a POI feature set of the user according to the check-in time and the check-in address information in the historical check-in dataset;
extracting at least one POI feature in the POI feature set, and determining a plurality of sub-model training sets;
respectively training corresponding POI probability estimation on the multiple sub-model training sets through a supervised model, iteratively updating the POI probability estimation according to a self-decision tree, and determining multiple POI recommendation sub-models when the POI probability estimation reaches or exceeds a preset threshold value;
and aggregating the POI recommendation sub-models according to the application addition to construct a POI recommendation model.
2. The method of claim 1, wherein the information of the check-in address comprises: the check-in address name, the check-in address category, and the check-in address coordinates;
the POI feature set includes:
the POI (point of interest) preference and category preference of a single user, determined according to the number of check-ins by the single user at each check-in address name and in each check-in address category;
the POI popularity of each check-in address, determined according to the number of check-ins by the plurality of users at each check-in address name and in each check-in address category;
the POI conversion preference and category conversion preference of the single user, determined according to the information of adjacent check-in addresses of the single user within a preset check-in time;
the POI conversion popularity and category conversion popularity of each check-in address, determined according to the information of adjacent check-in addresses of the plurality of users within the preset check-in time, wherein the determination within the preset check-in time further comprises determining the POI time-aware popularity and category time-aware popularity of each check-in address by hour and by week;
and the geographic distance between the check-in addresses, determined according to the check-in address coordinates.
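The count-based features listed above can be sketched as follows. This is a minimal illustration that assumes each check-in is a `(user, poi, timestamp_seconds)` tuple, that preference and popularity are normalised check-in counts, and that a conversion is a pair of consecutive check-ins by the same user no more than `max_gap_seconds` apart. The normalisation, the tuple layout, and all function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def user_poi_preference(checkins):
    """Share of each user's check-ins that fall on each POI."""
    counts = defaultdict(Counter)
    for user, poi, _ts in checkins:
        counts[user][poi] += 1
    return {user: {poi: n / sum(c.values()) for poi, n in c.items()}
            for user, c in counts.items()}


def poi_popularity(checkins):
    """Share of all check-ins, across users, that fall on each POI."""
    counts = Counter(poi for _user, poi, _ts in checkins)
    total = sum(counts.values())
    return {poi: n / total for poi, n in counts.items()}


def transition_counts(checkins, max_gap_seconds):
    """Count consecutive POI-to-POI moves per user within the time window."""
    visits_by_user = defaultdict(list)
    for user, poi, ts in checkins:
        visits_by_user[user].append((ts, poi))
    transitions = Counter()
    for visits in visits_by_user.values():
        visits.sort()  # order each user's check-ins by time
        for (t1, p1), (t2, p2) in zip(visits, visits[1:]):
            if t2 - t1 <= max_gap_seconds:
                transitions[(p1, p2)] += 1
    return transitions
```

Restricting `transition_counts` to one user's records gives a per-user conversion preference, while running it over all users gives a conversion popularity; the same functions apply to categories if the POI field is replaced by a category label.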
3. The method of claim 2, wherein extracting at least one POI feature in the POI feature set and determining a plurality of sub-model training sets comprises:
clustering the check-in addresses by their address coordinates, using a pre-specified cluster-count threshold;
and determining a plurality of POI features for each cluster category according to the number of check-ins at each check-in address in that cluster category, thereby determining a plurality of sub-model training sets.
4. The method of claim 1, wherein the historical check-in dataset comprises a training dataset and a validation dataset, wherein the training dataset is used to determine a set of POI features of a user.
5. The method of claim 4, wherein, after the POI feature set of the user is determined from the training dataset, the method further comprises:
determining a plurality of POI recommendation sub-models according to the POI feature set;
validating the POI recommendation sub-models against the validation dataset, and determining which of the POI recommendation sub-models are effective;
pruning the POI recommendation sub-models according to the effective POI recommendation sub-models to reduce overfitting.
6. A POI recommendation model construction system, comprising:
a historical check-in dataset obtaining program module, configured to obtain historical check-in datasets of a plurality of users in a social network, wherein the historical check-in dataset at least includes: check-in time and check-in address information;
a POI feature set determining program module, configured to determine the POI feature set of the user according to the check-in time and the check-in address information in the historical check-in dataset;
a sub-model training set determining program module, configured to extract at least one POI feature from the POI feature set and determine a plurality of sub-model training sets;
a POI recommendation sub-model determining program module, configured to respectively train corresponding POI probability estimation on the multiple sub-model training sets through a supervised model, iteratively update the POI probability estimation according to a self-decision tree, and determine multiple POI recommendation sub-models when the POI probability estimation reaches or exceeds a preset threshold value;
and the POI recommendation model determining program module is used for aggregating the POI recommendation sub-models according to application addition so as to construct the POI recommendation model.
7. The system of claim 6, wherein the information of the check-in address comprises: the check-in address name, the check-in address category, and the check-in address coordinates;
the POI feature set includes:
the POI (point of interest) preference and category preference of a single user, determined according to the number of check-ins by the single user at each check-in address name and in each check-in address category;
the POI popularity of each check-in address, determined according to the number of check-ins by the plurality of users at each check-in address name and in each check-in address category;
the POI conversion preference and category conversion preference of the single user, determined according to the information of adjacent check-in addresses of the single user within a preset check-in time;
the POI conversion popularity and category conversion popularity of each check-in address, determined according to the information of adjacent check-in addresses of the plurality of users within the preset check-in time, wherein the determination within the preset check-in time further comprises determining the POI time-aware popularity and category time-aware popularity of each check-in address by hour and by week;
and the geographic distance between the check-in addresses, determined according to the check-in address coordinates.
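The time-aware popularity feature can be illustrated with a short sketch. The text says only that popularity is determined by hour and by week, so this example assumes that check-ins are bucketed by hour of day and by day of week, that timestamps are Unix seconds interpreted in UTC, and that raw counts serve as the popularity measure. The function name and the `(poi, timestamp)` tuple layout are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def time_aware_popularity(checkins):
    """Count check-ins per (POI, hour of day) and per (POI, day of week)."""
    by_hour = Counter()
    by_weekday = Counter()
    for poi, ts in checkins:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        by_hour[(poi, moment.hour)] += 1
        by_weekday[(poi, moment.weekday())] += 1  # Monday is 0, Sunday is 6
    return by_hour, by_weekday
```

The same bucketing applies to category time-aware popularity if each record carries a category label in place of the POI identifier.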
8. The system of claim 7, wherein the sub-model training set determining program module is configured to:
clustering the check-in addresses by their address coordinates, using a pre-specified cluster-count threshold;
and determining a plurality of POI features for each cluster category according to the number of check-ins at each check-in address in that cluster category, thereby determining a plurality of sub-model training sets.
9. The system of claim 6, wherein the historical check-in dataset comprises a training dataset and a validation dataset, wherein the training dataset is used to determine a set of POI features of a user.
10. The system of claim 9, wherein, after the POI feature set determining program module determines the POI feature set, the system is further configured to:
determining a plurality of POI recommendation sub-models according to the POI feature set;
validating the POI recommendation sub-models against the validation dataset, and determining which of the POI recommendation sub-models are effective;
pruning the POI recommendation sub-models according to the effective POI recommendation sub-models to reduce overfitting.
CN201811454774.6A 2018-11-30 2018-11-30 POI recommendation model construction method and system Pending CN111259268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811454774.6A CN111259268A (en) 2018-11-30 2018-11-30 POI recommendation model construction method and system

Publications (1)

Publication Number Publication Date
CN111259268A (en) 2020-06-09

Family

ID=70948474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811454774.6A Pending CN111259268A (en) 2018-11-30 2018-11-30 POI recommendation model construction method and system

Country Status (1)

Country Link
CN (1) CN111259268A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011003151A (en) * 2009-06-22 2011-01-06 Kddi Corp Similarity calculation device, recommended poi determination device, poi recommendation system, similarity calculation method and program
CN105631707A (en) * 2015-12-23 2016-06-01 北京奇虎科技有限公司 Advertisement click rate estimation method based on decision tree, application recommendation method and device
CN106778836A (en) * 2016-11-29 2017-05-31 天津大学 A kind of random forest proposed algorithm based on constraints
CN106776930A (en) * 2016-12-01 2017-05-31 合肥工业大学 A kind of location recommendation method for incorporating time and geographical location information
CN107133263A (en) * 2017-03-31 2017-09-05 百度在线网络技术(北京)有限公司 POI recommends method, device, equipment and computer-readable recording medium
CN107341261A (en) * 2017-07-13 2017-11-10 南京邮电大学 A kind of point of interest of facing position social networks recommends method
CN107657015A (en) * 2017-09-26 2018-02-02 北京邮电大学 A kind of point of interest recommends method, apparatus, electronic equipment and storage medium
CN108268934A (en) * 2018-01-10 2018-07-10 北京市商汤科技开发有限公司 Recommendation method and apparatus, electronic equipment, medium, program based on deep learning

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115387A (en) * 2020-09-25 2020-12-22 北京百度网讯科技有限公司 Method and device for training point of interest (POI) recommendation model and electronic equipment
US20210302185A1 (en) * 2020-09-25 2021-09-30 Beijing Baidu Netcom Science Technology Co., Ltd. Training method and apparatus of poi recommendation model of interest points, and electronic device
CN112115387B (en) * 2020-09-25 2024-05-14 北京百度网讯科技有限公司 Training method and device for POI recommendation model and electronic equipment
CN112364238A (en) * 2020-10-12 2021-02-12 山东大学 Deep learning-based user interest point recommendation method and system
CN112364238B (en) * 2020-10-12 2023-04-07 山东大学 Deep learning-based user interest point recommendation method and system
CN112925926A (en) * 2021-01-28 2021-06-08 北京达佳互联信息技术有限公司 Training method and device of multimedia recommendation model, server and storage medium
CN114595309A (en) * 2022-03-04 2022-06-07 中信银行股份有限公司 Method and system for implementing a training device
CN114595309B (en) * 2022-03-04 2025-02-11 中信银行股份有限公司 A training device implementation method and system
CN119149834A (en) * 2024-11-20 2024-12-17 吉林大学 City POI recommendation and timestamp prediction method based on user historical check-in sequence
CN119149834B (en) * 2024-11-20 2025-01-14 吉林大学 Urban POI recommendation and timestamp prediction method based on user history sign-in sequence

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200609)