
CN113378033B - Training method and device for recommendation model - Google Patents

Training method and device for recommendation model Download PDF

Info

Publication number
CN113378033B
Authority
CN
China
Prior art keywords
recommendation
weight
user
behavior
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010157347.2A
Other languages
Chinese (zh)
Other versions
CN113378033A (en)
Inventor
王晨宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202010157347.2A priority Critical patent/CN113378033B/en
Publication of CN113378033A publication Critical patent/CN113378033A/en
Application granted granted Critical
Publication of CN113378033B publication Critical patent/CN113378033B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a training method and device for a recommendation model, and relates to the field of computer technology. One embodiment of the method comprises the following steps: performing offline calculation on underlying data to obtain target attribute values; calculating a recommendation result in real time based on an attention mechanism according to the target attribute values and a preset recall strategy, and displaying the recommendation result; calculating the conversion rate of the recommendation result according to the user's behavior on the recommendation result; and adjusting parameters based on a group optimization strategy according to the conversion rate so as to train the recommendation model. The method and device train the recommendation model through real-time online learning, so that changes in user preference are learned promptly, the recommendation model is more interpretable, the coverage of the recommendation results is comprehensive, the differences between the recommendation strategies of different store types are reflected, cross-store data can be recommended, and the recommendation results are more scientific.

Description

Training method and device for recommendation model
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a training method and apparatus for a recommendation model.
Background
Most existing recommendation systems train a recommendation model offline on a user's historical behavior data and then use the model for personalized recommendation. However, for an individual store on an e-commerce platform, personalized recommendation differs from home-page recommendation: user traffic is small, data are sparse, stores are independent of one another, and the number of users is far greater than the number of commodities in the store, so most common recommendation algorithms are not applicable to in-store personalized recommendation.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problems:
(1) The recommendation model is trained offline and cannot adapt to shifts in user preference;
(2) Preference scoring considers only a single type of behavior, so the coverage of the recommendation results is not comprehensive enough;
(3) When stores of different types share the same recommendation model, the differences between their recommendation strategies cannot be reflected;
(4) Each store's recommendation model only recommends within the store and cannot make use of cross-store data;
(5) The recommendation model uses a single algorithm, so the recommendation results have certain defects and limitations.
Disclosure of Invention
Therefore, embodiments of the invention provide a training method and device for a recommendation model that train the model through real-time online learning, learn changes in user preference promptly, make the recommendation model more interpretable, cover the recommendation results comprehensively, reflect the differences between the recommendation strategies of different store types, recommend across store data, and produce more scientific recommendation results.
In order to achieve the above object, according to an aspect of the embodiments of the present invention, there is provided a training method of a recommendation model.
A training method of a recommendation model, comprising: offline calculation is carried out on the bottom layer data to obtain a target attribute value; calculating a recommendation result in real time based on an attention mechanism according to the target attribute value and a preset recall strategy, and displaying the recommendation result; calculating a recommendation result conversion rate according to the behavior of the user on the recommendation result; and according to the recommendation result conversion rate, performing parameter adjustment based on a group optimization strategy to train a recommendation model.
Optionally, the target attribute value includes at least one of: an object heat value, object behavior similarity, object attribute similarity, and the user's preference for objects.
Optionally, the object behavior similarity is calculated by the following method: counting the set of objects on which each user has acted; counting the number of users who have acted on both of two objects in the object set, the user set corresponding to each object, and the users' preference for each object; calculating a first similarity from the intersection of behaviors for every pair of objects according to the counted number of users and the user set corresponding to each object; calculating a second similarity between objects according to the users' preference for each object; and weighting and summing the first similarity and the second similarity to obtain the object behavior similarity.
Optionally, the user's preference for each object is obtained by a weighted average of the user's preferences expressed through different behaviors on that object.
Optionally, after calculating the second similarity between the objects, the method further comprises: shrinking and weighting the second similarity.
Optionally, the recall policy includes: attenuation policies, repurchase policies, and behavior type policies.
Optionally, calculating the recommendation result in real time based on the attention mechanism includes: weighting and summing three factors, namely the attenuation factor, the repurchase probability factor and the behavior type, and applying an activation to obtain the attention weight of the object; multiplying the attention weight by the candidate object's behavior similarity and attribute similarity respectively, accumulating the products, and applying an activation function to obtain the candidate object's behavior recommendation weight and attribute recommendation weight; and multiplying the heat recommendation weight, behavior recommendation weight, attribute recommendation weight and preference recommendation weight by the corresponding recall data and then sorting to obtain the recommendation result, wherein the recall data corresponding to each recommendation weight are selected from the candidate object set according to the target attribute value corresponding to that weight.
Optionally, performing parameter adjustment based on the group optimization strategy includes: constructing a weight tensor model corresponding to stores; obtaining the tensor unit corresponding to each store; uniformly grouping the stores in each tensor unit to obtain a subgroup identifier for each store and thus a weight vector for each subgroup; counting the click-through rate corresponding to each subgroup's weight vector within a period; and updating each subgroup's weight vector until the conversion rate of the recommendation results is maximized.
Optionally, the weight tensor model uniformly quantizes stores into a grid model of preset dimensions; each grid cell has a multidimensional weight vector.
According to another aspect of the embodiment of the invention, a training device for a recommendation model is provided.
A training device for a recommendation model, comprising: an offline processing module, used for performing offline calculation on the underlying data to obtain target attribute values; a real-time recall module, used for calculating a recommendation result in real time based on an attention mechanism according to the target attribute values and a preset recall strategy and displaying the recommendation result; a result conversion module, used for calculating the conversion rate of the recommendation result according to the user's behavior on the recommendation result; and a parameter adjustment module, used for performing parameter adjustment based on a group optimization strategy according to the conversion rate so as to train the recommendation model.
According to yet another aspect of an embodiment of the present invention, an electronic device for training a recommendation model is provided.
An electronic device for training a recommendation model, comprising: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the training method of the recommendation model provided by the embodiment of the invention.
According to yet another aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer readable medium having stored thereon a computer program which when executed by a processor implements a training method for a recommendation model provided by an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: offline calculation is performed on the underlying data to obtain target attribute values; a recommendation result is calculated in real time based on an attention mechanism according to the target attribute values and a preset recall strategy and is displayed; the conversion rate of the recommendation result is calculated from the user's behavior on it; and parameters are adjusted based on a group optimization strategy according to that conversion rate to train the recommendation model. This allows offline analysis of the underlying data, real-time calculation of recommendation results from the behavior sequence through the attention mechanism, and introduction of the conversion rate as feedback into training, so the recommendation model is trained through real-time online learning, changes in user preference are learned promptly, and the model is highly interpretable. In addition, because the preference scoring considers multiple behaviors, the recommendation results are covered comprehensively; when different types of stores use the recommendation model, the model adaptively adjusts its parameters according to the characteristics of each store, reflecting the differences between the recommendation strategies of different store types; because the user's cross-store behavior is considered during recommendation, cross-store data can be recommended; and because the recommendation model uses multiple algorithms whose advantages and disadvantages complement each other, the recommendation results are more scientific.
Further effects of the above optional implementations are described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a training method of a recommendation model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a system framework in accordance with one embodiment of the present invention;
FIG. 3 is a schematic diagram of a data flow diagram of one embodiment of the present invention;
FIG. 4 is a schematic diagram of an offline data processing flow according to one embodiment of the present invention;
FIG. 5 is a graph of pre-transformation and post-transformation similarity profiles according to one embodiment of the present invention;
FIG. 6 is a distribution diagram of the probability density of repurchase in accordance with one embodiment of the invention;
FIG. 7 is a schematic diagram of the implementation of a real-time recall algorithm according to one embodiment of the present invention;
FIG. 8 is a logic diagram of an implementation of a recall algorithm according to one embodiment of the present invention;
FIG. 9 is a weight tensor model schematic of one embodiment of the invention;
FIG. 10 is a schematic diagram of the main modules of a training device of the recommendation model according to an embodiment of the present invention;
FIG. 11 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
Fig. 12 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Most existing recommendation models are trained through offline prediction models and use a single algorithm, such as GBDT (Gradient Boosting Decision Tree). The user's behavior data, the commodity's attribute data and the like are fed into a model with fixed parameters to obtain a recommendation result; such a recommendation model cannot learn online in real time, cannot adapt to changes in user preference, and lacks interpretability. Because only a single algorithm is used, the recommendation results have certain defects and limitations.
In addition, for store recommendation, the existing recommendation systems are not suitable because user traffic is small and the data are therefore sparse, the number of users is far greater than the number of commodities in the store, and stores are independent of one another; when different store types share the same recommendation model, the differences between their recommendation strategies cannot be reflected; analysis of the user's cross-store behavior data is not introduced; and only in-store recommendation is possible.
To solve the problems in the prior art, the invention provides a training method and device for a recommendation model that implement a complete store recommendation system and comprehensively use multiple recommendation methods such as collaborative filtering, content-based recommendation, model-based recommendation and hot-spot recommendation. The main features of the training method are:
(1) Multiple recommendation algorithms are fused so that their advantages and disadvantages complement each other;
(2) The item-based collaborative filtering logic is improved to raise calculation efficiency, the calculation is carried out on the Spark distributed computing framework (a fast, general-purpose computing engine), and the distribution of the collaborative-filtering data is transformed, enhancing the robustness of the system;
(3) An implicit-feedback preference scoring mechanism based on multiple user behaviors is implemented, enhancing the robustness of the scoring data;
(4) A commodity behavior-sequence recall strategy model based on an attention mechanism is implemented;
(5) A preference regression model based on user attributes solves the cold-start problem caused by sparse user behavior data;
(6) Tensor weight optimization models for stores of different types, SKU counts and visitor volumes are implemented;
(7) A group optimization strategy over the tensor weights makes the models of different stores converge to an optimal click-through-rate target;
(8) The system model is highly interpretable; parameters can be adjusted automatically by the machine or manually based on experience.
Fig. 1 is a schematic diagram of main steps of a training method of a recommendation model according to an embodiment of the present invention. As shown in fig. 1, the training method of the recommendation model according to the embodiment of the present invention mainly includes the following steps S101 to S104.
Step S101: performing offline calculation on the underlying data to obtain target attribute values;
Step S102: calculating a recommendation result in real time based on an attention mechanism according to the target attribute values and a preset recall strategy, and displaying the recommendation result;
Step S103: calculating the conversion rate of the recommendation result according to the user's behavior on the recommendation result;
Step S104: adjusting parameters based on the group optimization strategy according to the conversion rate so as to train the recommendation model.
Through steps S101 to S104, the recommendation model is trained: the underlying data are analyzed offline, the recommendation result is calculated in real time from the behavior sequence through the attention mechanism, and the conversion rate of the recommendation result is introduced as feedback into training, so that changes in user preference are learned promptly through real-time online learning and the recommendation model is more interpretable.
According to one embodiment of the present invention, the target attribute value includes at least one of: an object heat value, object behavior similarity, object attribute similarity, and the user's preference for objects. By analyzing multiple target attributes to train the recommendation model, a preference scoring mechanism based on various user behaviors is realized and the robustness of the scoring mechanism is enhanced.
According to an embodiment of the invention, the object behavior similarity is calculated, for example, by the following method:
counting the set of objects on which each user has acted;
counting the number of users who have acted on both of two objects in the object set, the user set corresponding to each object, and the users' preference for each object;
calculating a first similarity from the intersection of behaviors for every pair of objects according to the counted number of users and the user set corresponding to each object;
calculating a second similarity between objects according to the users' preference for each object;
and weighting and summing the first similarity and the second similarity to obtain the object behavior similarity.
By calculating the object behavior similarity in this way, similar objects can be found both from the similarity of user behavior and from the similarity of user preference for the objects.
According to one embodiment of the invention, the user's preference for each object is obtained by a weighted average of the user's preferences expressed through different behaviors on that object. In the embodiment of the invention, the preference degree corresponding to each behavior is obtained by counting the user's different behaviors on each object and correlating them with the user's preference for the object. Different behaviors are then given different weights, so the user's preference for each object can be obtained by weighted averaging.
According to another embodiment of the present invention, after calculating the second similarity between the objects, the second similarity may also be shrunk and weighted.
According to yet another embodiment of the present invention, a recall strategy includes: an attenuation strategy, a repurchase strategy, and a behavior-type strategy. Because the user's preference for objects decays over time, a decay strategy is considered when calculating recommended objects; the repurchase strategy counts the probability that the user operates on an object again, based on the time interval between the user's two accesses to the object; and the behavior-type strategy determines the user's preference for an object based on the type of behavior the user performs on it.
According to still another embodiment of the present invention, calculating the recommendation result in real time based on the attention mechanism may include the following steps:
weighting and summing three factors, namely the attenuation factor, the repurchase probability factor and the behavior type, and applying an activation to obtain the attention weight of the object;
multiplying the attention weight by the candidate object's behavior similarity and attribute similarity respectively, accumulating the products, and applying an activation function to obtain the candidate object's behavior recommendation weight and attribute recommendation weight;
and multiplying the heat recommendation weight, behavior recommendation weight, attribute recommendation weight and preference recommendation weight by the corresponding recall data and then sorting to obtain the recommendation result, wherein the recall data corresponding to each recommendation weight are selected from the candidate object set according to the target attribute value corresponding to that weight.
In the embodiment of the invention, calculating the recommendation result in real time based on the attention mechanism converts the preferences reflected by the user's different behaviors into the user's attention to objects, so that the recommended objects can be determined better.
According to the technical scheme of the invention, parameter adjustment based on the group optimization strategy may specifically include:
constructing a weight tensor model corresponding to stores;
obtaining the tensor unit corresponding to each store;
uniformly grouping the stores in each tensor unit to obtain a subgroup identifier for each store and thus a weight vector for each subgroup;
counting the click-through rate corresponding to each subgroup's weight vector within a period;
and updating each subgroup's weight vector until the conversion rate of the recommendation results is maximized.
The weight tensor model uniformly quantizes stores into a grid model of preset dimensions; each grid cell has a multidimensional weight vector.
The following describes the implementation of the technical scheme of the present invention in conjunction with specific embodiments. In an embodiment of the present invention, the object to be recommended refers to a commodity.
FIG. 2 is a schematic diagram of a system framework in accordance with one embodiment of the present invention. As shown in FIG. 2, the training process framework of the recommendation model in the embodiment of the invention mainly comprises offline computing, recall strategy, weighted sorting processing, result filtering, recommendation result displaying, conversion rate extracting, online learning group optimizing strategy and the like.
The offline calculation part computes the commodity behavior similarity, the commodity attribute similarity, the user's preference for commodities, and the commodity heat value. The recall strategy part computes recall commodities through the real-time attention mechanism built on the decay strategy, the repurchase strategy and the behavior-type strategy. The weighted sorting part applies weighted sorting to the four parts of data produced by the offline calculation, fusing the results of multiple recommendation algorithms. The result filtering part filters the recall results according to commodity inventory, region and the like, and the filtered recommendation results are then displayed. To optimize the click-through rate of the recommendation results and obtain a better conversion rate, the embodiment of the invention establishes a weight tensor model and uses a group optimization strategy to optimize the parameters and weights of the recommendation model, including offline parameters and weights, recall parameters and weights, and sorting parameters and weights.
FIG. 3 shows the data flow of an embodiment of the present invention. First, the underlying data in the data mart, mainly user behavior data, are processed offline with Spark to compute four parts of data: the item-based commodity behavior similarity, the commodity attribute similarity, the user's preference for commodities, and the commodity heat value. These four parts of offline data are saved to HBase; when recall commodities are calculated, the offline data are read from HBase, and a Redis cache avoids repeatedly reading the same data. After the recall data are calculated, they are likewise persisted in HBase and cached in Redis. Finally, after sorting, filtering and other processing produce the recommendation result, the sorted and filtered result data are stored in the Redis cache, waiting to be called by the recommendation-result display interface.
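For illustration only, the cache-aside read pattern described above can be sketched as follows in Python, assuming a redis-py client; the function and key names are hypothetical and the HBase loader is a stand-in, not part of the original disclosure.

```python
import json
import redis  # assumes a redis-py client is available

r = redis.Redis(host="localhost", port=6379)

def get_offline_data(key, load_from_hbase, ttl_seconds=3600):
    """Cache-aside read: try Redis first, fall back to the persistent store (HBase here),
    then populate the cache so repeated reads of the same key are served from memory."""
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    value = load_from_hbase(key)  # hypothetical loader for the HBase-persisted offline data
    r.setex(key, ttl_seconds, json.dumps(value))
    return value
```

The same pattern applies to the recall data and the final sorted results, with only the key space and the underlying loader changing.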
In the embodiment of the invention, a mechanism for calculating the user's preference for a commodity from the user's behavior data is also provided. The user's preference for a commodity is obtained by comprehensively weighting the behavior scores for browsing, collecting, purchasing, adding to the shopping cart, dwell time and so on. Let R_b denote the browsing preference, R_c the collecting preference, R_p the purchasing preference, R_a the add-to-cart preference, R_s the dwell-time preference, N_b, N_c, N_p, N_a, N_s the numbers of occurrences of these behaviors in a statistics period, and R the weighted average of these preferences. Then:
R_b = tanh(N_b, T_b); R_c = tanh(N_c, T_c); R_p = tanh(N_p, T_p); R_a = tanh(N_a, T_a); R_s = tanh(N_s, T_s);
R = W_b*R_b + W_c*R_c + W_p*R_p + W_a*R_a + W_s*R_s;
where tanh(x, T) is an activation function and T_b, T_c, T_p, T_a, T_s are the parameters of the corresponding activation functions.
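A minimal Python sketch of this scoring is given below for illustration. The exact parameterisation of tanh(x, T) is not spelled out in the text, so tanh(x / T) is used as an assumption, and all counts, parameters and weights in the example call are invented values.

```python
import math

def act(x, t):
    # The text writes the activation as tanh(x, T); tanh(x / T) is an assumed form.
    return math.tanh(x / t)

def preference_score(counts, temps, weights):
    """counts: behavior statistics N_b, N_c, N_p, N_a, N_s within the statistics period;
    temps: activation parameters T_b .. T_s; weights: W_b .. W_s.
    Returns the weighted preference R."""
    behaviors = ("browse", "collect", "purchase", "add_cart", "dwell")
    scores = {b: act(counts[b], temps[b]) for b in behaviors}
    return sum(weights[b] * scores[b] for b in behaviors)

# Example: a user who browsed 5 times, collected once, purchased once,
# added to cart twice and dwelt 120 seconds (values illustrative only).
r = preference_score(
    counts={"browse": 5, "collect": 1, "purchase": 1, "add_cart": 2, "dwell": 120},
    temps={"browse": 3, "collect": 1, "purchase": 1, "add_cart": 2, "dwell": 60},
    weights={"browse": 0.1, "collect": 0.2, "purchase": 0.4, "add_cart": 0.2, "dwell": 0.1},
)
```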
However, for a user who has performed no behavior on the commodity, the preference is predicted by a regression model based on user attributes. Specifically, a GBDT (Gradient Boosting Decision Tree) algorithm can be used to build a prediction model from user attributes to the user's preference for commodities. Because the data for a single commodity in a store are sparse, the regression is trained on a data set whose objects are the commodity's specific category, and the single commodities in the store are then weighted by their heat.
According to an embodiment of the present invention, the regression table based on user attributes is, for example, Table 1 below.

TABLE 1

Sex     Age   Region     Preference degree
Male    30    Sichuan    0.9
Female  20    Beijing    0.8
Male    50    Shandong   0.1
Female  18    Henan      0.6
The preference of the user for a single commodity is calculated as:
P_us = H_i * P_uc;
where H_i is the heat value of the commodity and P_uc is the user's preference for the commodity, i.e. the output of the regression model based on user attributes.
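For illustration, a cold-start sketch using scikit-learn's gradient-boosting regressor is given below; the attribute encoding (sex, age, region index) mirrors Table 1 but is an assumption, and the training values are illustrative only, not the patent's data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Training data: encoded user attributes (sex, age, region index) and observed
# category-level preferences, in the spirit of Table 1; encoding is illustrative.
X = np.array([[1, 30, 0], [0, 20, 1], [1, 50, 2], [0, 18, 3]])
y = np.array([0.9, 0.8, 0.1, 0.6])

gbdt = GradientBoostingRegressor(n_estimators=50, max_depth=3)
gbdt.fit(X, y)

def cold_start_preference(user_features, item_heat):
    """P_us = H_i * P_uc: the preference predicted from user attributes by the GBDT
    regression, weighted by the commodity's heat value within the store."""
    p_uc = float(gbdt.predict(np.array([user_features]))[0])
    return item_heat * p_uc

score = cold_start_preference([0, 25, 1], item_heat=0.7)
```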
Recommendation based on commodity content depends on the similarity of attributes between commodities; in general, commodities of the same category are the ones with higher attribute similarity. Therefore, in the embodiment of the invention, the heat value of a commodity within the same category is taken as the user's preference for that commodity. The heat value of a commodity within a category is calculated as:
B_t = Σ A_t * P_t;
where P_t is the user's preference for the commodity at time t, A_t is the decay value of that preference at time t, B_i is the accumulated heat of the commodity, and H_i is the normalized heat value.
Further, the user's preference for commodities of all categories (denoted S) gives the commodity heat value by weighted accumulation of the decay factors and the preferences at different times:
B_t = Σ A_t * P_t;
and the accumulated heat is then normalized to obtain H_i.
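A small sketch of this accumulation is given below. The normalization step that yields H_i did not survive in the text, so dividing by the category maximum is used purely as a placeholder assumption.

```python
def item_heat(weighted_events):
    """B = sum_t A_t * P_t: decay-weighted accumulation of the preference P_t an item
    received at each time t, with A_t the decay factor at that time."""
    return sum(a_t * p_t for a_t, p_t in weighted_events)

def normalized_heat(accumulated):
    # The normalization producing H_i is not specified; dividing each item's accumulated
    # heat by the category maximum is a placeholder, not the patent's formula.
    m = max(accumulated.values()) or 1.0
    return {item: b / m for item, b in accumulated.items()}
```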
FIG. 4 is a schematic diagram of an offline data processing flow according to one embodiment of the invention. As shown in FIG. 4, the Spark-based distributed computation in the embodiment of the invention partitions the data so that the behavior data of every store in a partition are stored within that partition; the data of each partition are aggregated by grouping within stores through a groupByKey operation, and the item-based computation is then performed on the data within each store.
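As an illustration of this partition-then-group flow, a minimal PySpark sketch is given below; the sample rows, partition count and the per-store placeholder function are assumptions, not the original implementation.

```python
from pyspark import SparkContext

sc = SparkContext(appName="store-item-based")  # assumes a Spark environment is available

# Raw behavior rows as (store_id, (user_id, item_id, preference)) pairs; sample data only.
behavior = sc.parallelize([
    ("shop_1", ("u1", "i1", 0.9)),
    ("shop_1", ("u2", "i1", 0.4)),
    ("shop_2", ("u3", "i7", 0.8)),
])

# Keying and partitioning by store keeps each store's behavior data in one partition,
# so the per-store item-based computation does not shuffle data across stores.
grouped = behavior.partitionBy(8).groupByKey()  # partition count is illustrative

def item_based_for_store(rows):
    # Placeholder for the in-store item-based similarity computation described below.
    return list(rows)

per_store_result = grouped.mapValues(item_based_for_store)
```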
Because the double loop of the item-based algorithm is computationally expensive, the user behaviors are counted first and the double-loop similarity computation is then performed within each user's behavior set. The algorithm proceeds as follows:
1) Count the set of commodities each user has acted on, filtering out users with an excessive number of commodity behaviors to guard against data generated by web crawlers;
2) In a loop over each user's commodity set, count the number C of times two commodities were purchased by the same person, and count for each commodity its user set and the users' scores for it. After the commodity sets of all users have been counted, calculate the first similarity of every pair of commodities from the intersection of their behaviors through the Jaccard coefficient jcd;
3) Calculate a second similarity between commodities from the user set corresponding to each commodity and the users' preference scores for the commodities, where R_{u,i} is user u's score for commodity i, R̄_u is the average of user u's preference scores over all commodities, and S(i,j) is the similarity of commodity i to commodity j;
4) Weight and sum the first similarity and the second similarity to obtain the commodity behavior similarity SIM(k, j), which determines the commodity or commodities k similar to commodity j.
Because different stores have different SKU counts, the similarity between commodities of a store with few commodities is relatively large, while the similarity values of a store with many SKUs are relatively small, so the similarity data show a long-tailed distribution. The calculated second similarity therefore needs to be shrunk and weighted; after this processing the second similarity is:
S(i,j) = 1 - tanh(-ln(x * N_sku), T);
where N_sku is the number of SKUs in the store and T is the parameter of the activation function tanh.
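For illustration, the similarity steps above can be sketched as follows in Python. The Jaccard coefficient is standard; the second-similarity formula did not survive extraction, so a Pearson-style form over co-behaving users is an assumption, as is the tanh(x / T) parameterisation of the activation.

```python
import math

def act(x, t):
    # tanh(x, T) activation; tanh(x / T) is an assumed parameterisation.
    return math.tanh(x / t)

def jaccard(users_i, users_j):
    """First similarity: Jaccard coefficient of the user sets that acted on items i and j."""
    union = len(users_i | users_j)
    return len(users_i & users_j) / union if union else 0.0

def score_similarity(ratings_i, ratings_j, user_means):
    """Second similarity from preference scores; a Pearson-style form over co-behaving
    users is assumed here since the original formula is not reproduced in the text."""
    common = ratings_i.keys() & ratings_j.keys()
    num = sum((ratings_i[u] - user_means[u]) * (ratings_j[u] - user_means[u]) for u in common)
    den_i = math.sqrt(sum((ratings_i[u] - user_means[u]) ** 2 for u in common))
    den_j = math.sqrt(sum((ratings_j[u] - user_means[u]) ** 2 for u in common))
    return num / (den_i * den_j) if den_i and den_j else 0.0

def shrink(similarity, n_sku, temp):
    """Shrinking and weighting step S(i,j) = 1 - tanh(-ln(x * N_sku), T), which pulls the
    long-tailed raw similarities toward a better-behaved distribution."""
    return 1.0 - act(-math.log(max(similarity * n_sku, 1e-9)), temp)

def behavior_similarity(w1, w2, s_first, s_second):
    """SIM(k, j): weighted sum of the first (Jaccard) and second (score) similarities."""
    return w1 * s_first + w2 * s_second
```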
FIG. 5 is a graph of the similarity distribution before and after the transformation according to one embodiment of the invention. As shown in FIG. 5, the left side is the original distribution before the shrinking and weighting transformation, and the right side is the transformed distribution, which avoids the long-tailed distribution.
According to the description of fig. 2 to 5, the target attribute values such as the object heat value, the object behavior similarity, the object attribute similarity, the preference of the user to the object and the like can be obtained through offline processing of the underlying data.
The individual recall strategies and their implementations are described below. First, the decay strategy for similarity. Considering that a user's preference for a commodity decays over time, a decay function is established: t_max denotes the time of the user's most recent behavior, t denotes the time of a historical behavior, and the decay factor A_t at time t is:
A_t = 1 - tanh(t_max, T_a);
where T_a is the parameter of the activation function tanh.
Second, the repurchase strategy. FIG. 6 is a distribution diagram of the repurchase probability density according to one embodiment of the invention. In the embodiment of the invention, each category is taken as the statistical object, since different categories produce different decay effects on the user. A Gaussian mixture model (GMM) is built from data samples of the time intervals between a user's two accesses to commodities of each category, and the probability density function of the repurchase time is estimated with the EM (Expectation-Maximization) algorithm.
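As an illustration only, the EM-fitted mixture can be sketched with scikit-learn's GaussianMixture, which is fitted by EM internally; the interval samples and the number of mixture components are invented values, not the patent's data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # fitted via the EM algorithm

# Time intervals (e.g. in days) between a user's two successive visits to commodities of
# one category; sample values are illustrative only.
intervals = np.array([1.0, 2.0, 2.5, 7.0, 7.5, 8.0, 30.0, 31.0]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2).fit(intervals)

def repurchase_probability_density(t_interval):
    """Probability density of a repurchase after the given interval, taken from the
    per-category Gaussian mixture."""
    return float(np.exp(gmm.score_samples(np.array([[t_interval]])))[0])

f_t = repurchase_probability_density(7.0)
```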
FIG. 7 is a schematic diagram of the implementation of the real-time recall algorithm according to one embodiment of the invention. Referring to FIG. 7, S_t, S_{t+1}, S_{t+2} denote the time sequence of the user's behaviors on commodities, and A_t, T_t, F_t denote the decay factor, the behavior-type factor and the repurchase probability factor, respectively. These three factors are weighted, summed and tanh-activated to obtain the user's attention weight for the commodity, and commodity recall is then performed from the candidate commodity set S_k. Specifically, the attention weights of all behaviors in the sequence are multiplied by the similarity values, accumulated, and passed through the activation function tanh to obtain the final behavior recommendation weight of the candidate commodity and the attribute recommendation weight W_ck of the candidate commodity:
W_ck = tanh(Σ_{j∈S} H_j * M_s, T);
where M_s = tanh(W_t*T_t + W_A*A_t + W_F*F_t, T); W_t denotes the attention weight of the behavior, T_t the preference score corresponding to the behavior at time t, W_A the attention weight of the decay factor, A_t the decay factor, W_F the attention weight of the repurchase probability factor, F_t the repurchase probability factor, M_s the attention weight of the browsed commodity s, and SIM(k, j) the commodity behavior similarity of commodity k with the browsed commodities.
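A minimal sketch of these two steps is given below. The tanh(x / T) parameterisation and the pairing of attention weights with similarity values are a reading of the partially garbled formulas, offered as an assumption rather than a verbatim reproduction.

```python
import math

def act(x, t):
    return math.tanh(x / t)  # assumed parameterisation of the tanh(x, T) activation

def attention_weight(t_t, a_t, f_t, w_t, w_a, w_f, temp):
    """M_s = tanh(W_t*T_t + W_A*A_t + W_F*F_t, T): attention weight of one behavior in the
    user's sequence, combining behavior-type score, decay factor and repurchase probability."""
    return act(w_t * t_t + w_a * a_t + w_f * f_t, temp)

def recommendation_weight(attention_and_similarity, temp):
    """Recommendation weight of a candidate commodity: the attention weight of each behavior
    in the sequence multiplied by the candidate's similarity to the browsed commodity,
    accumulated and passed through the activation."""
    return act(sum(m_s * sim for m_s, sim in attention_and_similarity), temp)
```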
In the embodiment of the present invention, since the heat recommendation weight and the preference recommendation weight of the candidate commodity are less important, they may be set to fixed values in advance.
Finally, the heat recommendation weight, behavior recommendation weight, attribute recommendation weight and preference recommendation weight are multiplied by the corresponding recall data and sorted to obtain the recommendation result. Specifically, the four parts of data, namely the real-time behavior recall data I_b, the real-time content recall data I_c, the user-attribute regression data I_r and the all-product hot-spot data I_h, are weighted and ranked to obtain the final recommendation result. The process is:
(I_b1, I_b2, ...) * W_b, (I_c1, I_c2, ...) * W_c, (I_r1, I_r2, ...) * W_r, (I_h1, I_h2, ...) * W_h → I_b1*W_b, I_b2*W_b, ..., I_c1*W_c, I_c2*W_c, ..., I_r1*W_r, I_r2*W_r, ... → I_b1*W_b, I_c2*W_c, I_r1*W_r, I_h1*W_h, I_b2*W_b, ....
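For illustration, the weighted merge and ranking can be sketched as follows; the de-duplication by maximum score across sources and all sample values are assumptions for the sketch, not part of the original disclosure.

```python
def merge_and_rank(recalls, weights, top_n=20):
    """Multiply each recall source's scores (behavior, content, user-attribute regression,
    hot spot) by its recommendation weight and sort the union of candidates.
    `recalls` maps source name -> {item_id: recall score}."""
    scored = {}
    for source, items in recalls.items():
        w = weights[source]
        for item, score in items.items():
            # keep the best weighted score when an item is recalled by several sources
            scored[item] = max(scored.get(item, 0.0), w * score)
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

ranked = merge_and_rank(
    recalls={
        "behavior":   {"i1": 0.9, "i2": 0.7},
        "content":    {"i2": 0.8, "i3": 0.6},
        "regression": {"i4": 0.5},
        "hotspot":    {"i5": 0.9},
    },
    weights={"behavior": 0.4, "content": 0.3, "regression": 0.2, "hotspot": 0.1},
)
```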
FIG. 8 is a logic diagram of the implementation of the recall algorithm according to one embodiment of the invention. There are two cases when a user browses a store: the user browses the store's home page, or the user browses a single-product page. If the user browses a single-product page, the four data sources (behavior similarity, commodity category, user attributes and hot spots) are processed to compute the four target attribute values, namely the object heat value, object behavior similarity, object attribute similarity and the user's preference for the object, and the recall commodities are calculated. If the user browses the store home page, the three data sources (commodity category, user attributes and hot spots) are processed to compute the three target attribute values, namely the object heat value, object attribute similarity and the user's preference for the object, and the recalled commodities are calculated.
As shown in FIG. 8, if the user enters a single-product page, commodity recall is performed according to the behavior similarity to obtain recall data within the category, and the recall data obtained from the behavior similarity are stored. The recall data corresponding to the object heat value, the object attribute similarity and the user's preference for the object are then obtained, and all recall data are weighted, summed and sorted to obtain the recommendation result. If the user does not enter a single-product page but enters the store home page, the system first queries whether the commodity category of the previously visited store exists in this store; if so, hot-spot data of the same category as the previous store are recalled and the commodity recall data are stored; if not, the historical recall data are obtained. The recall data corresponding to the object heat value, the object attribute similarity and the user's preference for the object are then obtained, and all recall data are weighted, summed and sorted to obtain the recommendation result.
FIG. 9 is a schematic diagram of the weight tensor model of one embodiment of the invention. Considering that stores differ in SKU count, visitor count and type, these three variables are discretized and weight vectors are assigned to the different discrete units, giving a tensor in the dimensions m, n, l and h. In FIG. 9, the SKU count, visitor count and store type are equally divided into a 4 x 4 matrix weight tensor; each cell's weight vector is 5-dimensional, [T_a, W_b, W_c, W_r, W_h], initialized with random parameters between 0 and 1.
The weight tensor model can be established according to the following steps (a sketch follows the list):
1. Sort all store ids by SKU count and divide them equally into m groups;
2. Sort each SKU-count group by visitor count and divide it equally into n groups;
3. Divide each visitor-count group equally by store type.
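For illustration, a minimal sketch of this construction is given below; the store record fields, the contiguous equal-split scheme and the initialization are assumptions made for the sketch.

```python
import random

def split_even(sorted_items, parts):
    # Contiguous, near-equal chunks of a sorted list (the "equal division" in the steps above).
    size, rem = divmod(len(sorted_items), parts)
    chunks, start = [], 0
    for p in range(parts):
        end = start + size + (1 if p < rem else 0)
        chunks.append(sorted_items[start:end])
        start = end
    return chunks

def build_weight_tensor(stores, m, n, l):
    """Quantize stores into an m x n x l grid by SKU count, visitor count and store type,
    and give every grid cell a random 5-dimensional weight vector [T_a, W_b, W_c, W_r, W_h].
    `stores` is a list of dicts with keys 'id', 'sku_count', 'visitors', 'type_idx'."""
    tensor, cell_of_store = {}, {}
    for i, sku_group in enumerate(split_even(sorted(stores, key=lambda s: s["sku_count"]), m)):
        for j, vis_group in enumerate(split_even(sorted(sku_group, key=lambda s: s["visitors"]), n)):
            for store in vis_group:
                cell = (i, j, store["type_idx"] % l)
                if cell not in tensor:
                    tensor[cell] = [random.random() for _ in range(5)]
                cell_of_store[store["id"]] = cell
    return tensor, cell_of_store
```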
To improve the click-through rate of the recommendation system (used to measure the conversion rate of the recommendation results), the weight tensor is optimized; since the statistics of the recommendation algorithm's click-through rate are only complete after a time window T, a group optimization strategy is used.
In the embodiment of the present invention, g particles are used, i.e. all stores are divided into g groups, and the grouping and optimization proceed as follows.
(1) Obtain the tensor unit corresponding to each store id, encoded from the group indexes of the SKU count and visitor count and the store type: id_s = i*n*l + j*l + k;
(2) Uniformly group the store ids within each tensor unit to obtain the subgroup id of each store: id_g = id_s % g;
(3) From the above steps, g groups of weights are obtained; the weight vector of each group is the concatenation of all tensor unit weights, with dimension m*n*l*5:
v = [T_a, W_b, W_c, W_p, W_h] * M * N * L;
(4) Initialize the g concatenated vectors v;
(5) Count the click-through rate corresponding to each of the g group vectors within the period;
(6) Update each of the g concatenated vectors v:
v = V_rand + (V_b - V_rand)*Lr_b + (V_g - V_rand)*Lr_g;
where V_b is the local optimal solution within the group in one round of optimization, V_g is the global optimal solution in that round, Lr_b is the learning rate toward the local optimum, Lr_g is the learning rate toward the global optimum, and V_rand is a random vector;
(7) Repeat steps (5) and (6) until the difference between the click-through rates of two periods is smaller than a set threshold, so as to optimize the click-through rate. Because the statistics period of the click-through rate is relatively long, generally one day, g should be set above 100 to accelerate the convergence of the model (see the sketch below).
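The update loop can be sketched as follows for illustration. The interpretation of V_b as each group's best-so-far vector, the stopping criterion, and the stand-in `measure_ctr` function (in practice the one-period online click-through-rate statistic) are assumptions made for the sketch.

```python
import random

def optimize_group_weights(groups, measure_ctr, lr_b=0.5, lr_g=0.5, eps=1e-3, max_rounds=50):
    """Group optimization of the g concatenated weight vectors:
    v = V_rand + (V_b - V_rand)*Lr_b + (V_g - V_rand)*Lr_g, repeated per statistics period
    until the click-through rate stops improving beyond a threshold."""
    dim = len(groups[0])
    best_per_group = [list(v) for v in groups]
    best_ctr_per_group = [measure_ctr(v) for v in groups]
    prev_global = max(best_ctr_per_group)
    for _ in range(max_rounds):
        global_best = best_per_group[best_ctr_per_group.index(max(best_ctr_per_group))]
        for g in range(len(groups)):
            v_rand = [random.random() for _ in range(dim)]
            groups[g] = [vr + (vb - vr) * lr_b + (vg - vr) * lr_g
                         for vr, vb, vg in zip(v_rand, best_per_group[g], global_best)]
            ctr = measure_ctr(groups[g])  # one period of online CTR statistics per group
            if ctr > best_ctr_per_group[g]:
                best_ctr_per_group[g] = ctr
                best_per_group[g] = list(groups[g])
        new_global = max(best_ctr_per_group)
        if abs(new_global - prev_global) < eps:
            break
        prev_global = new_global
    return best_per_group[best_ctr_per_group.index(max(best_ctr_per_group))]
```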
Fig. 10 is a schematic diagram of main modules of a training apparatus of a recommendation model according to an embodiment of the present invention. As shown in fig. 10, the training device 1000 of the recommendation model according to the embodiment of the present invention mainly includes an offline processing module 1001, a real-time recall module 1002, a result conversion module 1003, and a parameter adjustment module 1004.
The offline processing module 1001 is configured to perform offline calculation on the underlying data to obtain a target attribute value;
The real-time recall module 1002 is configured to calculate a recommendation result in real time based on an attention mechanism and display the recommendation result according to the target attribute value and a preset recall policy;
A result conversion module 1003, configured to calculate a recommendation result conversion rate according to a behavior of the user on the recommendation result;
and the parameter adjustment module 1004 is configured to perform parameter adjustment based on a group optimization strategy according to the recommendation result conversion rate so as to train a recommendation model.
According to one embodiment of the invention, the target attribute value includes at least one of: an object heat value, object behavior similarity, object attribute similarity, and the user's preference for objects.
According to another embodiment of the invention, the object behavior similarity is calculated by the following method:
counting the set of objects on which each user has acted;
counting the number of users who have acted on both of two objects in the object set, the user set corresponding to each object, and the users' preference for each object;
calculating a first similarity from the intersection of behaviors for every pair of objects according to the counted number of users and the user set corresponding to each object;
calculating a second similarity between objects according to the users' preference for each object;
and weighting and summing the first similarity and the second similarity to obtain the object behavior similarity.
According to yet another embodiment of the invention, the user's preference for each object is obtained by weighted averaging the user's preferences for different behaviors of each object.
According to yet another embodiment of the present invention, the training apparatus 1000 of the recommendation model further includes a data adjustment module (not shown in the figure) for:
After calculating the second similarity between the objects, the second similarity is contracted and weighted.
According to an embodiment of the present invention, the recall policy may include, for example: attenuation policies, repurchase policies, and behavior type policies.
In accordance with yet another embodiment of the present invention, the real-time recall module 1002 may also be configured to, when calculating the recommendation result in real time based on the attention mechanism:
weight and sum three factors, namely the attenuation factor, the repurchase probability factor and the behavior type, and apply an activation to obtain the attention weight of the object;
multiply the attention weight by the candidate object's behavior similarity and attribute similarity respectively, accumulate the products, and apply an activation function to obtain the candidate object's behavior recommendation weight and attribute recommendation weight;
and multiply the heat recommendation weight, behavior recommendation weight, attribute recommendation weight and preference recommendation weight by the corresponding recall data and then sort to obtain the recommendation result, wherein the recall data corresponding to each recommendation weight are selected from the candidate object set according to the target attribute value corresponding to that weight.
According to yet another embodiment of the present invention, the parameter adjustment module 1004 may be further configured to, when performing parameter adjustment based on a group optimization strategy:
Constructing a weight tensor model corresponding to a store;
acquiring tensor units corresponding to each store;
uniformly grouping shops in each tensor unit to obtain a subgroup identifier corresponding to each shop so as to obtain a weight vector corresponding to each subgroup;
Counting click rate corresponding to the weight vector corresponding to each subgroup in the period;
And updating the weight vector corresponding to each group until the conversion rate of the recommended result reaches the maximum.
According to yet another embodiment of the present invention, the weight tensor model uniformly quantizes a store into a grid model of a preset dimension according to the preset dimension; wherein each mesh model has a multidimensional weight vector.
According to the technical scheme of the embodiment of the invention, offline calculation is performed on the underlying data to obtain target attribute values; a recommendation result is calculated in real time based on an attention mechanism according to the target attribute values and a preset recall strategy and is displayed; the conversion rate of the recommendation result is calculated from the user's behavior on it; and parameters are adjusted based on a group optimization strategy according to that conversion rate to train the recommendation model. This allows offline analysis of the underlying data, real-time calculation of recommendation results from the behavior sequence through the attention mechanism, and introduction of the conversion rate as feedback into training, so the recommendation model is trained through real-time online learning, changes in user preference are learned promptly, and the model is highly interpretable. In addition, because the preference scoring considers multiple behaviors, the recommendation results are covered comprehensively; when different types of stores use the recommendation model, the model adaptively adjusts its parameters according to the characteristics of each store, reflecting the differences between the recommendation strategies of different store types; because the user's cross-store behavior is considered during recommendation, cross-store data can be recommended; and because the recommendation model uses multiple algorithms whose advantages and disadvantages complement each other, the recommendation results are more scientific.
FIG. 11 illustrates an exemplary system architecture 1100 of a training method of a recommendation model or a training apparatus of a recommendation model to which embodiments of the present invention may be applied.
As shown in fig. 11, system architecture 1100 may include terminal devices 1101, 1102, 1103, a network 1104, and a server 1105. Network 1104 is the medium used to provide communication links between terminal devices 1101, 1102, 1103 and server 1105. Network 1104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 1105 via the network 1104 using the terminal devices 1101, 1102, 1103 to receive or transmit messages, etc. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, and the like (by way of example only) may be installed on terminal devices 1101, 1102, 1103.
The terminal devices 1101, 1102, 1103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 1105 may be a server that provides various services, such as a background management server (by way of example only) that provides support for shopping websites browsed by users with the terminal devices 1101, 1102, 1103. The background management server may analyze and process received data such as a product information query request, and feed the processing result (e.g., target push information or product information, only an example) back to the terminal device.
It should be noted that, the training method of the recommendation model provided in the embodiment of the present invention is generally executed by the server 1105, and accordingly, the training device of the recommendation model is generally set in the server 1105.
It should be understood that the number of terminal devices, networks and servers in fig. 11 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 12, there is illustrated a schematic diagram of a computer system 1200 suitable for use in implementing a terminal device or server in accordance with an embodiment of the present invention. The terminal device or server shown in fig. 12 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in fig. 12, the computer system 1200 includes a Central Processing Unit (CPU) 1201, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the system 1200 are also stored. The CPU 1201, ROM 1202, and RAM 1203 are connected to each other through a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a cathode-ray tube (CRT) or liquid-crystal display (LCD), a speaker, and the like; a storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is installed on the drive 1210 as needed, so that a computer program read therefrom is installed into the storage section 1208 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1209, and/or installed from the removable media 1211. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 1201.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described units or modules may also be provided in a processor; for example, a processor may be described as comprising an offline processing module, a real-time recall module, a result conversion module, and a parameter adjustment module. The names of these units or modules do not, in some cases, limit the units or modules themselves; for example, the offline processing module may also be described as "a module for performing offline calculation on the underlying data to obtain a target attribute value".
As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: perform offline calculation on the underlying data to obtain a target attribute value; calculate a recommendation result in real time based on an attention mechanism according to the target attribute value and a preset recall strategy, and display the recommendation result; calculate a recommendation result conversion rate according to the user's behavior on the recommendation result; and perform parameter adjustment based on a group optimization strategy according to the recommendation result conversion rate, so as to train a recommendation model.
According to the technical scheme of the embodiments of the present invention, offline calculation is performed on the underlying data to obtain a target attribute value; a recommendation result is calculated in real time based on an attention mechanism according to the target attribute value and a preset recall strategy, and the recommendation result is displayed; a recommendation result conversion rate is calculated according to the user's behavior on the recommendation result; and parameter adjustment is performed based on a group optimization strategy according to the conversion rate, so as to train the recommendation model. In this way, the underlying data are analyzed offline, the recommendation result is computed in real time from the user's behavior sequence through the attention mechanism, and the conversion rate of the recommendation result is fed back into the training of the model. Because the model is trained through real-time online learning, changes in user preference can be learned in a timely manner, and the model remains highly interpretable. In addition, in the training method of the recommendation model, multiple types of behavior are taken into account when calculating preference scores, so the recommendation results have comprehensive coverage; when the model is used to make recommendations for different types of stores, it can adaptively adjust its parameters according to the characteristics of each store, reflecting the differences between the recommendation strategies of different store types; because the user's cross-store behavior is considered in the recommendation process, cross-store items can be recommended; and because the model combines several algorithms, their respective strengths complement one another and the recommendation results are better grounded.
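To make the feedback loop described above concrete, the following minimal Python sketch chains the four stages together. All function names, the single hotness weight, and the simulated click feedback are assumptions introduced here for illustration only; a real system would replace each stub with the offline jobs, the recall service, and the conversion logging of the embodiments.

    import random

    # Illustrative stand-ins for the four stages of the training loop.

    def offline_attributes(items):
        # Stage 1: offline calculation of a target attribute value (here: hotness only).
        return {item: random.random() for item in items}

    def recall(attrs, weights, k=3):
        # Stage 2: score each candidate as weight * attribute value and keep the top k.
        ranked = sorted(attrs, key=lambda item: weights["hotness"] * attrs[item], reverse=True)
        return ranked[:k]

    def conversion_rate(shown, clicked):
        # Stage 3: conversion rate of the displayed recommendations.
        return len(set(shown) & set(clicked)) / max(len(shown), 1)

    def adjust(weights, rate, step=0.1):
        # Stage 4: nudge the weight according to the observed conversion rate.
        weights["hotness"] += step if rate > 0.5 else -step
        return weights

    items = [f"sku_{n}" for n in range(10)]
    attrs = offline_attributes(items)
    weights = {"hotness": 1.0}
    for _ in range(5):
        shown = recall(attrs, weights)
        clicked = [item for item in shown if random.random() < 0.4]  # simulated feedback
        weights = adjust(weights, conversion_rate(shown, clicked))
    print(weights)

The point of the loop is simply that the observed conversion rate, rather than a fixed labeled training set, drives how the recommendation weights move between rounds.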
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for training a recommendation model, comprising:
performing offline calculation on the underlying data to obtain a target attribute value;
calculating a recommendation result in real time based on an attention mechanism according to the target attribute value and a preset recall strategy, and displaying the recommendation result;
calculating a recommendation result conversion rate according to the behavior of the user on the recommendation result;
performing parameter adjustment based on a group optimization strategy according to the recommendation result conversion rate, so as to train a recommendation model;
wherein performing parameter adjustment based on the group optimization strategy comprises: constructing a weight tensor model corresponding to a store; acquiring a tensor unit corresponding to each store; uniformly grouping the stores in each tensor unit to obtain a subgroup identifier corresponding to each store, so as to obtain a weight vector corresponding to each subgroup; counting, within a period, the click-through rate corresponding to the weight vector of each subgroup; and updating the weight vector corresponding to each subgroup until the conversion rate of the recommendation result reaches a maximum;
wherein the weight tensor model uniformly quantizes a store into a grid model of preset dimensions according to the preset dimensions, and each grid cell has a multidimensional weight vector.
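For illustration only, the following Python sketch shows one way the grouping and weight-update steps of claim 1 could be realized. The grid dimensions, the number of subgroups, the Gaussian perturbation of the candidate weight vectors, and the simulated click-through oracle are all assumptions; the claim itself fixes only the structure, namely a grid-quantized weight tensor, subgroups within each tensor unit, per-subgroup click statistics, and iterative updates of the weight vectors.

    import numpy as np

    rng = np.random.default_rng(0)
    GRID = (4, 4)          # assumed preset dimensions of the weight tensor model
    WEIGHT_DIM = 4         # e.g. hotness / behavior / attribute / preference weights
    weight_tensor = np.ones(GRID + (WEIGHT_DIM,))   # one weight vector per grid cell

    def cell_of(store):
        # Quantize a store into a grid cell from two illustrative features in [0, 1).
        return (int(store["scale"] * GRID[0]), int(store["category"] * GRID[1]))

    def update_cell(cell, stores_in_cell, click_rate_of, n_groups=3, noise=0.05):
        base = weight_tensor[cell]
        # Uniformly split the stores of this tensor unit into subgroups.
        groups = [stores_in_cell[g::n_groups] for g in range(n_groups)]
        # Each subgroup is served with its own perturbed candidate weight vector.
        candidates = [base + rng.normal(0.0, noise, WEIGHT_DIM) for _ in groups]
        # Keep the candidate whose subgroup achieved the best click-through rate.
        rates = [click_rate_of(group, w) for group, w in zip(groups, candidates)]
        weight_tensor[cell] = candidates[int(np.argmax(rates))]

    # Usage with simulated stores and a random click-through oracle.
    stores = [{"scale": rng.random(), "category": rng.random()} for _ in range(30)]
    simulated_ctr = lambda group, w: float(rng.random())
    cell = cell_of(stores[0])
    update_cell(cell, [s for s in stores if cell_of(s) == cell], simulated_ctr)
    print(weight_tensor[cell])

Serving each subgroup with a slightly different weight vector and keeping the best-performing one is what makes this a group (population-style) optimization rather than a gradient step.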
2. The method of claim 1, wherein the target attribute value comprises at least one of: object hotness value, object behavior similarity, object attribute similarity, and user preference for objects.
3. The method according to claim 2, wherein the object behavior similarity is calculated by:
counting a set of objects on which the user has performed behaviors;
counting the number of users who have performed behaviors on both of any two objects in the object set, the user set corresponding to each object, and the users' preferences for each object;
calculating a first similarity for the behavior intersection of each pair of objects according to the counted number of users and the user set corresponding to each object;
calculating a second similarity between the objects according to the users' preferences for each object;
and carrying out weighted summation on the first similarity and the second similarity to obtain the object behavior similarity.
4. The method according to claim 3, wherein the user's preference for each object is obtained as a weighted average of the user's preferences for the different behaviors performed on that object.
5. The method according to claim 3, further comprising, after calculating the second similarity between the objects:
and performing contraction and weighting processing on the second similarity.
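As an illustrative reading of claims 3 to 5, the sketch below computes both similarities and blends them. The behavior weights, the equal 0.5/0.5 blend, and the shrinkage constant used for the contraction step are hypothetical choices; only the overall structure (a co-behaving-user similarity, a preference-based similarity that is contracted and weighted, and a weighted sum of the two) follows the claims.

    import math

    # events maps user -> {object: {behavior: count}}; all numbers below are made up.
    BEHAVIOR_WEIGHTS = {"click": 1.0, "cart": 2.0, "order": 3.0}

    def preference(behaviors):
        # Claim 4: preference = weighted average over the user's different behaviors.
        return sum(BEHAVIOR_WEIGHTS[b] * c for b, c in behaviors.items()) / sum(behaviors.values())

    def behavior_similarity(events, a, b, alpha=0.5, shrink=10):
        users_a = {u for u, objs in events.items() if a in objs}
        users_b = {u for u, objs in events.items() if b in objs}
        both = users_a & users_b
        if not users_a or not users_b:
            return 0.0
        # First similarity: users acting on both objects, normalized by the two user sets.
        sim1 = len(both) / math.sqrt(len(users_a) * len(users_b))
        # Second similarity: agreement of user preferences on the two objects (cosine style).
        dot = sum(preference(events[u][a]) * preference(events[u][b]) for u in both)
        norm_a = math.sqrt(sum(preference(events[u][a]) ** 2 for u in users_a))
        norm_b = math.sqrt(sum(preference(events[u][b]) ** 2 for u in users_b))
        sim2 = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
        # Claim 5: contract (shrink) the second similarity when co-behavior support is thin.
        sim2 *= len(both) / (len(both) + shrink)
        # Claim 3: weighted sum of the two similarities gives the object behavior similarity.
        return alpha * sim1 + (1 - alpha) * sim2

    events = {
        "u1": {"sku_1": {"click": 3, "order": 1}, "sku_2": {"click": 2}},
        "u2": {"sku_1": {"click": 1}, "sku_2": {"cart": 1, "order": 1}},
        "u3": {"sku_2": {"click": 5}},
    }
    print(behavior_similarity(events, "sku_1", "sku_2"))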
6. The method of claim 1, wherein the recall strategy comprises: an attenuation strategy, a repurchase strategy, and a behavior type strategy of user preference for objects.
7. The method of claim 1 or 6, wherein calculating the recommendation result in real time based on the attention mechanism comprises:
performing weighted summation of three factors, namely an attenuation factor, a repurchase probability factor, and a behavior type factor, and applying an activation function to obtain the attention weight of an object;
multiplying the attention weights by the object behavior similarity and the object attribute similarity of a candidate object respectively, accumulating the products, and applying an activation function to obtain a behavior recommendation weight and an attribute recommendation weight of the candidate object;
and multiplying a hotness recommendation weight, the behavior recommendation weight, the attribute recommendation weight, and a preference recommendation weight by the corresponding recall data respectively, and then sorting to obtain a recommendation result, wherein the recall data corresponding to each recommendation weight are selected from a candidate object set according to the target attribute value corresponding to that recommendation weight.
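A minimal illustrative sketch of the scoring flow in claim 7 follows. The sigmoid activation, the factor weights, and the example numbers are assumptions, and only the behavior and attribute branches are shown; the hotness and preference recommendation weights would be multiplied by their own recall data in the same way before the final sorting.

    import math

    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

    def attention_weight(decay, rebuy_prob, behavior_level, w=(0.5, 0.3, 0.2)):
        # Weighted sum of the attenuation, repurchase-probability and behavior-type
        # factors of one historical behavior, passed through an activation function.
        return sigmoid(w[0] * decay + w[1] * rebuy_prob + w[2] * behavior_level)

    def recommend(history, candidates, recall_data):
        attention = [attention_weight(h["decay"], h["rebuy"], h["level"]) for h in history]
        ranked = []
        for cand in candidates:
            # Accumulate attention-weighted similarities over the history, then activate.
            behavior_w = sigmoid(sum(a * h["behavior_sim"][cand] for a, h in zip(attention, history)))
            attribute_w = sigmoid(sum(a * h["attribute_sim"][cand] for a, h in zip(attention, history)))
            # Final score: each recommendation weight times its corresponding recall data.
            score = behavior_w * recall_data[cand]["behavior"] + attribute_w * recall_data[cand]["attribute"]
            ranked.append((cand, score))
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)

    history = [
        {"decay": 0.9, "rebuy": 0.2, "level": 1.0,
         "behavior_sim": {"sku_9": 0.7, "sku_5": 0.1},
         "attribute_sim": {"sku_9": 0.4, "sku_5": 0.6}},
    ]
    recall_data = {"sku_9": {"behavior": 0.8, "attribute": 0.6},
                   "sku_5": {"behavior": 0.3, "attribute": 0.9}}
    print(recommend(history, ["sku_9", "sku_5"], recall_data))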
8. A training device for a recommendation model, comprising:
the offline processing module is used for performing offline calculation on the underlying data to obtain a target attribute value;
the real-time recall module is used for calculating a recommendation result in real time based on an attention mechanism according to the target attribute value and a preset recall strategy and displaying the recommendation result;
the result conversion module is used for calculating a recommendation result conversion rate according to the behavior of the user on the recommendation result;
the parameter adjustment module is used for carrying out parameter adjustment based on a group optimization strategy according to the recommendation result conversion rate so as to train a recommendation model;
the parameter adjustment module is further used for: constructing a weight tensor model corresponding to a store; acquiring a tensor unit corresponding to each store; uniformly grouping the stores in each tensor unit to obtain a subgroup identifier corresponding to each store, so as to obtain a weight vector corresponding to each subgroup; counting, within a period, the click-through rate corresponding to the weight vector of each subgroup; and updating the weight vector corresponding to each subgroup until the conversion rate of the recommendation result reaches a maximum; wherein the weight tensor model uniformly quantizes a store into a grid model of preset dimensions according to the preset dimensions, and each grid cell has a multidimensional weight vector.
9. An electronic device for training a recommendation model, comprising:
one or more processors; and
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
10. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN202010157347.2A 2020-03-09 2020-03-09 Training method and device for recommendation model Active CN113378033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010157347.2A CN113378033B (en) 2020-03-09 2020-03-09 Training method and device for recommendation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010157347.2A CN113378033B (en) 2020-03-09 2020-03-09 Training method and device for recommendation model

Publications (2)

Publication Number Publication Date
CN113378033A CN113378033A (en) 2021-09-10
CN113378033B true CN113378033B (en) 2024-10-22

Family

ID=77568457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010157347.2A Active CN113378033B (en) 2020-03-09 2020-03-09 Training method and device for recommendation model

Country Status (1)

Country Link
CN (1) CN113378033B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463091A (en) * 2022-01-29 2022-05-10 北京沃东天骏信息技术有限公司 Information push model training and information push method, device, equipment and medium
CN114912623B (en) * 2022-04-08 2024-09-03 支付宝(杭州)信息技术有限公司 Method and device for model interpretation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106327227A (en) * 2015-06-19 2017-01-11 北京航天在线网络科技有限公司 Information recommendation system and information recommendation method

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002224386B2 (en) * 2000-10-18 2007-12-13 Johnson & Johnson Consumer Companies, Inc. Intelligent performance-based product recommendation system
US6845374B1 (en) * 2000-11-27 2005-01-18 Mailfrontier, Inc System and method for adaptive text recommendation
US7636677B1 (en) * 2007-05-14 2009-12-22 Coremetrics, Inc. Method, medium, and system for determining whether a target item is related to a candidate affinity item
US8671089B2 (en) * 2009-10-06 2014-03-11 Brightedge Technologies, Inc. Correlating web page visits and conversions with external references
US20120323725A1 (en) * 2010-12-15 2012-12-20 Fourthwall Media Systems and methods for supplementing content-based attributes with collaborative rating attributes for recommending or filtering items
WO2014137118A1 (en) * 2013-03-08 2014-09-12 주식회사 씨온 Method and apparatus for recommending affiliated store by using reverse auction
CN108492124A (en) * 2018-01-22 2018-09-04 阿里巴巴集团控股有限公司 Store information recommends method, apparatus and client
CN110020921A (en) * 2019-04-09 2019-07-16 浩鲸云计算科技股份有限公司 A kind of AI recommended engine is energized commodity marketing method
CN110428298A (en) * 2019-07-15 2019-11-08 阿里巴巴集团控股有限公司 A kind of shop recommended method, device and equipment

Also Published As

Publication number Publication date
CN113378033A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN110162693B (en) Information recommendation method and server
CN113763093B (en) Article recommending method and device based on user portrait
CN108960945A (en) Method of Commodity Recommendation and device
CN112417294B (en) Business intelligent recommendation method based on neural network mining model
Ye et al. Cold start to improve market thickness on online advertising platforms: Data-driven algorithms and field experiments
CN110109901B (en) Method and device for screening target object
CN115270001B (en) Privacy protection recommendation method and system based on cloud collaborative learning
CN113378033B (en) Training method and device for recommendation model
CN111429161A (en) Feature extraction method, feature extraction device, storage medium, and electronic apparatus
CN113610610A (en) Session recommendation method and system based on graph neural network and comment similarity
CN113379511A (en) Method and apparatus for outputting information
CN112749323B (en) Method and device for constructing user portrait
CN112784212B (en) Inventory optimization method and device
CN110766488B (en) Method and device for automatically determining theme scenes
CN110490682B (en) Method and device for analyzing commodity attributes
WO2017095371A1 (en) Product recommendations based on selected user and product attributes
CN106202503B (en) Data processing method and device
CN110838019A (en) Method and device for determining trial supply distribution crowd
CN112990954B (en) Coupon distribution method and device
CN113781134B (en) Article recommendation method, apparatus and computer readable storage medium
CN110544140A (en) method and device for processing browsing data
CN113269600B (en) Information sending method and device
CN113111132B (en) Method and device for identifying target user
Jiang et al. Personalized collaborative filtering based on improved slope one algorithm
CN113554460A (en) Method and device for identifying potential user

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant