Summary of the Invention
Technical problem: The technical problem to be solved is to provide a commodity
information recommendation method and system based on user historical behavior.
The method and system accurately analyze the user's historical behavior data and
provide the user with a personalized commodity recommendation list, so that
commodity recommendation is more accurate.
Technical solution: To solve the above problem, the embodiments of the present
invention adopt the following technical solution:
In a first aspect, this embodiment provides a commodity information
recommendation method based on user historical behavior. The method comprises
the following steps:
S11: collect the user's historical behavior data on an e-commerce website, the
historical behavior data including user information and commodity information;
S12: establish a user-commodity probability prediction feature vector according
to the historical behavior data;
S13: train a model according to the user-commodity probability prediction
feature vector to obtain a user recommended-commodity prediction model;
S14: input the data of the user to be predicted into the user
recommended-commodity prediction model and calculate the predicted purchase
probability of the behavior commodity;
S15: calculate the predicted purchase probability of the associated commodities
according to the predicted purchase probability of the behavior commodity, and
merge the behavior commodity and the associated commodities to obtain a
commodity recommendation list.
With reference to the first aspect, as a first possible implementation, in S11
the historical behavior data come from the PC end, the WAP end, the APP end, and
offline data; the user information includes the identification ID, gender, age,
and access preferences of the user; the commodity information includes the
commodity identification code, commodity traffic features, commodity behavior
features, and commodity decision cost.
With reference to the first aspect, as a second possible implementation, in S12,
establishing the user-commodity probability prediction feature vector
specifically includes:
S201: perform data cleaning: clean out abnormal data and data that do not
conform to users' browsing habits;
S202: perform feature processing to obtain the user feature value: count the
cleaned historical behavior features of the users of each terminal by day, and
construct a user historical behavior feature statistical function for each;
divide the statistical days into M interval segments; for each interval segment,
calculate the feature value of the segment according to a time decay function;
accumulate the feature values of the interval segments to obtain the user
feature value;
S203: establish the user-commodity probability prediction feature vector, which
is expressed as: fingerprint ID of each terminal + commodity ID + user feature
vector value, where the fingerprint ID represents the identification ID of the
user and the commodity ID represents the identification code of the commodity.
With reference to the first aspect, as a third possible implementation, S15
specifically includes:
S301: determine the associated commodities: according to the access history
data and purchase history data in the user's historical behavior data, use
association rules or collaborative filtering to calculate the degree of
association of the commodities associated with the behavior commodity, and take
the top b commodities by degree of association as the associated commodity set
of the behavior commodity;
S302: calculate the purchase probability of the associated commodities
according to formula (1):
Score_i = Master_Pos * SKU_Score_i / max(SKU_Score_i)    Formula (1)
where Score_i denotes the purchase probability of an associated commodity;
Master_Pos denotes the purchase probability of the behavior commodity;
max(SKU_Score_i) denotes the highest degree of association in the associated
commodity set; and SKU_Score_i denotes the degree of association between the
associated commodity SKU_i and the behavior commodity;
S303: merge the behavior commodity and the associated commodities to generate
the commodity recommendation list: if the behavior commodity is in the
associated commodity set obtained in step S301, sort the behavior commodity and
the associated commodities by predicted purchase probability to obtain the
commodity recommendation list; if the behavior commodity is not in the
associated commodity set obtained in step S301 and the predicted purchase
probability of the behavior commodity is less than a probability threshold,
multiply the predicted purchase probability of the behavior commodity by a
penalty coefficient and use the result as its final predicted purchase
probability; then sort the associated commodities and the behavior commodity by
predicted purchase probability to obtain the commodity recommendation list.
With reference to the first aspect, as a fourth possible implementation, the
commodity information recommendation method based on user historical behavior
further includes step S16: filter the commodity recommendation list obtained in
S15 and apply output logic processing to generate the final commodity
recommendation list.
With reference to the fourth possible implementation of the first aspect, as a
fifth possible implementation, step S16 specifically includes: take the
commodities the user has ordered within the most recent H days, and use the
commodity groups to which the ordered commodities belong as the user's filter
commodity groups; filter out of the commodity recommendation list obtained in
S15 the commodities that belong to the filter commodity groups; and re-sort the
commodities in the filtered list by predicted purchase probability to generate
the final commodity recommendation list.
In a second aspect, this embodiment provides a commodity information
recommendation system based on user historical behavior. The system includes:
Acquisition module: configured to collect the user's historical behavior data
on an e-commerce website;
Feature vector establishing module: configured to establish a user-commodity
probability prediction feature vector according to the historical behavior data
collected by the acquisition module;
Model building module: configured to train a model according to the
user-commodity probability prediction feature vector established by the feature
vector establishing module, to obtain a user recommended-commodity prediction
model;
Estimation module: configured to input data into the user recommended-commodity
prediction model and calculate the predicted purchase probability of the
behavior commodity;
First generation module: configured to calculate the predicted purchase
probability of the associated commodities according to the predicted purchase
probability of the behavior commodity calculated by the estimation module, and
to merge the behavior commodity and the associated commodities to obtain a
commodity recommendation list.
With reference to the second aspect, as a first possible implementation, the
historical behavior data collected by the acquisition module come from the PC
end, the WAP end, the APP end, and offline data.
With reference to the second aspect, as a second possible implementation, the
feature vector establishing module includes:
Cleaning submodule: configured to clean out abnormal data and data that do not
conform to users' browsing habits;
Estimation submodule: configured to count the cleaned historical behavior
features of the users of each terminal by day, construct a user historical
behavior feature statistical function for each, divide the statistical days
into M interval segments, calculate the feature value of each interval segment
according to a time decay function, and accumulate the feature values of the
interval segments to obtain the user feature value;
Establishing submodule: configured to establish the user-commodity probability
prediction feature vector, which is expressed as: fingerprint ID of each
terminal + user ID + commodity ID + user feature value, where the fingerprint ID
represents the identification ID of the user, the user ID represents the
fingerprint of the user, and the commodity ID represents the identification
code of the commodity.
With reference to the second aspect, as a third possible implementation, the
first generation module includes:
Determination submodule: configured to use association rules or collaborative
filtering, according to the access history data and purchase history data in
the user's historical behavior data, to calculate the degree of association of
the commodities associated with the behavior commodity, and to take the top b
commodities by degree of association as the associated commodity set of the
behavior commodity;
Calculation submodule: configured to calculate the purchase probability of the
associated commodities;
First generation submodule: configured to merge the behavior commodity and the
associated commodities to generate the commodity recommendation list: if the
behavior commodity is in the associated commodity set established by the
determination submodule, sort the behavior commodity and the associated
commodities by predicted purchase probability to obtain the commodity
recommendation list; if the behavior commodity is not in the associated
commodity set established by the determination submodule and the predicted
purchase probability of the behavior commodity is less than a probability
threshold, multiply the predicted purchase probability of the behavior
commodity by a penalty coefficient and use the result as its final predicted
purchase probability; then sort the associated commodities and the behavior
commodity by predicted purchase probability to obtain the commodity
recommendation list.
With reference to the second aspect, as a fourth possible implementation, the
commodity information recommendation system based on user historical behavior
further includes a second generation module: configured to filter the commodity
recommendation list obtained by the first generation module and apply output
logic processing to generate the final commodity recommendation list.
With reference to the fourth possible implementation of the second aspect, as a
fifth possible implementation, the second generation module includes:
Filter commodity group establishing submodule: configured to take the
commodities the user has ordered within the most recent H days and use the
commodity groups to which the ordered commodities belong as the user's filter
commodity groups;
Filtering submodule: configured to filter out of the commodity recommendation
list generated by the first generation module the commodities that belong to
the filter commodity groups;
Second generation submodule: configured to re-sort, by predicted purchase
probability, the commodities in the commodity recommendation list filtered by
the filtering submodule, to generate the final commodity recommendation list.
Beneficial effects: Compared with the prior art, the commodity information
recommendation method and system based on user historical behavior provided by
the embodiments of the present invention can provide users with personalized
commodity recommendations that are more accurate and better meet user demand.
The method of this embodiment analyzes the user's historical behavior data,
builds a user recommended-commodity prediction model, and also includes the
commodities associated with the behavior commodity in the recommendation list.
The commodity recommendation list is generated after comparing the purchase
probabilities of the behavior commodity and the associated commodities
together.
Detailed Description of the Embodiments
The technical solution of the embodiments of the present invention is described
in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the commodity information recommendation method based on user
historical behavior of this embodiment comprises the following steps:
S11: collect the user's historical behavior data on an e-commerce website, the
historical behavior data including user information and commodity information;
S12: establish a user-commodity probability prediction feature vector according
to the user's attributes, the commodity features, and the user's historical
behavior features;
S13: train a model according to the user-commodity probability prediction
feature vector to obtain a user recommended-commodity prediction model;
S14: input data into the user recommended-commodity prediction model to obtain
the predicted purchase probability of the behavior commodity;
S15: calculate the predicted purchase probability of the associated commodities
according to the predicted purchase probability of the behavior commodity, and
merge the behavior commodity and the associated commodities to obtain a
commodity recommendation list.
In the above recommendation method, the user's historical behavior data on the
e-commerce website are used to calculate the predicted purchase probability of
the behavior commodity, and the behavior commodity and the associated
commodities are combined to generate the commodity recommendation list. Because
different users behave differently, calculating the predicted purchase
probability of the behavior commodity from each user's own historical behavior
personalizes the resulting list: different users receive different commodity
recommendation lists.
To make the recommended commodity list better match user demand, in step S11
the historical behavior data come from the PC end, the WAP end, the APP end, and
offline data. Collecting from multiple terminals widens the acquisition range
of the user's historical behavior data, so that the collected data reflect the
user's historical demand more accurately and provide a more accurate historical
basis for generating the subsequent commodity recommendation list.

The kinds of historical behavior data collected can be determined according to
actual needs and include user information and commodity information, for
example: user tag information, user access information, user click information,
browsing duration, user search information, user favorites, shopping cart
information, pre-sale information, order sales information, and so on.

User information includes the user ID, user attribute information, and so on.
User attribute information includes the gender, age, and access preferences of
the user. Access preferences reflect the user's tastes, such as color and
style. User attributes can be derived from the historical behavior data by
modeling and identification with methods such as statistical analysis and
machine learning.

Commodity information includes the commodity identification code, commodity
feature information, and so on. Commodity features include commodity traffic
features, commodity behavior features, and commodity decision cost. Commodity
traffic features refer to PV, UV, conversion rate, sales volume, order
quantity, sales growth rate, order growth rate, and so on. Commodity behavior
features refer to promotion, price reduction, new product, pre-sale
reservation, best-selling commodity, promotion strength, price, and so on.
Commodity decision cost refers to the decision time for purchasing the
commodity, the number of views, the number of browsing days, and so on.

Constructing the user's historical behavior features means analyzing the user's
historical behavior data to identify the factors that influence purchases,
extracting a feature value for each factor, and composing the factor values
into a vector, which gives the user's historical behavior features.
Preferably, as shown in Fig. 2, in S12, establishing the user-commodity
probability prediction feature vector specifically includes:
S201: perform data cleaning: clean out abnormal data.
Abnormal data are data that are markedly different from, abnormal relative to,
or inconsistent with the other data. For example, data removed by the following
filters are all abnormal data: filter out users for whom the number of
commodity categories added to the shopping cart exceeds the commodity category
threshold Na; filter out commodity detail page browsing records whose browsing
time is less than the browsing time threshold Nb (in seconds); filter out
commodity detail page browsing records whose browsing time is greater than the
browsing time threshold Nc (in seconds); if the number of fourth-level pages a
user browses in one session exceeds the fourth-level page browsing threshold
Nd, filter out that session; if a user's PV for the day is less than the PV
threshold Ne, filter out that user.
In addition to abnormal data, data that do not conform to users' browsing
habits can also be cleaned out; that is, both abnormal data and data that do
not conform to users' browsing habits are cleaned out. Data that do not conform
to users' browsing habits are data that differ greatly from the behavior of
normal shoppers, such as the browsing behavior of crawler users or
order-brushing (fake-order) users.
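The cleaning rules of S201 can be sketched as simple threshold filters. The
following is a minimal sketch under stated assumptions, not the patent's
implementation: the `PageView` record, its field names, and the encoding of a
fourth-level page as `page_level == 4` are hypothetical, and only three of the
listed rules (thresholds Nb, Nc, Nd) are shown.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PageView:
    user_id: str
    session_id: str
    page_level: int        # 4 = fourth-level page (hypothetical encoding)
    is_detail_page: bool   # True for a commodity detail page
    dwell_seconds: float


def clean_page_views(views, nb, nc, nd):
    """Apply three of the cleaning rules described in S201.

    nb / nc: lower / upper browsing-time thresholds for detail pages (seconds).
    nd: maximum number of fourth-level page views allowed in one session.
    """
    # A session with more than nd fourth-level page views is dropped whole.
    level4_per_session = Counter(
        v.session_id for v in views if v.page_level == 4
    )
    bad_sessions = {s for s, n in level4_per_session.items() if n > nd}

    kept = []
    for v in views:
        if v.session_id in bad_sessions:
            continue
        # Detail-page views that are too short or too long are dropped.
        if v.is_detail_page and not (nb <= v.dwell_seconds <= nc):
            continue
        kept.append(v)
    return kept
```

The order in which the rules are applied is not specified in the text; here
the session rule is evaluated on the raw records before the dwell-time rule.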
S202: perform feature processing: according to the distribution of the
historical behavior features of the users of each terminal, compile statistics
by day and construct, for each feature, the function shown in formula (2):
Formula (2)
where f(X) denotes the user historical behavior feature statistical function,
X denotes the feature variable, a denotes the threshold of each feature, and x
denotes the statistical count of the feature variable X.
Let the number of statistical days be N. The statistical days are divided into
M interval segments, and the feature value of each interval segment is
calculated according to the time decay function shown in formula (3):
Formula (3)
where K denotes the half-life of the decay function and t denotes the number of
days before the current calculation; for example, t = 1 when calculating the
feature value of the previous day, and t = 2 when calculating the feature value
of two days earlier.
The feature value of each interval segment is decayed according to formula (3),
and the decayed values are accumulated to obtain the final user feature value:
Formula (4)
where N denotes the number of statistical days of the historical behavior data
and Nt denotes the integer sequence 1:N/M;
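Formulas (2) to (4) are not reproduced in the text, so the following is only an
illustrative sketch of half-life decay over interval segments, not the patent's
formulas. It assumes, purely for illustration, that the statistical function
caps a daily count at the threshold a, that the decay weight is 0.5^(t/K), and
that N is divisible by M; the function names are hypothetical.

```python
def capped_stat(x, a):
    """Assumed stand-in for formula (2): cap the daily count x at threshold a."""
    return min(x, a)


def half_life_weight(t, k):
    """Assumed stand-in for formula (3): weight halves every k days."""
    return 0.5 ** (t / k)


def user_feature_value(daily_counts, m, k, a):
    """Accumulate decayed segment values (assumed stand-in for formula (4)).

    daily_counts[0] is the count for the previous day (t = 1),
    daily_counts[1] for two days earlier (t = 2), and so on.
    """
    n = len(daily_counts)
    if m <= 0 or n % m != 0:
        raise ValueError("the number of days must be divisible by m")
    seg_len = n // m

    segment_values = []
    for s in range(m):
        total = 0.0
        for offset in range(seg_len):
            t = s * seg_len + offset + 1  # days before the current calculation
            total += capped_stat(daily_counts[t - 1], a) * half_life_weight(t, k)
        segment_values.append(total)
    # The user feature value is the sum over all interval segments.
    return sum(segment_values)
```

For example, with two days of counts `[2, 2]`, `m=2`, `k=1`, and `a=10`, the
previous day contributes 2 * 0.5 and the day before 2 * 0.25.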
S203: establish the user-commodity probability prediction feature vector, which
is expressed as: fingerprint ID of each terminal + commodity ID + user feature
vector value.
The fingerprint ID represents the identification ID of the user, such as
cookieid, MEMI, member code, and so on. The commodity ID represents the
identification code of the commodity.
The user-commodity probability prediction feature vector is established
separately for the users of each terminal, specifically:
(1) PC user: fingerprint ID (PC) + commodity ID + behavior features;
(2) WAP user: fingerprint ID (WAP) + commodity ID + behavior features;
(3) APP user: fingerprint ID (APP) + commodity ID + behavior features;
(4) cross-screen user: fingerprint ID1 (PC) + fingerprint ID2 (WAP) +
fingerprint ID3 (APP) + commodity ID + behavior features.
Here, fingerprint ID (PC) denotes the PC user identification ID; fingerprint ID
(WAP) denotes the WAP user identification ID; and fingerprint ID (APP) denotes
the APP user identification ID.
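The per-terminal layout of S203 can be illustrated with a small helper that
assembles one record. This is a sketch under assumptions: the function name,
the dictionary keyed by terminal name, and the fixed PC, WAP, APP ordering for
cross-screen users are hypothetical choices, not taken from the patent.

```python
TERMINALS = ("PC", "WAP", "APP")  # assumed fixed ordering for cross-screen users


def build_feature_record(fingerprints, commodity_id, behavior_features):
    """Return (key, features) for one user-commodity pair.

    fingerprints: dict mapping a terminal name to that terminal's fingerprint ID.
    A single-terminal user supplies one entry; a cross-screen user supplies several.
    """
    if not fingerprints:
        raise ValueError("at least one fingerprint ID is required")
    unknown = set(fingerprints) - set(TERMINALS)
    if unknown:
        raise ValueError(f"unknown terminals: {sorted(unknown)}")

    # Key = fingerprint ID(s) in terminal order, followed by the commodity ID.
    key = tuple(fingerprints[t] for t in TERMINALS if t in fingerprints)
    key += (commodity_id,)
    return key, list(behavior_features)
```

A cross-screen user seen on WAP and PC would be keyed as
`build_feature_record({"WAP": "w1", "PC": "c1"}, "sku9", [0.2, 1.5])`.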
In step S13, a model is trained according to the user-commodity probability
prediction feature vector to obtain the user recommended-commodity prediction
model.
The trained model is built with any one or more of logistic regression, lasso
regression, and random forests. During training, the models are trained
separately on the data of each end, such as the PC end, the WAP end, and the
APP end. The user-commodity probability prediction feature vectors of
commodities in the shopping cart that have been converted into orders are taken
as the positive samples of the training set. The user-commodity probability
prediction feature vectors of SKUs that appear in the behavior data but have
not been converted into orders are taken as the negative samples of the
training set. The model training in this embodiment uses a learned
classification model, such as logistic regression, lasso regression, or random
forests, to calculate the purchase probability of each commodity.
Logistic regression model: in the classification case, the trained LR
classifier is a set of weights. The weights are combined with the input data as
a linear weighted sum to give a weighted value, and the probability is then
calculated from this value with the sigmoid function, which gives the purchase
probability.
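The scoring step of logistic regression described above (a linear weighted sum
passed through the sigmoid function) can be sketched as follows. The function
names are hypothetical, and the bias term is an assumption added for
generality; the text mentions only the weights.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def purchase_probability(weights, bias, features):
    """Linear weighted sum of the features, mapped to (0, 1) by the sigmoid."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)
```

With all weights and the bias equal to zero the weighted sum is 0, so the
returned probability is 0.5 regardless of the features.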
Lasso regression model: the Lasso (least absolute shrinkage and selection
operator, Tibshirani) method is a shrinkage estimator. By constructing a
penalty function it obtains a more refined model: it shrinks some coefficients
and sets others to zero. It therefore retains the advantages of subset
shrinkage and is a biased estimator suited to data with multicollinearity. The
basic idea of Lasso is to minimize the residual sum of squares under the
constraint that the sum of the absolute values of the regression coefficients
is less than a constant. This can produce some regression coefficients that are
exactly 0 and yields an interpretable model, making the predicted probability
more accurate.
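The constrained form of the Lasso estimate described above can be written out
as follows. The symbols are assumptions introduced here for illustration and do
not appear in the original text: y_i is the response of sample i, x_ij is
feature j of sample i, beta_0 is the intercept, beta_j are the regression
coefficients, and s is the constant bounding the sum of their absolute values.

```latex
\hat{\beta} \;=\; \arg\min_{\beta_0,\,\beta}\;
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \;\le\; s
```

Making s smaller tightens the constraint and drives more coefficients to
exactly zero.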
Random forests model: a random forest is a classifier comprising multiple
decision trees, whose output class is the mode of the classes output by the
individual trees. The purchase probability of the user is calculated from the
output class.
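The mode-based vote of a random forest can be sketched as below. Each tree is
assumed to output 1 (purchase) or 0 (no purchase). Using the fraction of trees
that vote 1 as a probability estimate is an assumption made for illustration;
the text says only that the probability is calculated from the output class.

```python
from collections import Counter


def forest_vote(tree_outputs):
    """Return (majority_class, fraction_of_trees_voting_purchase).

    tree_outputs: list of 0/1 class labels, one per decision tree.
    """
    if not tree_outputs:
        raise ValueError("at least one tree output is required")
    counts = Counter(tree_outputs)
    majority_class, _ = counts.most_common(1)[0]  # mode of the tree outputs
    purchase_fraction = counts[1] / len(tree_outputs)
    return majority_class, purchase_fraction
```

For instance, four trees voting `[1, 0, 1, 1]` give the class 1 with three of
four votes.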
In step S14, the data of the user to be predicted are loaded into the user
recommended-commodity prediction model to obtain the predicted purchase
probability of the behavior commodity.
Preferably, S15 specifically includes the following steps:
S301: determine the associated commodities: according to the access history
data and purchase history data in the user's historical behavior data, use
association rules or collaborative filtering to calculate the degree of
association of the commodities associated with the behavior commodity, and take
the top b commodities by degree of association as the associated commodity set
of the behavior commodity, where b is an integer and b > 1.
S302: calculate the purchase probability of the associated commodities
according to formula (1):
Score_i = Master_Pos * SKU_Score_i / max(SKU_Score_i)    Formula (1)
where Score_i denotes the purchase probability of an associated commodity;
Master_Pos denotes the purchase probability of the behavior commodity;
max(SKU_Score_i) denotes the highest degree of association in the associated
commodity set; and SKU_Score_i denotes the degree of association between the
associated commodity SKU_i and the behavior commodity;
S303: merge the behavior commodity and the associated commodities to obtain the
commodity recommendation list:
If the behavior commodity is in the associated commodity set obtained in step
S301, sort the behavior commodity and the associated commodities by predicted
purchase probability to obtain the commodity recommendation list. If the
behavior commodity is not in the associated commodity set obtained in step S301
and the predicted purchase probability of the behavior commodity is less than
the probability threshold, multiply the predicted purchase probability of the
behavior commodity by the penalty coefficient and use the result as its final
predicted purchase probability; then re-sort the associated commodities and the
behavior commodity by predicted purchase probability to obtain the commodity
recommendation list.
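Steps S302 and S303 can be sketched together: score the associated commodities
with formula (1), apply the penalty rule to the behavior commodity, and sort.
The function names are hypothetical. Two points are assumptions because the
text leaves them open: when the behavior commodity is itself in the associated
set, the model prediction Master_Pos is kept as its score, and the degrees of
association are assumed to be positive.

```python
def associated_scores(master_pos, relevance):
    """Formula (1): Score_i = Master_Pos * SKU_Score_i / max(SKU_Score_i).

    relevance: dict mapping each associated SKU to its degree of association.
    """
    top = max(relevance.values())
    return {sku: master_pos * score / top for sku, score in relevance.items()}


def merge_recommendations(behavior_sku, master_pos, relevance, threshold, penalty):
    """Merge the behavior commodity with its associated commodities and sort."""
    scores = associated_scores(master_pos, relevance)

    if behavior_sku not in relevance and master_pos < threshold:
        # Not in the associated set and below the threshold: apply the penalty.
        final = master_pos * penalty
    else:
        final = master_pos
    scores[behavior_sku] = final  # assumption: the model prediction is kept

    # Highest predicted purchase probability first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With `master_pos=0.5`, associations `{"B": 4.0, "C": 2.0}`, a threshold of 0.6
and a penalty of 0.8, the behavior commodity "A" is penalized to 0.4 and lands
between "B" (0.5) and "C" (0.25).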
In step S303, the probability threshold and the penalty coefficient are
selected by optimizing the comprehensive evaluation index (F-Measure): the
probability threshold and penalty coefficient at which F-Measure is maximal are
taken.
Here:
hit rate = number of correctly identified individuals / total number of
identified individuals;
recall rate = number of correctly identified individuals / total number of
individuals present in the test set.
When the hit rate and recall rate indices conflict, the comprehensive
evaluation index (F-Measure, also called F-Score) is used to consider them
together and choose the optimal value. F-Measure is the weighted harmonic mean
of the hit rate and the recall rate:
F-Measure = (1 + a^2) * hit rate * recall rate / (a^2 * hit rate + recall rate);
When the parameter a = 1, this is the most common form, F1:
F1 = 2 * hit rate * recall rate / (hit rate + recall rate).
F1 thus combines the hit rate and the recall rate. The higher F1 is, the more
effective the method. The main role of F1 is to adjust the ranking.
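Selecting the probability threshold and penalty coefficient by maximizing
F-Measure can be sketched as a search over candidate pairs. The code uses the
standard weighted harmonic mean, (1 + a^2) * P * R / (a^2 * P + R), which
reduces to F1 when a = 1. The `evaluate` callback, which returns the hit rate
and recall rate for a candidate pair, is a hypothetical placeholder for
whatever offline evaluation is available.

```python
def f_measure(hit_rate, recall_rate, a=1.0):
    """Weighted harmonic mean of hit rate (precision) and recall rate."""
    denom = a * a * hit_rate + recall_rate
    if denom == 0:
        return 0.0  # both rates are zero
    return (1 + a * a) * hit_rate * recall_rate / denom


def select_best(candidates, evaluate, a=1.0):
    """Return the (threshold, penalty) pair with the highest F-Measure.

    candidates: iterable of (threshold, penalty) pairs.
    evaluate: callable (threshold, penalty) -> (hit_rate, recall_rate).
    """
    return max(candidates, key=lambda c: f_measure(*evaluate(*c), a))
```

A precomputed table of evaluation results can be passed as
`evaluate=lambda t, p: table[(t, p)]`.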
As shown in Fig. 4, the recommendation method of this embodiment adds step S16
to the above embodiment: the commodity recommendation list obtained in S15 is
filtered according to behavior filter logic and processed by output logic, and
the final commodity recommendation list is output.
The specific process of filtering according to the behavior filter logic and
applying output logic processing is: take the commodities the user has ordered
within the most recent H days, and use the commodity groups to which the
ordered commodities belong as the user's filter commodity groups; filter out of
the commodity recommendation list obtained in S15 the commodities that belong
to the filter commodity groups; re-sort the commodities in the filtered list by
predicted purchase probability, and use the result as the final commodity
recommendation list. A user is unlikely to buy again, in the near term, a
commodity ordered within the most recent H days. The behavior filter logic
therefore ensures that commodities ordered within the most recent H days no
longer appear in the final commodity recommendation list. With these
commodities removed, the commodity recommendation list reflects the user's
demand more accurately.
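The filtering of step S16 can be sketched as follows. The function name and
the data shapes (a list of (SKU, probability) pairs, a SKU-to-group mapping,
and a list of (SKU, order date) pairs) are hypothetical, and treating "within
the most recent H days" as an inclusive bound is an assumption.

```python
from datetime import date


def filter_recent_groups(recommendations, group_of, orders, today, h_days):
    """Drop recommended commodities whose group was ordered in the last h_days.

    recommendations: list of (sku, predicted_purchase_probability).
    group_of: dict mapping every SKU involved to its commodity group.
    orders: list of (sku, order_date) for this user.
    """
    # Commodity groups of everything ordered within the most recent h_days.
    filter_groups = {
        group_of[sku]
        for sku, ordered_on in orders
        if 0 <= (today - ordered_on).days <= h_days
    }
    kept = [
        (sku, p) for sku, p in recommendations
        if group_of[sku] not in filter_groups
    ]
    # Re-sort the remaining commodities by predicted purchase probability.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Note that the filter works at the level of the commodity group, so a
recommended SKU is removed even if a different SKU of the same group was the
one ordered.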
The recommendation method of the above embodiment takes into account user
behavior, commodity features, and the cross features of the two, which improves
the accuracy of prediction and, in turn, the accuracy of recommendation. Cross
features are linear or nonlinear combinations of feature attributes. Cross
features describe user behavior more richly and increase the dimensionality of
the feature variables, which further improves the precision of the model.
An accuracy test was carried out on a comparative example and this embodiment.
The test data for both were obtained in the same way as the training data of
this embodiment, computed over a new time window. The comparative example uses
a logistic regression model; this embodiment uses the model built in step S13.
In the calculation, the user features used by the comparative example are the
user's browsing behavior, while the user features used by this embodiment are
user behavior features and commodity features. Both the comparative example and
this embodiment output a recommendation list after model testing. According to
the prediction results, the prediction accuracy (AUC) of the comparative
example is 0.70, and the prediction accuracy (AUC) of this embodiment is 0.83.
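For reference, AUC can be computed as the fraction of positive-negative pairs
in which the positive sample receives the higher score, with ties counted as
one half. The following sketch is a generic illustration of that definition
with a hypothetical function name; it is not the evaluation code behind the
figures reported above.

```python
def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative).

    labels: list of 1 (purchased) or 0 (not purchased).
    scores: predicted purchase probabilities, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

This quadratic pairwise form is easy to verify by hand; for large test sets a
rank-based computation gives the same value more efficiently.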
The recommendation results are ranked across all dimensions together. Because
recall rate and hit rate, the core evaluation indices for recommendation, can
conflict, this embodiment weighs them with the weighted harmonic mean, uses its
optimal value to optimize the final recommendation ranking, and ranks according
to the result corresponding to the maximum of the comprehensive evaluation
index, which improves the accuracy of the recommendation ranking.
The method uses multi-level decay to build the user's historical behavior
features. The user's historical behavior is divided into M interval segments
and decayed along two dimensions: interval segment and time. This preserves the
user's continuous browsing habits: the method treats user behavior within the
same interval segment as continuous behavior, and it also accounts for the
effect of time on the user's purchase demand. Multi-level decay affects the
commodity scores used for ranking. Different decay speeds and interval segments
produce different final feature vector values and therefore different user
scores. Because a user's commodities are ordered by score, different scores
change the ranking result (the recommendation ranking).
The method of this embodiment predicts the probability that a user will
purchase e-commerce commodities. The prediction results serve as basic
prediction data for the e-commerce website's precision marketing, personalized
recommendation, and similar uses.
In addition, as shown in Fig. 5, a commodity information recommendation system
based on user historical behavior is also provided. The system includes:
Acquisition module: configured to collect the user's historical behavior data
on an e-commerce website;
Feature vector establishing module: configured to establish a user-commodity
probability prediction feature vector according to the historical behavior data
collected by the acquisition module;
Model building module: configured to train a model according to the
user-commodity probability prediction feature vector established by the feature
vector establishing module, to obtain a user recommended-commodity prediction
model;
Estimation module: configured to input data into the user recommended-commodity
prediction model and calculate the predicted purchase probability of the
behavior commodity;
First generation module: configured to calculate the predicted purchase
probability of the associated commodities according to the predicted purchase
probability of the behavior commodity calculated by the estimation module, and
to merge the behavior commodity and the associated commodities to obtain a
commodity recommendation list.
In the above system, the user's historical behavior data on the e-commerce
website are used to calculate the predicted purchase probability of the
behavior commodity, and the behavior commodity and the associated commodities
are combined to generate the commodity recommendation list. Because different
users behave differently, calculating the predicted purchase probability of the
behavior commodity from each user's own historical behavior personalizes the
resulting list: different users receive different commodity recommendation
lists.
The historical behavior data collected by the acquisition module come from the
PC end, the WAP end, the APP end, and offline data. Collecting from multiple
terminals widens the acquisition range of the user's historical behavior data,
so that the collected data reflect the user's historical demand more accurately
and provide a more accurate historical basis for generating the subsequent
commodity recommendation list. The historical behavior data include user
information and commodity information. User information includes the
identification ID of the user, user attribute information, and so on.
Commodity information includes the commodity identification code, commodity
feature information, and so on. User attributes include the gender, age, and
access preferences of the user. Commodity features include commodity traffic
features, commodity behavior features, and commodity decision cost.
Preferably, as shown in Fig. 6, the feature vector establishing module
includes:
Cleaning submodule: configured to clean out abnormal data and data that do not
conform to users' browsing habits;
Estimation submodule: configured to count the cleaned historical behavior
features of the users of each terminal by day, construct a user historical
behavior feature statistical function for each, divide the statistical days
into M interval segments, calculate the feature value of each interval segment
according to a time decay function, and accumulate the feature values of the
interval segments to obtain the user feature value;
Establishing submodule: configured to establish the user-commodity probability
prediction feature vector, which is expressed as: fingerprint ID of each
terminal + commodity ID + user feature value, where the fingerprint ID
represents the identification ID of the user and the commodity ID represents
the identification code of the commodity.
In the feature vector establishing module, the cleaning submodule first cleans
out abnormal data and data that does not conform to user browsing habits, the
calculation submodule then calculates the user feature value, and the
establishing submodule finally establishes the user-commodity probability
prediction feature vector. The calculation submodule calculates the feature
value of each segment according to the time decay function and then accumulates
the feature values of all segments. This multi-level decay affects the
commodity scores used for ranking: the decay speed and the division into
segments change the final feature vector values and therefore the scores
obtained for the user. Because a user's commodities are ranked by score,
different scores lead to a different ranking (the recommendation ranking
result).
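The segment-wise decay described above can be sketched as follows. This is a
minimal illustration and not the formula of the embodiment: the text does not
specify the time decay function, so an exponential weight decay**k for the k-th
segment (counted from the most recent one) is assumed here, and the names
user_feature_value, daily_counts and decay are hypothetical.

```python
from typing import Sequence


def user_feature_value(daily_counts: Sequence[float], m: int, decay: float = 0.5) -> float:
    """Accumulate per-segment feature values with a time decay.

    daily_counts[0] is the most recent day. The statistical days are divided
    into m segments; segment k is weighted by decay**k. The exponential form
    of the decay is an assumption, not taken from the specification.
    """
    if m <= 0:
        raise ValueError("m must be a positive number of segments")
    n = len(daily_counts)
    seg_len = -(-n // m)  # ceiling division: number of days per segment
    total = 0.0
    for k in range(m):
        segment = daily_counts[k * seg_len:(k + 1) * seg_len]
        total += (decay ** k) * sum(segment)  # feature value of segment k
    return total


# Six days of (hypothetical) daily browse counts, split into M = 3 segments.
value = user_feature_value([4, 2, 2, 0, 1, 1], m=3, decay=0.5)
```

A faster decay or a different number of segments changes the resulting value,
which is how the decay settings end up influencing the final ranking.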
In the cleaning submodule, abnormal data refers to data that is markedly
different from, anomalous with respect to, or inconsistent with the other data.
For example, the data removed by the following filters is abnormal data:
filtering out users whose number of commodity categories added to the shopping
cart exceeds the commodity category threshold Na; filtering out commodity
detail page browsing records whose browsing time is less than the browsing time
threshold of Nb seconds; filtering out commodity detail page browsing records
whose browsing time is more than the browsing time threshold of Nc seconds;
filtering out a session if the number of fourth-level pages browsed by the user
in that session exceeds the fourth-level page browsing threshold Nd; and
filtering out a user whose page views (pv) on the day are fewer than the pv
threshold Ne.
Data that does not conform to user browsing habits refers to data that differs
greatly from the behavior of normal shopping users, such as the browsing
behavior of crawler users or of users who place fake orders to inflate sales.
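The five filtering rules above can be sketched as a single cleaning pass. This
is only an illustration under stated assumptions: the record layout (PageView),
the function name clean and the per-user cart_categories mapping are
hypothetical, and the views passed in are assumed to cover a single day so that
the pv rule can be applied directly.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class PageView:
    user_id: str
    session_id: str
    page_level: int        # 4 marks a fourth-level page (assumed encoding)
    is_detail_page: bool   # True for a commodity detail page
    dwell_seconds: float   # browsing time of this page view


def clean(views: List[PageView], cart_categories: Dict[str, Set[str]],
          na: int, nb: float, nc: float, nd: int, ne: int) -> List[PageView]:
    """Apply the thresholds Na, Nb, Nc, Nd, Ne to one day of page views."""
    # Users who added more than Na commodity categories to the cart.
    bad_users = {u for u, cats in cart_categories.items() if len(cats) > na}
    # Users whose page views on the day are fewer than Ne.
    pv = Counter(v.user_id for v in views)
    bad_users |= {u for u, n in pv.items() if n < ne}
    # Sessions with more than Nd fourth-level page views.
    level4 = Counter(v.session_id for v in views if v.page_level == 4)
    bad_sessions = {s for s, n in level4.items() if n > nd}

    kept = []
    for v in views:
        if v.user_id in bad_users or v.session_id in bad_sessions:
            continue
        # Detail page records browsed for less than Nb or more than Nc seconds.
        if v.is_detail_page and not (nb <= v.dwell_seconds <= nc):
            continue
        kept.append(v)
    return kept
```

Detecting crawler or fake-order behavior generally needs more than fixed
thresholds; the sketch covers only the threshold rules listed in the text.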
Preferably, as shown in Fig. 7, the first generation module includes:
Determination submodule: configured to calculate, according to the access
history data and the purchase history data in the user's historical behavior
data, and using association rules or collaborative filtering, the degree of
association between the behavior commodity and its associated commodities, and
to take the top b commodities with the highest degree of association as the
associated commodity set of the behavior commodity;
Calculation submodule: configured to calculate the purchase probability of the
associated commodities according to formula (1):
Score_i = Master_Pos * SKU_Score_i / max(SKU_Score_i)          Formula (1)
wherein Score_i represents the purchase probability of the associated
commodity SKU_i; Master_Pos represents the purchase probability of the behavior
commodity; SKU_Score_i represents the degree of association between the
associated commodity SKU_i and the behavior commodity; and max(SKU_Score_i)
represents the highest degree of association in the associated commodity set;
First generation submodule: configured to merge the behavior commodities and
the associated commodities to generate the commodity recommendation list. If
the behavior commodity is in the associated commodity set established by the
determination submodule, the behavior commodity and the associated commodities
are sorted by predicted purchase probability to obtain the commodity
recommendation list. If the behavior commodity is not in the associated
commodity set established by the determination submodule and its predicted
purchase probability is less than a probability threshold, the predicted
purchase probability of the behavior commodity is multiplied by a penalty
coefficient to give the final predicted purchase probability of the behavior
commodity; the associated commodities and the behavior commodity are then
sorted by predicted purchase probability to obtain the commodity recommendation
list.
In the first generation submodule, the probability threshold and the penalty
coefficient are selected according to the optimization criterion of the
comprehensive evaluation index (F-Measure): the probability threshold and the
penalty coefficient that maximize the F-Measure are taken.
The first generation submodule considers not only the behavior commodities but
also the associated commodities, and treats both together as the commodities to
be recommended. When selecting the recommended commodities, the predicted
purchase probability of a behavior commodity is processed differently depending
on whether the behavior commodity already exists in the associated commodity
set. The associated commodities and the processed behavior commodities are then
sorted again by purchase probability, so that the position of the behavior
commodity in the recommendation list better matches the user's demand.
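Formula (1) and the merging rule of the first generation submodule can be
sketched as follows. The function names associated_scores and merge and the
mapping sku_scores (associated commodity to degree of association) are
hypothetical. The text does not state which probability a behavior commodity
keeps when it is already in the associated commodity set; the sketch assumes it
keeps its own predicted purchase probability Master_Pos.

```python
from typing import Dict, List, Tuple


def associated_scores(master_pos: float, sku_scores: Dict[str, float]) -> Dict[str, float]:
    """Formula (1): Score_i = Master_Pos * SKU_Score_i / max(SKU_Score_i)."""
    if not sku_scores:
        return {}
    top = max(sku_scores.values())
    if top <= 0:
        raise ValueError("the highest degree of association must be positive")
    return {sku: master_pos * s / top for sku, s in sku_scores.items()}


def merge(behavior_sku: str, master_pos: float, sku_scores: Dict[str, float],
          prob_threshold: float, penalty: float) -> List[Tuple[str, float]]:
    """Merge the behavior commodity with its associated commodities and rank."""
    scores = associated_scores(master_pos, sku_scores)
    final = master_pos
    # Penalize a behavior commodity that is outside the associated set and
    # whose predicted purchase probability is below the threshold.
    if behavior_sku not in sku_scores and master_pos < prob_threshold:
        final = master_pos * penalty
    # Assumption: a behavior commodity inside the set keeps Master_Pos.
    scores[behavior_sku] = final
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Behavior commodity "A" with predicted purchase probability 0.5 and two
# associated commodities with degrees of association 4.0 and 1.0.
ranking = merge("A", 0.5, {"B": 4.0, "C": 1.0}, prob_threshold=0.75, penalty=0.5)
```

In this example the penalized behavior commodity "A" drops to 0.25 and is
ranked below the most strongly associated commodity "B" (0.5) but above "C"
(0.125).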
As shown in Fig. 8, the commodity information recommendation system based on
user historical behavior further includes a second generation module,
configured to apply filtering and output logic processing to the commodity
recommendation list obtained by the first generation module and to generate the
final commodity recommendation list. Because a user generally will not buy
again a commodity that was purchased recently, the commodity recommendation
list generated by the first generation module is filtered and subjected to
output logic processing, so that the final commodity recommendation list
contains no commodities the user purchased recently and therefore better
matches the user's real demand.
As shown in Fig. 9, the second generation module includes:
Filter commodity group establishing submodule: configured to take the
commodities ordered by the user within the most recent H days and to use the
commodity groups to which the ordered commodities belong as the user's filter
commodity groups, where H is an integer and H > 3;
Filtering submodule: configured to filter out, from the commodity
recommendation list generated by the first generation module, the commodities
that belong to the filter commodity groups;
Second generation submodule: configured to re-sort, according to predicted
purchase probability, the commodities in the commodity recommendation list
filtered by the filtering submodule, and to generate the final commodity
recommendation list.
The filter commodity group establishing submodule selects the filter commodity
groups, which correspond to the commodities the user purchased recently. The
filtering submodule removes from the commodity recommendation list generated by
the first generation module the commodities that belong to the filter commodity
groups. The second generation submodule re-sorts the commodities remaining in
the filtered list according to predicted purchase probability and generates the
final commodity recommendation list. Through these three submodules, the
commodities in the list generated by the first generation module that are of
the same kind as the commodities the user purchased recently are filtered out,
so that the commodities in the final commodity recommendation list match the
user's real demand.
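The three submodules of the second generation module can be sketched together
as one function. This is a minimal illustration: the name final_list, the
(commodity, order date) pairs in orders and the group_of mapping from commodity
to commodity group are hypothetical, and the default H = 7 is only an example
value satisfying H > 3.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def final_list(recommendations: List[Tuple[str, float]],
               orders: List[Tuple[str, date]],
               group_of: Dict[str, str],
               today: date, h: int = 7) -> List[Tuple[str, float]]:
    """Drop commodities whose group was ordered in the last H days, then re-sort."""
    if h <= 3:
        raise ValueError("H must be an integer greater than 3")
    cutoff = today - timedelta(days=h)
    # Filter commodity groups: groups of the commodities ordered since the cutoff.
    filter_groups = {group_of[sku] for sku, ordered_on in orders
                     if ordered_on >= cutoff and sku in group_of}
    # Commodities with no known group are kept (an assumption of this sketch).
    kept = [(sku, p) for sku, p in recommendations
            if group_of.get(sku) not in filter_groups]
    # Re-sort the remaining commodities by predicted purchase probability.
    return sorted(kept, key=lambda kv: kv[1], reverse=True)


groups = {"phone1": "phones", "phone2": "phones", "book1": "books", "book2": "books"}
orders = [("phone1", date(2024, 1, 8)), ("book1", date(2023, 12, 1))]
recs = [("phone2", 0.9), ("book2", 0.3), ("case", 0.6)]
result = final_list(recs, orders, groups, today=date(2024, 1, 10), h=7)
```

Here "phone2" is removed because a commodity of the same group was ordered two
days earlier, while "book2" stays because the order in its group is older than
H days.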
Those skilled in the art should understand that the method or system of the
above embodiments can be implemented by computer program instructions. The
computer program instructions are loaded into a programmable data processing
device, such as a computer, so that the corresponding instructions are executed
on the programmable data processing device to realize the functions of the
method or system of the above embodiments.
Those skilled in the art may, on the basis of the above embodiments, make
non-inventive technical improvements to the present application without
departing from the spirit of the present invention. Such improvements shall
still be regarded as falling within the scope of the claims of the present
application.