CN113469730A - Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene - Google Patents
- Publication number
- CN113469730A (application CN202110637643.7A)
- Authority
- CN
- China
- Prior art keywords
- prediction
- repurchase
- lightgbm
- sample
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Software Systems (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Medical Informatics (AREA)
- Probability & Statistics with Applications (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a customer repurchase prediction method and device based on an RF-LightGBM fusion model in a non-contract scenario. The method comprises the following steps: acquiring a user's historical data and performing preprocessing and feature engineering on it; taking the preprocessed data as samples and balancing the sample set with the SMOTE-ENN method; optimizing the hyperparameters of a random forest algorithm and a LightGBM algorithm with the TPE optimization algorithm to construct weak classifiers; and performing ensemble learning on the training samples with the weak classifiers to obtain a strong classifier, which yields the final repurchase prediction. The method analyzes the enterprise's customer purchase data, accurately predicts the repurchase behavior of existing customers, and uses this to guide customer relationship management decisions and precision marketing strategies, thereby improving the marketing conversion rate and reducing the enterprise's operating costs.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to a customer repurchase prediction method and device based on an RF-LightGBM fusion model in a non-contract scenario.
Background
With the advent of the big data era, predicting consumers' future purchase intentions from massive historical transaction data has become an important issue in enterprise management. Repurchase prediction in a non-contract scenario means predicting whether a customer will buy the enterprise's products again when the enterprise and the customer have not signed a purchase contract. Accurately identifying consumers who intend to repurchase allows precision marketing to match customer needs more closely, increases the value of new consumers, and converts them into loyal customers.
In the prior art, Chinese invention patent CN109146533B discloses an information push method and apparatus. It obtains at least two orders placed by a user for items of the same item type, determines the user's average daily consumption of that item type from the purchase amounts in those orders, and determines, from the average daily consumption and the purchase amount of the most recent order, the date on which to push information about items of that type to the user's terminal, thereby making information push more effective. Chinese invention patent CN108171530B discloses a method and device for increasing customers' unit price and repurchase rate. It selects historical marketing data of a target store to obtain historical marketing campaign effects and, from these, an initial estimate of the target store's campaign effect; it then constructs threshold adjustment factors from the ratio between the historical marketing campaigns of all stores that meet the threshold order count and those that meet the customer order count, and uses these factors to calibrate the estimated campaign effect of the target store. This addresses the problem that existing promotion-effect evaluation techniques cannot accurately estimate campaign effects as the threshold changes. Although these methods recommend products and predict effects from historical data, they cannot accurately predict customer behavior.
Machine learning methods are widely applied to customer behavior prediction, but most existing work focuses on e-commerce (online mall) scenarios. For example, Chinese invention application CN110956497A discloses a method for predicting repeat purchases by e-commerce platform users: it obtains users' historical purchasing behavior data, fuses a deep CatBoost model, a two-layer-attention BiGRU model, and a DeepGBM model, and models both the discrete purchase-record values and the behavior-sequence features in the historical data to improve prediction accuracy. Chinese invention application CN108520469A discloses a user repurchase behavior analysis method based on an e-commerce platform: it selects users' valid purchase records within a statistical period; cleans the data; labels each valid record with whether it is a repeat purchase, a repeat purchase on the same platform, and a repeat purchase of the same insurance type; counts the total and repeat-purchasing users overall, per platform, and per insurance type; and computes the repeat purchase rate, the platform repeat purchase rate, and the insurance-type repeat purchase rate for the period. However, e-commerce scenarios record customers' "implicit" feedback, such as adding items to favorites or liking them, and this feedback is not available in the broader non-contract scenario.
Moreover, current machine learning approaches focus mainly on algorithm ensembling and overlook the influence of the data set on the prediction results. In a typical purchasing setting, users who purchase repeatedly are fewer than users who purchase only once, so the data are class-imbalanced, which often causes the model to overfit and lowers prediction accuracy.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a client repurchase prediction method and a client repurchase prediction device based on an RF-LightGBM fusion model under a non-contract scene, and the invention adopts the following technical scheme:
a customer repurchase prediction method based on an RF-LightGBM fusion model under a non-contract scene comprises the following steps:
acquiring historical purchase record data of a user, preprocessing the historical purchase record data and extracting features;
balancing the data subjected to the feature extraction by using a sample balancing method to obtain a balanced sample;
training sample data by using an optimization algorithm, and performing iterative optimization on the weak classifier in a specified weak classifier hyperparameter space;
performing ensemble learning to obtain a strong classifier by giving the same weight to each weak classifier;
predicting by using a strong classifier to obtain final results of product recommendation and repurchase behavior prediction;
and pushing product information to the terminal equipment of the user and/or sending a re-purchasing behavior prediction result to a management system according to the final result.
Further, the extracted features include:
time of last purchase, frequency of purchases, total amount of purchases, duration of relationship, purchase interval.
Further, the sample balancing method comprises:
generating minority-class samples from the extracted features with the SMOTE oversampling method, checking each generated sample with the ENN (Edited KNN) method, and removing the sample if the prediction differs from its actual class label, to obtain balanced samples.
Further, the optimization algorithm comprises:
optimizing the model hyperparameters with the TPE (Tree-structured Parzen Estimator) optimization algorithm, and training the model with the optimal hyperparameters.
Further, the weak classifiers comprise a random forest (RF) model and a LightGBM model, whose outputs are classification probability values, expressed mathematically as:
$$P(y \mid x) = \frac{1}{N_{tree}} \sum_{i=1}^{N_{tree}} P_{h_i}(y \mid x)$$
where $N_{tree}$ is the total number of decision trees, $h_i$ is the i-th decision tree, $P_{h_i}(y \mid x)$ is the probability that tree $h_i$ assigns to class y, and $P(y \mid x)$ is the probability that the prediction sample x belongs to class y.
Further, the ensemble learning specifically includes:
the RF model and the LightGBM model are given the same weight and are integrated by soft voting on their predicted probabilities, expressed mathematically as:
$$P_{\text{Soft Voting}} = \frac{P_{RF} + P_{LightGBM}}{2}$$
$$\text{Result} = \begin{cases} 1, & P_{\text{Soft Voting}} > \text{threshold} \\ 0, & P_{\text{Soft Voting}} \le \text{threshold} \end{cases}$$
where $P_{\text{Soft Voting}}$ is the predicted probability of the soft-voting fusion model, $P_{RF}$ and $P_{LightGBM}$ are the predicted probabilities of the random forest and LightGBM, respectively, Result is the prediction of the fusion model (1 means the user is a repurchasing user, 0 means a non-repurchasing user), and threshold is the classification threshold.
Further, the repurchase behavior prediction and the repurchase probability prediction are used as a guide for product recommendation.
The invention also provides a client repurchase prediction device based on the RF-LightGBM fusion model under a non-contract scene, which comprises the following components:
the acquisition module is used for acquiring historical purchase record data of a user, preprocessing the historical purchase record data and extracting features;
the balance module is used for balancing the data subjected to the feature extraction by using a sample balance method to obtain a balanced sample;
the optimization training module is used for training sample data by using an optimization algorithm and performing iterative optimization on the weak classifier in the specified hyper-parameter space of the weak classifier;
the ensemble learning module is used for performing ensemble learning to obtain a strong classifier by endowing the weak classifiers with the same weight;
the prediction module is used for predicting by using the strong classifier to obtain final results of product recommendation and repurchase behavior prediction;
and the pushing module is used for pushing the product information to the terminal equipment of the user according to the final result.
The invention also includes an electronic device comprising:
a processor, and a memory;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to perform the customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contract scenario as described above.
The present invention also includes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contract scenario as described above.
The invention achieves the following beneficial effects: it analyzes the enterprise's existing records of user purchasing behavior, accurately predicts whether existing users will repurchase, and uses this to guide customer relationship management and marketing strategies, raising the marketing conversion rate and lowering the related operating costs; based on customers' purchasing behavior data, it accurately predicts their repurchase of goods, meeting customers' actual needs while reducing the enterprise's communication costs; and it lets data dynamically guide the enterprise's operating strategy, supporting decisions and product marketing goals, ultimately recommending the right product to the right user in an intelligent way.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby. It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the embodiment discloses a customer repurchase prediction method based on an RF-LightGBM fusion model in a non-contract scenario, which includes the following steps:
(1) Acquire the user's historical purchase record data, preprocess it, and extract features. The historical purchase record data are existing data. The extracted features are: time of last purchase (R), purchase frequency (F), total purchase amount (M), relationship duration (S), and purchase interval (T).
(2) Balance the extracted-feature data with the SMOTE-ENN method to obtain the model training set. Each class in the original sample set is repeatedly sampled with replacement to form the test samples.
(3) Train on the training samples with the TPE optimization algorithm, iteratively optimizing each weak classifier within its specified hyperparameter space.
(4) Assign the same weight to each weak classifier, perform ensemble learning to obtain a strong classifier, and obtain the final results for product recommendation and repurchase behavior prediction.
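Step (2) mentions repeatedly sampling each class of the original sample set with replacement. The sketch below shows one way to do per-class sampling with replacement in plain Python; the helper name `per_class_bootstrap` is hypothetical, and drawing as many samples per class as that class originally contains is an assumption, since the text does not give the number of draws.

```python
import random
from collections import defaultdict


def per_class_bootstrap(samples, labels, seed=0):
    """Resample each class with replacement, keeping each class's original size.

    Hypothetical helper: the per-class draw count (equal to the class size)
    is an assumption, not something the patent specifies.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    out_x, out_y = [], []
    for y, xs in sorted(by_class.items()):
        for _ in range(len(xs)):
            out_x.append(rng.choice(xs))  # sampling with replacement within class y
            out_y.append(y)
    return out_x, out_y


X = [[1.0], [2.0], [3.0], [10.0]]
y = [0, 0, 0, 1]
bx, by = per_class_bootstrap(X, y)
```

Sampling within each class separately keeps the class proportions of the resampled set equal to those of the original set.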
In this embodiment, preprocessing includes the following: to facilitate computer processing and user labeling, character-type data are converted to numerical data, and time fields are converted to date-type data. The extracted features are the recency of purchase (R), purchase frequency (F), total purchase amount (M), relationship duration (S), and purchase interval (T):
a) R: the time since the customer's most recent purchase of the product:
$$R = T_{\text{last\_time}} - T_{\text{plast\_time}}$$
where $T_{\text{last\_time}}$ denotes the end of the reference period and $T_{\text{plast\_time}}$ denotes the time of the customer's last order for the item within the reference period.
b) F: the number of purchases made by the customer over the observation period.
c) M: the total amount the customer spent on the product:
$$M = \sum_{i=1}^{n} m_i$$
where n is the total number of purchases by the customer within the reference period and $m_i$ is the amount of the customer's i-th purchase.
d) S: the time from the customer's first transaction to the last transaction within the reference period:
$$S = T_{\text{plast\_time}} - T_{\text{pfirst\_time}}$$
where $T_{\text{plast\_time}}$ denotes the time of the customer's last order for the item within the reference period and $T_{\text{pfirst\_time}}$ denotes the time of the customer's first order for the item within the reference period.
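e) T: the customer's average interval between transactions within the reference period:
$$T = \frac{1}{F-1} \sum_{k=1}^{F-1} \left(t_{k+1} - t_k\right) = \frac{S}{F-1}, \qquad F \ge 2$$
where $t_k$ is the time of the customer's k-th transaction within the reference period.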
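The five features above can be computed per customer and item with a short function. This is a minimal sketch assuming dates are `datetime.date` objects, that time features are measured in days, and that T is the mean gap between consecutive purchases, S / (F - 1), with T = 0 for a single purchase; the function name `rfmst_features` is hypothetical.

```python
from datetime import date


def rfmst_features(order_dates, order_amounts, period_end):
    """Compute R, F, M, S, T for one customer and item within a reference period.

    order_dates: purchase dates inside the reference period (datetime.date)
    order_amounts: amounts of the corresponding purchases
    period_end: end of the reference period (T_last_time)
    Day units and the T definition for F == 1 are assumptions.
    """
    if not order_dates:
        raise ValueError("customer has no purchases in the reference period")
    first, last = min(order_dates), max(order_dates)
    r = (period_end - last).days   # R = T_last_time - T_plast_time
    f = len(order_dates)           # F = number of purchases
    m = sum(order_amounts)         # M = sum of single purchase amounts m_i
    s = (last - first).days        # S = T_plast_time - T_pfirst_time
    t = s / (f - 1) if f > 1 else 0.0  # T = mean gap between consecutive purchases
    return {"R": r, "F": f, "M": m, "S": s, "T": t}


feats = rfmst_features(
    [date(2021, 1, 5), date(2021, 2, 4), date(2021, 3, 6)],
    [120.0, 80.0, 100.0],
    period_end=date(2021, 3, 31),
)
# feats == {"R": 25, "F": 3, "M": 300.0, "S": 60, "T": 30.0}
```

Because consecutive gaps telescope, their mean equals S / (F - 1), so T needs no per-gap loop.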
The invention handles imbalanced samples with the SMOTE-ENN method, which works well for binary classification problems with only a small number of positive samples and performed better than the other methods compared. The SMOTE-ENN method comprises the following steps:
(1) SMOTE method (Synthetic Minority Oversampling Technique):
Let A denote the minority class. Take any $X_i \in A$ and, using the Euclidean distance, compute its distance to every sample in the minority sample set A to obtain the k nearest neighbours of $X_i$. Randomly select one of these neighbours, $X_{ij}$ ($j = 1, 2, \ldots, n$), and construct a new minority sample $Y_j$ by random linear interpolation between $X_i$ and $X_{ij}$:
$$Y_j = X_i + \mathrm{rand}(0,1) \times (X_{ij} - X_i)$$
where $\mathrm{rand}(0,1)$ denotes a random number in the interval (0, 1).
(2) ENN method (Edited KNN)
Each sample in the data set ND generated by the SMOTE method is predicted with a K-nearest-neighbour classifier (K = 5), and the sample is removed if the prediction differs from its actual class label. The Euclidean distance is used as the KNN distance measure:
$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$
where x and y denote two different users and i indexes the features.
A hyperparameter configuration space is specified for each weak classifier, and the TPE (Tree-structured Parzen Estimator) algorithm iteratively optimizes over that space on the sample set constructed by the SMOTE-ENN method. The optimization objective is:
$$x^* = \arg\min_{x \in \mathcal{X}} F(x)$$
where F(x) is the objective function of the weak learner, x is a hyperparameter configuration in the space $\mathcal{X}$, and $x^*$ is the configuration that gives the best result.
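The TPE algorithm models the density of configurations as:
$$p(x \mid y) = \begin{cases} l(x), & y < y^* \\ g(x), & y \ge y^* \end{cases}$$
where l(x) is the density formed by the observations $\{x^{(i)}\}$ whose objective value $F(x^{(i)})$ is less than $y^*$, and g(x) is the density formed by the remaining observations. $y^*$ is chosen as the γ-quantile of the observed values y, that is, $p(y < y^*) = \gamma$. The expected improvement (EI) is:
$$\mathrm{EI}_{y^*}(x) = \int_{-\infty}^{y^*} (y^* - y)\, p(y \mid x)\, dy \propto \left(\gamma + \frac{g(x)}{l(x)}(1 - \gamma)\right)^{-1}$$
so configurations with a high ratio l(x)/g(x) are preferred.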
The output of the random forest model is the average of the probabilities of all decision trees:
$$P(y \mid x) = \frac{1}{N_{tree}} \sum_{i=1}^{N_{tree}} P_{h_i}(y \mid x)$$
where $N_{tree}$ is the total number of decision trees, $h_i$ is the i-th decision tree, $P_{h_i}(y \mid x)$ is the probability that tree $h_i$ assigns to class y, and $P(y \mid x)$ is the probability that the prediction sample x belongs to class y.
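The LightGBM model likewise outputs a classification probability for each sample; for binary classification, LightGBM obtains it by applying the sigmoid function to the summed outputs of its boosted trees rather than by averaging per-tree probabilities.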
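The random forest averaging formula can be shown with a few stand-in trees. In this sketch each "tree" is a hypothetical function returning its probability of class 1 (repurchase) for a feature vector [R, F]; the stumps and their thresholds are invented for illustration. scikit-learn's `RandomForestClassifier.predict_proba` likewise returns the mean of the trees' predicted class probabilities.

```python
from typing import Callable, Sequence

# A tree maps a feature vector to P_{h_i}(y = 1 | x)
Tree = Callable[[Sequence[float]], float]


def forest_proba(trees: Sequence[Tree], x: Sequence[float]) -> float:
    """P(y = 1 | x) = (1 / N_tree) * sum_i P_{h_i}(y = 1 | x)."""
    if not trees:
        raise ValueError("need at least one tree")
    return sum(tree(x) for tree in trees) / len(trees)


# Three hypothetical stumps over features [R (days), F (purchases)]
trees = [
    lambda x: 0.9 if x[1] >= 3 else 0.2,   # split on purchase frequency F
    lambda x: 0.8 if x[0] <= 30 else 0.1,  # split on recency R
    lambda x: 0.6,                          # leaf-only tree
]
p = forest_proba(trees, [25, 3])  # (0.9 + 0.8 + 0.6) / 3
```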
The RF model and the LightGBM model are given the same weight and are integrated by soft voting on their predicted probabilities, expressed mathematically as:
$$P_{\text{Soft Voting}} = \frac{P_{RF} + P_{LightGBM}}{2}$$
$$\text{Result} = \begin{cases} 1, & P_{\text{Soft Voting}} > \text{threshold} \\ 0, & P_{\text{Soft Voting}} \le \text{threshold} \end{cases}$$
where $P_{\text{Soft Voting}}$ is the predicted probability of the soft-voting fusion model, $P_{RF}$ and $P_{LightGBM}$ are the predicted probabilities of the random forest and LightGBM models, respectively, and Result is the prediction of the fusion model: 1 indicates a repurchasing user and 0 a non-repurchasing user. Based on testing, the threshold is set to 0.5: the predicted label is 1 when $P_{\text{Soft Voting}}$ is greater than 0.5 and 0 when it is smaller, which yields the prediction matrix.
In this way, the customer's repurchase behavior can be predicted.
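The equal-weight soft voting and thresholding described above reduce to a few lines. This sketch assumes both models already produce probabilities of class 1, and it maps a probability exactly equal to the threshold to label 0, which the text does not specify. The function names are hypothetical; scikit-learn's `VotingClassifier(voting="soft")` offers a comparable built-in that averages `predict_proba` outputs.

```python
def soft_vote(p_rf, p_lgbm, threshold=0.5):
    """Equal-weight soft voting: P = (P_RF + P_LightGBM) / 2, then threshold."""
    p = (p_rf + p_lgbm) / 2.0
    # 1 = repurchasing user, 0 = non-repurchasing user; a tie maps to 0 (assumption)
    return p, 1 if p > threshold else 0


def soft_vote_batch(probs_rf, probs_lgbm, threshold=0.5):
    """Apply soft voting to paired probability lists and return the labels."""
    return [soft_vote(a, b, threshold)[1] for a, b in zip(probs_rf, probs_lgbm)]


labels = soft_vote_batch([0.7, 0.3, 0.5], [0.4, 0.6, 0.5])  # [1, 0, 0]
```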
And pushing product information to the terminal equipment of the user and/or sending a re-purchasing behavior prediction result to a management system according to the final result.
The performance of the invention is measured as follows. Accuracy P, recall R, and the F1 score are used as evaluation indices; after applying the data preprocessing method of the invention, they are calculated from the resulting label matrix, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives:
$$P = \frac{TP + TN}{TP + TN + FP + FN}, \qquad R = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot R}{\text{Precision} + R}$$
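The translated term for P can mean either accuracy or precision, so the sketch below computes accuracy, precision, recall, and F1 from the confusion-matrix counts of binary labels (1 = repurchase). The function name `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 1, 0, 1, 0])
# TP = 3, TN = 3, FP = 1, FN = 1, so every metric equals 0.75
```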
The invention performs well in enterprises' multi-channel marketing in non-contract scenarios. Taking supermarket telemarketing as an example, applying the system can greatly raise the telemarketing conversion rate and generate more transactions. For enterprises, it can improve marketing guidance, raise the sales success rate, increase transaction volume and transaction value, and reduce staff costs. Performance on the data set was as follows: (1) on the training set generated by SMOTE-ENN, the model's prediction accuracy was 98.73%, its recall 99.09%, and its F1 score 0.9874; (2) on a validation set of real samples, the prediction accuracy was 87.13%, the recall 95.15%, and the F1 score 0.8587; (3) these results are better than the prediction performance of the RF and LightGBM single models.
By improving on the classic RFM model, the invention extracts user behavior features from the explicit feedback in customers' historical purchase records to form the sample set, which overcomes the prior-art problem that the large amount of implicit feedback those methods rely on is unavailable in non-contract scenarios; the SMOTE-ENN sample balancing method effectively addresses the class imbalance of the data set; and the results of the embodiment show that the method has good predictive performance and practical application value.
Claims (10)
1. A customer repurchase prediction method based on an RF-LightGBM fusion model under a non-contract scene is characterized by comprising the following steps:
acquiring historical purchase record data of a user, preprocessing the historical purchase record data and extracting features;
balancing the data subjected to the feature extraction by using a sample balancing method to obtain a balanced sample;
training sample data by using an optimization algorithm, and performing iterative optimization on the weak classifier in a specified weak classifier hyperparameter space;
performing ensemble learning to obtain a strong classifier by giving the same weight to each weak classifier;
predicting by using a strong classifier to obtain final results of product recommendation and repurchase behavior prediction;
and pushing product information to the terminal equipment of the user and/or sending a re-purchasing behavior prediction result to a management system according to the final result.
2. The method of claim 1, wherein the extracted features comprise:
time of last purchase, frequency of purchases, total amount of purchases, duration of relationship, purchase interval.
3. The method of claim 1, wherein the sample balancing method comprises:
generating minority-class samples from the extracted features with the SMOTE oversampling method, checking each generated sample with the ENN (Edited KNN) method, and removing the sample if the prediction differs from its actual class label, to obtain balanced samples.
4. The method of claim 1, wherein the optimization algorithm comprises:
optimizing the model hyperparameters with the TPE (Tree-structured Parzen Estimator) optimization algorithm, and training the model with the optimal hyperparameters.
5. The customer repurchase prediction method based on the RF-LightGBM fusion model in a non-contract scenario as claimed in claim 1, wherein the weak classifiers comprise a random forest (RF) model and a LightGBM model, whose outputs are classification probability values, expressed mathematically as:
$$P(y \mid x) = \frac{1}{N_{tree}} \sum_{i=1}^{N_{tree}} P_{h_i}(y \mid x)$$
where $N_{tree}$ is the total number of decision trees, $h_i$ is the i-th decision tree, $P_{h_i}(y \mid x)$ is the probability that tree $h_i$ assigns to class y, and $P(y \mid x)$ is the probability that the prediction sample x belongs to class y.
6. The method for predicting the customer repurchase based on the RF-LightGBM fusion model in the non-contract scenario as claimed in claim 1, wherein the ensemble learning specifically comprises:
the RF model and the LightGBM model are given the same weight and are combined by soft voting on their predicted probabilities, in the following mathematical form:
$P_{\text{Soft Voting}} = (P_{RF} + P_{LightGBM})/2$
where $P_{\text{Soft Voting}}$ is the predicted probability of the soft-voting fusion model, $P_{RF}$ and $P_{LightGBM}$ are the predicted probabilities of the random forest and LightGBM respectively, Result is the prediction result of the fusion model (1 means the user belongs to the repurchase class and 0 means the non-repurchase class), and threshold is the classification threshold.
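A sketch of the equal-weight soft-voting rule followed by the thresholding step that produces Result. The comparison `p_soft >= threshold` and the default threshold of 0.5 are assumptions, since the claim's Result formula is not reproduced here; the function name is hypothetical.

```python
from typing import Tuple


def soft_vote(p_rf: float, p_lgbm: float, threshold: float = 0.5) -> Tuple[float, int]:
    """Equal-weight soft voting of two repurchase probabilities, then thresholding."""
    for p in (p_rf, p_lgbm):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    p_soft = (p_rf + p_lgbm) / 2               # P_SoftVoting = (P_RF + P_LightGBM) / 2
    result = 1 if p_soft >= threshold else 0   # ASSUMPTION: ">="; 1 = repurchase, 0 = no repurchase
    return p_soft, result
```

For instance, `soft_vote(0.8, 0.4)` averages to about 0.6 and classifies the user as a likely repurchaser; raising `threshold` trades recall for precision on the repurchase class.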
7. The customer repurchase prediction method based on the RF-LightGBM fusion model in a non-contract scenario as claimed in claim 1, wherein the repurchase behavior prediction and the repurchase probability prediction serve as guidance for product recommendation.
8. A customer repurchase prediction device based on an RF-LightGBM fusion model in a non-contract scenario, characterized by comprising:
the acquisition module is used for acquiring historical purchase record data of a user, preprocessing the historical purchase record data and extracting features;
the balance module is used for balancing the feature-extracted data by using a sample balancing method to obtain balanced samples;
the optimization training module is used for training sample data by using an optimization algorithm and performing iterative optimization on the weak classifier in the specified hyper-parameter space of the weak classifier;
the ensemble learning module is used for performing ensemble learning by assigning the same weight to each weak classifier to obtain a strong classifier;
the prediction module is used for predicting by using the strong classifier to obtain final results of product recommendation and repurchase behavior prediction;
and the pushing module is used for pushing the product information to the terminal equipment of the user according to the final result.
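Claim 8 describes the device as a chain of six modules. One way to picture the data flow is the hypothetical skeleton below, in which each module is a pluggable callable and `run` passes each stage's output to the next. The class name, field names and stage signatures are assumptions for illustration only, and predicting on the acquired (unbalanced) features is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class RepurchasePredictionDevice:
    """Hypothetical wiring of the six modules in claim 8 (names are illustrative)."""

    acquire: Callable[[], Any]            # acquisition: load history, preprocess, extract features
    balance: Callable[[Any], Any]         # balance: e.g. SMOTE + ENN on the extracted features
    train: Callable[[Any], List[Any]]     # optimization training: return tuned weak classifiers
    ensemble: Callable[[List[Any]], Any]  # ensemble learning: equal-weight combination
    predict: Callable[[Any, Any], Any]    # prediction: (strong classifier, features) -> results
    push: Callable[[Any], None]           # pushing: deliver results to the user's terminal

    def run(self) -> Any:
        features = self.acquire()
        balanced = self.balance(features)  # balanced data is used only for training
        weak_models = self.train(balanced)
        strong = self.ensemble(weak_models)
        results = self.predict(strong, features)
        self.push(results)
        return results
```

Keeping each stage as a separate callable mirrors the module boundaries in the claim and lets, for example, the balancing method be swapped without touching training or pushing.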
9. An electronic device, characterized in that:
comprises a processor and a memory;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to perform the customer repurchase prediction method based on the RF-LightGBM fusion model in a non-contract scenario according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that, when executed by a processor, the program implements the customer repurchase prediction method based on the RF-LightGBM fusion model in a non-contract scenario as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110637643.7A CN113469730A (en) | 2021-06-08 | 2021-06-08 | Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110637643.7A CN113469730A (en) | 2021-06-08 | 2021-06-08 | Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113469730A (en) | 2021-10-01 |
Family ID: 77869309
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110637643.7A Pending CN113469730A (en) | 2021-06-08 | 2021-06-08 | Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113469730A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107016569A (en) * | 2017-03-21 | 2017-08-04 | 聚好看科技股份有限公司 | Method and device for acquiring target customer accounts of a network product |
CN107294993A (en) * | 2017-07-05 | 2017-10-24 | 重庆邮电大学 | WEB abnormal traffic monitoring method based on ensemble learning |
WO2018069817A1 (en) * | 2016-10-10 | 2018-04-19 | Tata Consultancy Services Limited | System and method for predicting repeat behavior of customers |
CN108171530A (en) * | 2017-12-06 | 2018-06-15 | 口碑(上海)信息技术有限公司 | Method and device for increasing per-customer transaction value and repurchase rate |
CN108520469A (en) * | 2018-06-19 | 2018-09-11 | 南京新贝金服科技有限公司 | User repurchase behavior analysis method based on an e-commerce platform |
CN108776922A (en) * | 2018-06-04 | 2018-11-09 | 北京至信普林科技有限公司 | Big-data-based financial product recommendation method and device |
CN110210913A (en) * | 2019-06-14 | 2019-09-06 | 重庆邮电大学 | Big-data-based method for predicting a merchant's repeat customers |
CN110322085A (en) * | 2018-03-29 | 2019-10-11 | 北京九章云极科技有限公司 | Customer churn prediction method and apparatus |
CN110599336A (en) * | 2018-06-13 | 2019-12-20 | 北京九章云极科技有限公司 | Financial product purchase prediction method and system |
CN110956497A (en) * | 2019-11-27 | 2020-04-03 | 桂林电子科技大学 | Method for predicting repeated purchasing behavior of user of electronic commerce platform |
CN111008871A (en) * | 2019-12-10 | 2020-04-14 | 重庆锐云科技有限公司 | Real estate repurchase customer follow-up quantity calculation method, device and storage medium |
CN111045716A (en) * | 2019-11-04 | 2020-04-21 | 中山大学 | Related patch recommendation method based on heterogeneous data |
CN111899055A (en) * | 2020-07-29 | 2020-11-06 | 亿达信息技术有限公司 | Machine learning and deep learning-based insurance client repurchase prediction method in big data financial scene |
Timeline: 2021-06-08, CN202110637643.7A (CN113469730A), active, pending
Non-Patent Citations (9)
Title |
---|
JAMES BERGSTRA: "Algorithms for hyper-parameter optimization", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 31 December 2011 (2011-12-31), pages 1 * |
JAMES BERGSTRA: "Algorithms for hyper-parameter optimization", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, pages 1 * |
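JUN WU: "User Value Identification Based on Improved RFM Model and K-Means++ Algorithm for Complex Data Analysis", WIRELESS COMMUNICATIONS AND MOBILE COMPUTING *
季晨雨: "Research on imbalanced data classification and its application in bank marketing", 山西电子技术 (Shanxi Electronic Technology), no. 05 *
张李义; 李一然; 文璇: "Research on predicting the repeat purchase intention of new consumers", 数据分析与知识发现 (Data Analysis and Knowledge Discovery), no. 11 *
张浩; 陈龙; 魏志强: "Abnormal traffic detection technology based on data augmentation and model updating", 信息网络安全 (Netinfo Security), no. 02, 10 February 2020 (2020-02-10), pages 66 *
张浩 et al.: "Abnormal traffic detection technology based on data augmentation and model updating", 信息网络安全 (Netinfo Security), pages 66 *
杨霞霞; 苏锋; 黄戌霞: "Research on an imbalanced data classification method based on an improved random forest algorithm", 网络安全技术与应用 (Network Security Technology & Application), no. 10 *
陶新民 et al.: "SVM classification algorithms for imbalanced data and their applications", 31 October 2011, 黑龙江科学技术出版社 (Heilongjiang Science and Technology Press), pages: 43 *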
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114049155A (en) * | 2021-11-17 | 2022-02-15 | 浙江华坤道威数据科技有限公司 | Marketing operation method and system based on big data analysis |
CN114049155B (en) * | 2021-11-17 | 2022-08-19 | 浙江华坤道威数据科技有限公司 | Marketing operation method and system based on big data analysis |
CN114549071A (en) * | 2022-02-18 | 2022-05-27 | 上海钧正网络科技有限公司 | Marketing strategy determination method and device, computer equipment and storage medium |
CN114511330A (en) * | 2022-04-18 | 2022-05-17 | 山东省计算中心(国家超级计算济南中心) | Improved CNN-RF-based Ethereum Ponzi scheme detection method and system |
CN114863341A (en) * | 2022-05-17 | 2022-08-05 | 济南大学 | Online course learning supervision method and system |
CN114863341B (en) * | 2022-05-17 | 2024-05-31 | 济南大学 | Online course learning supervision method and system |
CN115204537A (en) * | 2022-09-17 | 2022-10-18 | 华北理工大学 | Student score prediction method based on Bagging |
CN117114807A (en) * | 2023-08-24 | 2023-11-24 | 众合九通(北京)电子科技有限公司 | Commodity recommendation method and system based on user relationship |
CN117593044A (en) * | 2024-01-18 | 2024-02-23 | 青岛网信信息科技有限公司 | Dual-angle marketing campaign effect prediction method, medium and system |
CN117593044B (en) * | 2024-01-18 | 2024-05-31 | 青岛网信信息科技有限公司 | Dual-angle marketing campaign effect prediction method, medium and system |
Similar Documents
Publication | Title |
---|---|
CN113469730A (en) | Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene | |
CN108648074B (en) | Loan assessment method, device and equipment based on support vector machine | |
CN111062757B (en) | Information recommendation method and system based on multipath optimizing matching | |
CN110503531B (en) | Dynamic social scene recommendation method based on time sequence perception | |
CN112418653A (en) | Number portability and network diver identification system and method based on machine learning algorithm | |
CN110826886A (en) | Electric power customer portrait construction method based on clustering algorithm and principal component analysis | |
CN109636482B (en) | Data processing method and system based on similarity model | |
CN107403345A (en) | Best-selling product prediction method and system, storage medium and electronic terminal | |
CN112785441B (en) | Data processing method, device, terminal equipment and storage medium | |
CN110147389A (en) | Account processing method and apparatus, storage medium and electronic device | |
CN115204985A (en) | Shopping behavior prediction method, device, equipment and storage medium | |
CN114861050A (en) | Feature fusion recommendation method and system based on neural network | |
Chitra et al. | Customer retention in banking sector using predictive data mining technique | |
CN116187808A (en) | Electric power package recommendation method based on virtual power plant user-package label portrait | |
CN111861679A (en) | Commodity recommendation method based on artificial intelligence | |
CN113627997A (en) | Data processing method and device, electronic equipment and storage medium | |
CN116703250B (en) | Second-hand vehicle business supervision and prediction system | |
CN118037401A (en) | Knowledge graph-based agricultural product electronic commerce recommendation system | |
CN112150179A (en) | Information pushing method and device | |
CN111506813A (en) | Remote sensing information accurate recommendation method based on user portrait | |
CN116703533A (en) | Business management data optimized storage analysis method | |
CN113763032B (en) | Commodity purchase intention recognition method and device | |
EP3493082A1 (en) | A method of exploring databases of time-stamped data in order to discover dependencies between the data and predict future trends | |
CN115293867A (en) | Financial reimbursement user portrait optimization method, device, equipment and storage medium | |
US20230230143A1 | Product recommendation system, product recommendation method, and recording medium storing product recommendation program |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |