CN112685674B - Feature evaluation method and device for influencing user retention - Google Patents

Feature evaluation method and device for influencing user retention

Info

Publication number
CN112685674B
Authority
CN
China
Prior art keywords
intervention
variable
account
retention
causal effect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011613766.9A
Other languages
Chinese (zh)
Other versions
CN112685674A (en
Inventor
陈坤龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd filed Critical Bigo Technology Pte Ltd
Priority to CN202011613766.9A priority Critical patent/CN112685674B/en
Publication of CN112685674A publication Critical patent/CN112685674A/en
Application granted granted Critical
Publication of CN112685674B publication Critical patent/CN112685674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a feature evaluation method and device for influencing user retention. The method comprises: collecting, according to a plurality of specified feature variables, the corresponding specified feature values of each account in the current platform; acquiring retention information of each account, the retention information reflecting whether the account is a retained account; determining an intervention variable according to the collected specified feature values; and determining, in combination with the retention information, the causal effect of the intervention variable on user retention by means of a causal inference algorithm. In this embodiment, the influence of each feature variable on user retention is judged by causal inference, so that the key factors affecting user retention can be identified quickly. Once these factors are known, the platform side can improve them through customized strategies and thereby raise the user retention rate.

Description

Feature evaluation method and device for influencing user retention
Technical Field
The embodiment of the application relates to a data processing technology, in particular to a feature evaluation method and device for influencing user retention.
Background
In the internet industry, a user who starts using an application within a certain period and is still using it after a further period is regarded as retained, and the proportion of such users among the users newly added in that initial period is the retention rate. The concept of retention can be used to analyze the service effect of an application or website, that is, to determine whether it is able to retain users. Retention is in essence a conversion rate: it reflects the process by which an initially unstable user becomes an active, stable, and loyal user. In general, if the platform side takes no action to increase retention, the retention of new users stays at a relatively low level. The platform side therefore needs to take targeted measures to increase the user retention rate.
Disclosure of Invention
The application provides a feature evaluation method and a feature evaluation device for influencing user retention, which are used for evaluating key factors influencing user retention, so that targeted measures are taken to increase the retention rate of users.
In a first aspect, an embodiment of the present application provides a feature evaluation method for influencing user retention, where the method includes:
collecting corresponding specified feature values of each account in the current platform according to a plurality of specified feature variables;
acquiring retention information of each account, wherein the retention information is used for reflecting whether the current account is a retention account or not;
according to the collected specified characteristic values, determining intervention variables;
and determining the causal effect of the intervention variable on the user retention by adopting a causal inference algorithm in combination with the retention information.
In a second aspect, an embodiment of the present application further provides a feature evaluation apparatus for influencing user retention, where the apparatus includes:
the specified characteristic value acquisition module is used for acquiring corresponding specified characteristic values of each account in the current platform according to a plurality of specified characteristic variables;
the retention information acquisition module is used for acquiring retention information of each account, wherein the retention information is used for reflecting whether the current account is a retained account or not;
the intervention variable determining module is used for determining an intervention variable according to the collected specified characteristic value;
and the causal effect determining module is used for determining the causal effect of the intervention variable on the user retention by adopting a causal inference algorithm in combination with the retention information.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method described above when executing the program.
In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the above-described method.
The application has the following beneficial effects:
In this embodiment, after the corresponding specified feature values of each account and the retention information of each account in the current platform are collected from the historical data, a causal inference algorithm may be adopted to determine the causal effect of a single feature on the user retention, and key factors affecting the user retention are found according to the causal effect of each feature on the user retention, and the platform side is assisted by these key factors to perform targeted operation, so that the retention rate of the user may be improved.
Drawings
FIG. 1 is a flowchart of an embodiment of a feature evaluation method for influencing user retention according to one embodiment of the present application;
FIG. 2 is a flow chart of an embodiment of a method for determining the causal effects of intervention variables on user retention provided in accordance with an embodiment of the present application;
FIG. 3 is a block diagram of an embodiment of a feature evaluation device for influencing user retention according to a second embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present application are shown in the drawings.
Example 1
Fig. 1 is a flowchart of an embodiment of a feature evaluation method for influencing user retention, which is provided in an embodiment of the present application, and the embodiment may be applied to a server, and specifically may include the following steps:
Step 110, collecting corresponding specified feature values of each account in the current platform according to a plurality of specified feature variables.
In one embodiment, a specified feature variable may be a feature variable that the platform side, based on prior knowledge, considers to have a certain influence on users, or a feature variable set according to actual service requirements; this embodiment places no limit on the choice.
For example, for a live-streaming platform, the specified feature variables may include, but are not limited to: language feature, country feature, registration feature, age feature, gender feature, device information, operator information, network information, etc. The language feature describes the language the user uses; the country feature describes the country (region) given in the user's registration data; the registration feature describes whether the user has registered; the age feature describes the age of the user; the gender feature describes the gender of the user; the device information describes the terminal model (such as the mobile phone model) used by the user; the operator information describes the operator of the network used by the user; the network information describes the type of network the user most often uses.
According to the specified feature variables, the specified feature values corresponding to each account in the current platform can be collected from historical data. For example, the feature values corresponding to the language feature may include Chinese, English, French, German, Arabic, etc.; those corresponding to the country feature may include China, the United States, the United Kingdom, Germany, Russia, etc.; those corresponding to the registration feature may include registered, guest, etc.; those corresponding to the age feature may include adult, minor, etc.; those corresponding to the gender feature may include male, female, etc.; those corresponding to the device information may include specific device models; those corresponding to the operator information may include specific operators such as China Mobile, China Unicom, and China Telecom; those corresponding to the network information may include 2G-4G, Wi-Fi, etc.
In one embodiment, after the series of specified feature values of each account in the current platform has been collected, the specified feature values may be assembled into a feature matrix. For example, the feature matrix may be denoted $X$, with dimension $\dim X = (n, p)$, where $n$ is the number of accounts in the current platform and $p$ is the number of specified feature values of a single account, which may be denoted $p_1, \dots, p_n$.
To facilitate subsequent fitting, in one example each specified feature value may be expressed as 0 or 1. For instance, during data collection the specified feature values may be determined in the form of options or a list: if the specified feature variable is the language feature, the language feature may contain several language candidates, each with the two options "yes" and "no". When "yes" is selected, the corresponding specified feature value is 1; when "no" is selected, it is 0.
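The 0/1 encoding described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout, the field names, and the helper `build_feature_matrix` are assumptions introduced here.

```python
# Hypothetical raw records, one dict per account (field names are illustrative).
accounts = [
    {"language": "Chinese", "network": "WI-FI"},
    {"language": "English", "network": "2g-4g"},
    {"language": "Chinese", "network": "2g-4g"},
]

def build_feature_matrix(records):
    """Return (columns, X), where X[i][j] is 1 if account i has candidate j, else 0."""
    # One column per (feature variable, candidate value) pair seen in the data.
    columns = sorted({(k, v) for rec in records for k, v in rec.items()})
    X = [[1 if rec.get(k) == v else 0 for (k, v) in columns] for rec in records]
    return columns, X

columns, X = build_feature_matrix(accounts)
n, p = len(X), len(columns)  # dim X = (n, p)
```

Each row of `X` is one account and each column answers one "yes"/"no" candidate, so the matrix can be passed directly to a fitting routine.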
Step 120, acquiring retention information of each account, wherein the retention information is used for reflecting whether the current account is a retained account.
In practice, the platform side may define a retention determination rule according to actual service requirements or experience, and then use the rule to determine the retention information of each account, that is, whether each account is a retained account. For example, the rule may set a time period: if an account has no login record within that period after registration, it is determined to be a lost (churned) account; otherwise, if it has a login record within the period, it is determined to be a retained account. For instance, if an account does not log in within 15 days after registration, it is determined not to be retained.
In one example, the retention information may be denoted $y$, whose entries take the value 0 or 1: $y_i = 1$ indicates that user $i$ is ultimately retained, and $y_i = 0$ indicates that user $i$ is ultimately lost. The dimension of $y$ is $\dim y = (n, 1)$.
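A retention determination rule of this kind might be sketched as below. The 15-day window comes from the example above; the function `retention_label`, the data layout, and the choice to count only logins strictly after the registration date are assumptions made for illustration.

```python
from datetime import date, timedelta

RETENTION_WINDOW = timedelta(days=15)  # assumed window, taken from the 15-day example

def retention_label(registered_on, login_dates, window=RETENTION_WINDOW):
    """Return 1 if the account logged in after registration within the window, else 0."""
    deadline = registered_on + window
    return int(any(registered_on < d <= deadline for d in login_dates))

# Hypothetical accounts: (registration date, later login dates).
history = [
    (date(2020, 12, 1), [date(2020, 12, 3)]),  # login on day 2: retained
    (date(2020, 12, 1), [date(2021, 1, 20)]),  # first login after 50 days: lost
    (date(2020, 12, 1), []),                   # never logged in: lost
]
y = [retention_label(reg, logins) for reg, logins in history]
```

The resulting list `y` plays the role of the retention vector, with one 0/1 entry per account.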
Step 130, determining an intervention variable according to the collected specified feature values.
The goal of this embodiment is to analyze, from historical data, the effect of any single variable on improving retention. That variable is the intervention variable referred to in this embodiment, also called the primary variable: the variable of interest (i.e., to be explored) that may affect the target utility (i.e., user retention).
The intervention variable may be denoted $T$ and takes the value 0 or 1: $T = 1$ means "intervention received" and $T = 0$ means "intervention not received".
In one embodiment, step 130 may further comprise the steps of:
Determining a target specified feature variable from the plurality of specified feature variables; and determining a target specified characteristic value from one or more specified characteristic values corresponding to the target specified characteristic variable, and generating an intervention variable.
In this step, the intervention variable is not the same thing as a specified feature variable; rather, an intervention variable is generated from one feature value of one specified feature variable. In this embodiment, the target specified feature variable is the specified feature variable of interest, and the target specified feature value is one of the specified feature values of that variable.
For example, assume the target specified feature variable $p_i$ is the "language feature", whose specified feature values include, but are not limited to, Chinese, English, French, German, Arabic, etc. Different specified feature values of the target specified feature variable then generate different intervention variables (in a form similar to one-hot encoding):
$T_1$: whether the user speaks Chinese. 1 represents speaking Chinese, 0 represents not speaking Chinese;
$T_2$: whether the user speaks English. 1 represents speaking English, 0 represents not speaking English.
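Generating one candidate intervention variable per feature value can be sketched as follows. The helper name `intervention_variables` and the sample data are hypothetical.

```python
def intervention_variables(values):
    """Map each distinct feature value to a 0/1 vector over accounts (one-hot style)."""
    return {v: [1 if x == v else 0 for x in values] for v in sorted(set(values))}

# Hypothetical language feature values for four accounts.
languages = ["Chinese", "English", "Chinese", "French"]

T = intervention_variables(languages)
T1 = T["Chinese"]  # 1 = speaks Chinese, 0 = does not
T2 = T["English"]  # 1 = speaks English, 0 = does not
```

Each vector in `T` can then be analyzed in turn as the intervention variable of interest.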
Step 140, determining the causal effect of the intervention variable on user retention by adopting a causal inference algorithm in combination with the retention information.
In this embodiment, user retention is taken as the target utility variable $U$, and the causal effect of the intervention variable $T$ currently of interest on $U$, i.e., the extent to which a change in $T$ influences $U$, is determined by a causal inference algorithm.
In one embodiment, as shown in fig. 2, step 140 may further comprise the steps of:
step 140-1, according to the intervention variable, obtaining a first observed variable value corresponding to the account receiving the intervention, and obtaining a second observed variable value corresponding to the account not receiving the intervention.
In this step, once the intervention variable $T$ has been determined, it can be used to decide whether each account of the current platform, in the real situation, belongs to the accounts that received the intervention or to those that did not. An account that received the intervention is one satisfying $T = 1$, and an account that did not is one satisfying $T = 0$. For example, if $T$ is "whether the user speaks Chinese (1 represents speaking Chinese, 0 represents not speaking Chinese)", the accounts that received the intervention are those of Chinese speakers ($T = 1$), and the accounts that did not are those of users who do not speak Chinese ($T = 0$).
The observed variable value of each account (the first or the second observed variable value) is the value that the target utility variable actually takes for that account; in this embodiment it is the value of the target utility variable "user retention", i.e., "whether retained". For example, for $T_i = 1$ (an account that received the intervention), if user $i$ is ultimately lost, its first observed variable value is 0; likewise, for $T_j = 0$ (an account that did not receive the intervention), if user $j$ is ultimately retained, its second observed variable value is 1.
Step 140-2, determining a first predicted variable value, under the condition of not receiving the intervention, for the account that received the intervention, and determining a second predicted variable value, under the condition of receiving the intervention, for the account that did not receive the intervention.
In one embodiment, the causal inference algorithm is an algorithm under the Rubin framework, whose idea of causal inference is closely tied to counterfactual inference. For an account that receives the intervention ($T = 1$), the causal effect of the intervention on the target utility variable depends on the value the target utility variable would take if the account did not receive the intervention ($T = 0$). Conversely, for an account that does not receive the intervention ($T = 0$), the causal effect depends on the value the target utility variable would take under the intervention ($T = 1$). Comparing the two conditions for the same account excludes interference from other variables. In practice, however, only one of the two conditions actually occurs for any single account, so the two can never both be observed. In this step, therefore, the condition that cannot be observed is predicted, yielding the corresponding predicted variable value.
In particular, for an account that received the intervention, what can be observed is the target utility variable value under the intervention; the counterfactual condition in which it does not receive the intervention is not observed, so the target utility variable value of that account without the intervention (i.e., the first predicted variable value) may be predicted. Accordingly, for an account that did not receive the intervention, what can be observed is the target utility variable value without the intervention; the condition in which it receives the intervention is not observed, so the target utility variable value of that account under the intervention (i.e., the second predicted variable value) may be predicted.
In one embodiment, machine models may be used to simulate the counterfactual inference and obtain the first predicted variable value and the second predicted variable value, and step 140-2 may further include the following steps:
Step 140-2-1, organizing all accounts with intervention into a set of experimental samples, and organizing all accounts without intervention into a set of control samples.
In this embodiment, an account that received the intervention ($T_i = 1$) may be used as an experimental sample, and an account that did not ($T_i = 0$) may be used as a control sample (also called a comparison sample). All experimental samples constitute the experimental sample set, and all control samples constitute the control sample set.
In one example, in addition to the accounts that received the intervention, the experimental sample set may include the specified feature values and retention information corresponding to each of those accounts. Similarly, in addition to the accounts that did not receive the intervention, the control sample set may include the specified feature values and retention information corresponding to each of those accounts.
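The partition into experimental and control sample sets, together with the observed variable values of step 140-1, might look like the sketch below. The function `split_by_intervention`, the dictionary keys, and the data are assumptions for illustration only.

```python
def split_by_intervention(T, covariates, y):
    """Partition accounts into experimental (T == 1) and control (T == 0) sample sets.

    Each sample keeps the account index, its covariate values, and its observed
    retention, so the observed variable value stays attached to the account.
    """
    experimental, control = [], []
    for i, (t, x, u) in enumerate(zip(T, covariates, y)):
        target = experimental if t == 1 else control
        target.append({"account": i, "x": x, "retained": u})
    return experimental, control

# Hypothetical data for four accounts.
T = [1, 0, 1, 0]                   # e.g. "speaks Chinese"
covariates = [[1], [0], [0], [1]]  # e.g. a single covariate "is registered"
y = [0, 1, 1, 0]                   # observed retention

experimental, control = split_by_intervention(T, covariates, y)
first_observed = [s["retained"] for s in experimental]  # observed under T = 1
second_observed = [s["retained"] for s in control]      # observed under T = 0
```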
Step 140-2-2, training a first machine model based on the set of experimental samples, and training a second machine model based on the set of control samples.
In this step, a set of experimental samples is used to train a first machine model, and a set of control samples is used to train a second machine model.
In one embodiment, the first machine model may be obtained as follows:
Selecting a target covariate from the covariates according to the intervention variable, the covariates comprising the specified feature variables other than the specified feature variable corresponding to the intervention variable; and training a first machine model by adopting a preset machine learning algorithm, based on the variable values of the target covariate of each account in the experimental sample set and the retention information of those accounts.
Specifically, the experimental sample set may include the intervention variable, the covariates, and their corresponding variable values. Covariates are introduced so that the intervention variable can be given a causal interpretation, in the hope that the unconfoundedness assumption holds once they are included. Covariates are the variables relevant to the study other than the intervention variable and the target utility variable. In this embodiment, the covariates may include the specified feature variables other than the specified feature variable corresponding to the intervention variable. Equivalently, the feature matrix $X$ can be divided into the intervention variable and the covariates, i.e., $X = (T, \tilde{X})$, where $\tilde{X}$ denotes the covariates.
In practice, the variables generated from the other specified feature values of the target specified feature variable (those other than the target specified feature value) also belong to the covariates. For example, let $T_1$ be whether the user speaks Chinese (1 represents speaking Chinese, 0 represents not speaking Chinese) and $T_2$ be whether the user speaks English (1 represents speaking English, 0 represents not speaking English). If the effect of $T_1$ on the target utility variable $y$ is to be determined, then $T_2$ also belongs to the covariates.
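Separating the intervention column from the covariate columns can be sketched as follows. The column names and the helper `split_intervention_and_covariates` are hypothetical; note that the sibling column generated from the same feature variable stays among the covariates.

```python
def split_intervention_and_covariates(columns, X, target):
    """Return (T, covariate_names, covariate_rows) for the chosen intervention column."""
    j = columns.index(target)
    T = [row[j] for row in X]
    covariate_names = columns[:j] + columns[j + 1:]
    covariate_rows = [row[:j] + row[j + 1:] for row in X]
    return T, covariate_names, covariate_rows

# Hypothetical 0/1 feature matrix with three columns.
columns = ["speaks_chinese", "speaks_english", "is_registered"]
X = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]

T, covariate_names, covariate_rows = split_intervention_and_covariates(
    columns, X, "speaks_chinese"
)
```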
After the covariates are determined, one of them may be selected as the target covariate. For the experimental sample set, the variable value of the target covariate and the retention information of each account in the set are used as sample data, and a first machine model corresponding to the target covariate is trained with a preset machine learning algorithm. The training objective is to learn, given the target covariate, the association between each account's target covariate value and its retention information, so that the model can estimate the value of the target utility variable under the intervention.
In other embodiments, the target covariate need not be a single covariate; several or all of the covariates may be used. The choice may be made according to actual service requirements and is not limited in this embodiment.
In one example, the preset machine learning algorithm may be a supervised learning method, such as linear regression, random forest, support vector machine, XGBoost, and the like.
The second machine model is trained in a similar way. For the control sample set, the variable value of the target covariate and the retention information of each account in the set are used as sample data, and the second machine model is trained with a preset machine learning algorithm. The training objective is to learn, given the target covariate, the association between each account's target covariate value and its retention information, so that the model can estimate the value of the target utility variable without the intervention.
In this embodiment, the first machine model estimates the target variable value of an account under the intervention, given the target covariate. In one example the first machine model may be defined as $\mu_1(x) = E[U(1) \mid X = x]$, where $X = x$ denotes the value of the target covariate and $U(1)$ denotes the potential outcome of the target utility variable when $T = 1$; $E[U(1) \mid X = x]$ is the predicted value of the target utility variable under the conditions $X = x$, $T = 1$, i.e., when the current account receives the intervention given the target covariate $X$.
The second machine model estimates the target variable value of an account without the intervention, given the target covariate. In one example the second machine model may be defined as $\mu_0(x) = E[U(0) \mid X = x]$, where $X = x$ denotes the value of the target covariate and $U(0)$ denotes the potential outcome of the target utility variable when $T = 0$; $E[U(0) \mid X = x]$ is the predicted value of the target utility variable under the conditions $X = x$, $T = 0$, i.e., when the current account does not receive the intervention given the target covariate $X$.
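As a rough sketch of the two models, the example below estimates each conditional expectation with a simple per-value mean, which is a valid estimator when the target covariate is discrete. This estimator is a stand-in chosen to keep the example dependency-free; it is not the patent's model, and any of the supervised learners listed above could take its place. The function `fit_conditional_mean` and the data are hypothetical.

```python
from collections import defaultdict

def fit_conditional_mean(samples):
    """Fit mu(x) = mean retention among samples whose covariate value equals x.

    `samples` is a list of (covariate_value, retained) pairs.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for x, u in samples:
        sums[x] += u
        counts[x] += 1
    overall = sum(u for _, u in samples) / len(samples)
    table = {x: sums[x] / counts[x] for x in counts}
    # Fall back to the overall mean for covariate values never seen in training.
    return lambda x: table.get(x, overall)

# Hypothetical (target covariate value, retained) pairs.
experimental = [(1, 1), (1, 1), (0, 1), (0, 0)]  # accounts with T = 1
control = [(1, 1), (1, 0), (0, 0), (0, 0)]       # accounts with T = 0

mu1 = fit_conditional_mean(experimental)  # first machine model, estimates E[U(1) | X = x]
mu0 = fit_conditional_mean(control)       # second machine model, estimates E[U(0) | X = x]
```

Training one model per sample set keeps the two potential outcomes separate, so each model can later be applied to the opposite set.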
Step 140-2-3, predicting a first predicted variable value of each account in the experimental sample set without intervention using the second machine model.
In this step, the target variable value for each sample with intervention is observable for the set of experimental samples, while the target variable value for each sample with no intervention is missing, so the present embodiment may employ a second machine model to predict the first predicted variable value for each account in the set of experimental samples without intervention given the target covariates. Specifically, variable values of target covariates of the accounts in the experimental sample set can be input to the second machine model respectively, the second machine model performs prediction processing, and first prediction variable values corresponding to the accounts are output.
Step 140-2-4, predicting a second predicted variable value of each account in the control sample set under the condition of intervention by using the first machine model.
In this step, for the control sample set, the target variable value of each sample without the intervention is observable, while the target variable value of each sample under the intervention is missing. This embodiment may therefore use the first machine model to predict, given the target covariate, the second predicted variable value of each account in the control sample set under the intervention. Specifically, the target covariate values of the accounts in the control sample set are input to the first machine model, which performs the prediction and outputs the second predicted variable value of each account.
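The cross-prediction of steps 140-2-3 and 140-2-4 can be sketched as follows. The two fitted models are represented here by hypothetical lookup tables so that the example stands alone; in practice they would be the trained first and second machine models.

```python
# Hypothetical fitted models, mapping a target covariate value to predicted retention.
mu1 = {0: 0.5, 1: 1.0}.get  # first machine model (trained on T = 1 accounts)
mu0 = {0: 0.0, 1: 0.5}.get  # second machine model (trained on T = 0 accounts)

# Target covariate values of the accounts in each sample set (illustrative).
experimental_x = [1, 1, 0, 0]
control_x = [1, 0]

# Experimental accounts: predict the unobserved no-intervention outcome with mu0.
first_predicted = [mu0(x) for x in experimental_x]
# Control accounts: predict the unobserved intervention outcome with mu1.
second_predicted = [mu1(x) for x in control_x]
```

Each model is applied to the sample set it was not trained on, which is what fills in the missing counterfactual value for every account.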
In one embodiment, after the second machine model has predicted the first predicted variable value of each account in the experimental sample set without the intervention, and the first machine model has predicted the second predicted variable value of each account in the control sample set under the intervention, error handling may be applied to both predicted values to facilitate fitting; for example, a corresponding error term may be added to each. The first predicted variable value, for an account that received the intervention, in the case of not receiving it, may be expressed as $U(0) = \mu_0(X) + \varepsilon(0)$, where $\varepsilon(0)$ is the error term. The second predicted variable value, for an account that did not receive the intervention, in the case of receiving it, may be expressed as $U(1) = \mu_1(X) + \varepsilon(1)$, where $\varepsilon(1)$ is the error term.
Step 140-3, determining a first causal effect of the account subject to the intervention based on the first observed variable value and the first predicted variable value.
In this step, after obtaining the target variable values (i.e., the first observed variable value and the first predicted variable value) for each experimental sample in the set of experimental samples for both cases of intervention and no intervention, a difference in the target variable values for both cases may be calculated, which may be used as a causal effect of the intervention variable of the sample on the target utility variable given the target covariate. In this embodiment, a difference between the first observed variable value and the first predicted variable value of the current account is calculated as the first causal effect.
For example, the first causal effect of each experimental sample can be expressed as $D_i^1 = U_i(1) - \hat{\mu}_0(X_i)$, where $U_i(1)$ is the first observed variable value of user $i$ under the intervention; $\hat{\mu}_0(X_i)$ is the first predicted variable value of user $i$ without the intervention, given the target covariate $X$, as predicted by the second machine model; and $D_i^1$ is the first causal effect of the current intervention variable on the target utility variable for user $i$.
In other embodiments, the first causal effect may also be expressed as the expectation of the difference between the first observed variable value and the first predicted variable value of the current account, e.g., $E[D^1 \mid X = x]$.
Step 140-4, determining a second causal effect of the account not subject to the intervention based on the second observed variable value and the second predicted variable value.
In this step, after obtaining the target variable values (i.e., the second predicted variable value and the second observed variable value) for both cases of intervention and non-intervention for each control sample in the set of control samples, the difference in the target variable values for both cases can be calculated as a causal effect of the intervention variable of that sample on the target utility variable given the target covariate. In this embodiment, the difference between the second predicted variable value and the second observed variable value of the current account is calculated as the second causal effect.
For example, the second causal effect of each control sample can be expressed as: wherein, A second observed variable value representing user i without receiving intervention; a second predicted variable value for user i in the event of intervention given a target intervention variable X, predicted using the first machine model; Representing a second causal effect of the current intervention variable on the target utility variable for user i.
In other embodiments, the second causal effect may also be expressed as the expectation of the difference between the second predicted variable value and the second observed variable value of the current account, e.g., $E[D^0 \mid X = x]$.
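The two difference calculations of steps 140-3 and 140-4 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of the application: the function names and the toy observed and predicted values are hypothetical.

```python
from typing import List


def first_causal_effects(observed_treated: List[float],
                         predicted_untreated: List[float]) -> List[float]:
    """Per account: observed value under intervention minus predicted value without it."""
    if len(observed_treated) != len(predicted_untreated):
        raise ValueError("one prediction is needed per experimental sample")
    return [obs - pred for obs, pred in zip(observed_treated, predicted_untreated)]


def second_causal_effects(predicted_treated: List[float],
                          observed_untreated: List[float]) -> List[float]:
    """Per account: predicted value under intervention minus observed value without it."""
    if len(predicted_treated) != len(observed_untreated):
        raise ValueError("one prediction is needed per control sample")
    return [pred - obs for pred, obs in zip(predicted_treated, observed_untreated)]


# Hypothetical data: 1.0 means the account was retained, 0.0 means it was not.
d1 = first_causal_effects([1.0, 0.0, 1.0], [0.5, 0.25, 0.75])
d0 = second_causal_effects([0.75, 0.5], [1.0, 0.0])
```

Note that the sign convention is the same in both sets: the value under intervention always comes first, so a positive effect means the intervention is associated with higher retention.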
Step 140-5, determining the causal effect of the intervention variable on user retention according to the first causal effect or the second causal effect of each account.
After the first causal effect or the second causal effect of each account is determined, the causal effect of the intervention variable on the user retention can be determined in combination with the first causal effect or the second causal effect of each account.
In one embodiment, step 140-5 may further comprise the steps of:
Step 140-5-1, calculating an average of all first causal effects as a first average causal effect of the set of experimental samples.
In this step, after obtaining the first causal effect of each sample in the set of experimental samples (i.e. the account receiving the intervention), an average value of the first causal effect of each sample in the set of experimental samples can be calculated as the first average causal effect of the set of experimental samples given the target covariates, which can be understood as the average causal effect of t=1 on U given the target covariates.
Step 140-5-2, calculating an average of all second causal effects as a second average causal effect for the set of control samples.
In this step, after obtaining the second causal effect of each sample in the control sample set (i.e. the account that did not receive the intervention), an average of the second causal effects of the samples in the control sample set can be calculated as the second average causal effect of the control sample set given the target covariates, which can be understood as the average causal effect of t=0 on U given the target covariates.
It should be noted that, in the steps 140-5-1 and 140-5-2, the average causal effect is determined by calculating the average value of the causal effects in the set, and in other embodiments, the average causal effect may be determined by calculating the average expectation of the causal effects in the set, which is not limited in this embodiment.
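As a sketch of steps 140-5-1 and 140-5-2, the average causal effect of each sample set is the arithmetic mean of its per-account causal effects. The function name and the numbers below are hypothetical and serve only as an illustration.

```python
from statistics import fmean
from typing import Sequence


def average_causal_effect(effects: Sequence[float]) -> float:
    """Arithmetic mean of the per-account causal effects in one sample set."""
    if not effects:
        raise ValueError("the sample set is empty")
    return fmean(effects)


# Hypothetical per-account effects for a single value of the target covariate.
tau_1 = average_causal_effect([0.5, -0.25, 0.25])  # experimental sample set
tau_0 = average_causal_effect([-0.25, 0.5])        # control sample set
```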
Step 140-5-3, determining a trend value reflecting the probability of the user receiving the intervention given the target covariates.
In this embodiment, the trend value, commonly known as the propensity score, is the probability that the account receives the intervention. It may be defined as $e(x) = P(T = 1 \mid X = x)$, i.e., the predicted probability that, in a real scene, the user receives the intervention when the target covariate X = x. Given X = x, T follows a Bernoulli distribution with parameter e(x), i.e., $T \sim \mathrm{Bern}(e(x))$.
In one embodiment, the trend value may be determined as follows:
The variable value of the target covariate is input into a trained probability model, and the probability model outputs the predicted probability of the user receiving the intervention under the target covariate as the trend value.
In this embodiment, a probabilistic model may be pre-trained whose training goal is to predict the probability of accepting an intervention given a target covariate. Then, the variable value of the current target covariate can be input into the probability model, and the probability value output by the probability model is obtained as the trend value.
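The application does not name a specific probability model. As one possible sketch, the example below assumes a logistic regression fitted by batch gradient descent using only the Python standard library; the function names, learning rate, number of epochs and toy data are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_trend_model(covariates: Sequence[Sequence[float]],
                    treated: Sequence[int],
                    lr: float = 0.5,
                    epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    n = len(covariates)
    n_features = len(covariates[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, t in zip(covariates, treated):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            err = p - t  # gradient of the log-loss with respect to the logit
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def trend_value(weights: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of receiving the intervention, e(x) = P(T = 1 | X = x)."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


# Hypothetical history with one binary covariate: accounts with x = 1 received
# the intervention more often (3 of 4) than accounts with x = 0 (1 of 4).
X = [[0.0], [0.0], [0.0], [0.0], [1.0], [1.0], [1.0], [1.0]]
T = [0, 0, 0, 1, 1, 1, 1, 0]
w = fit_trend_model(X, T)
e_low, e_high = trend_value(w, [0.0]), trend_value(w, [1.0])
```

With a single binary covariate the fitted probabilities approach the observed intervention frequencies in each group, which makes the toy result easy to check by hand. Any model that outputs a probability could take the place of the logistic regression.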
Step 140-5-4, performing a weighted calculation on the first average causal effect and the second average causal effect by taking the trend value as a weight, to obtain the causal effect of the intervention variable on the user under the given target covariate.
In this step, after the trend value is obtained, the trend value may be used as a weight, and the first average causal effect and the second average causal effect may be weighted to obtain the causal effect of the current intervention variable on user retention under the given target covariate, where the causal effect may be expressed by the following formula:
$$\tau(x) = e(x)\,\tau_0(x) + (1 - e(x))\,\tau_1(x)$$
where $e(x)$ is the trend value, $\tau_0(x)$ is the second average causal effect, $\tau_1(x)$ is the first average causal effect, $x$ is the target covariate, and $\tau(x)$ is the causal effect that the intervention variable has on the user given the target covariate.
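The weighted calculation can be written directly from the formula. The sketch below is illustrative only; the function name and the input numbers are hypothetical.

```python
def combined_causal_effect(trend_value: float, tau_0: float, tau_1: float) -> float:
    """tau(x) = e(x) * tau_0(x) + (1 - e(x)) * tau_1(x)."""
    if not 0.0 <= trend_value <= 1.0:
        raise ValueError("the trend value is a probability and must lie in [0, 1]")
    return trend_value * tau_0 + (1.0 - trend_value) * tau_1


# Hypothetical inputs for one value x of the target covariate.
tau_x = combined_causal_effect(trend_value=0.25, tau_0=0.125, tau_1=0.5)
```

One way to read the weighting: when most accounts with covariate x received the intervention, e(x) is close to 1 and the result leans on the second average causal effect, which relies on predictions of the first machine model trained on those many intervened accounts.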
In one embodiment, if multiple target covariates are specified, i.e., if there are multiple target covariates, step 140-5 may further comprise the steps of:
and calculating the average expectation of the causal effect corresponding to each target covariate, and obtaining the causal effect of the intervention variable on the user retention.
In this embodiment, if there are multiple target covariates and the causal effect of the current intervention variable on user retention is to be determined, the average expectation of the causal effects corresponding to the multiple target covariates may be calculated, finally obtaining the causal effect of the intervention variable on user retention.
In one embodiment, when there are a plurality of intervention variables, the embodiment may further include the following steps:
and selecting the intervention variable with the largest causal effect as a key factor affecting the user retention according to the causal effect of each intervention variable on the user retention.
In this embodiment, when a plurality of intervention variables are specified, after the causal effect of each intervention variable on the user retention is obtained, the plurality of intervention variables may be ranked according to the causal effect, and the ranked results may be input into a subsequent matching algorithm to help calculate the total utility, or may provide a basis for manual matching.
Furthermore, one or more intervention variables with the largest causal effect can be screened out according to the sequencing result to be used as key factors with the largest influence on the retention of the user, so that a targeted operation strategy can be formulated according to the screened intervention variables to improve the retention rate of the user.
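The ranking and screening described above can be sketched as follows. The intervention variable names and effect values are hypothetical, and sorting by the signed causal effect is an assumption of this example.

```python
from typing import Dict, List, Tuple


def rank_intervention_variables(effects: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort intervention variables by causal effect on retention, largest first."""
    return sorted(effects.items(), key=lambda item: item[1], reverse=True)


def key_factors(effects: Dict[str, float], top_k: int = 1) -> List[str]:
    """Names of the top_k intervention variables with the largest causal effect."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return [name for name, _ in rank_intervention_variables(effects)[:top_k]]


# Hypothetical causal effects of three intervention variables on user retention.
effects = {
    "followed_5_creators": 0.12,
    "enabled_notifications": 0.04,
    "completed_profile": -0.01,
}
ranking = rank_intervention_variables(effects)
top = key_factors(effects, top_k=2)
```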
In this embodiment, after the corresponding specified feature values of each account and the retention information of each account in the current platform are collected from the historical data, a causal inference algorithm may be adopted to determine the causal effect of a single feature on the user retention, and key factors affecting the user retention are found according to the causal effect of each feature on the user retention, and the platform side is assisted by these key factors to perform targeted operation, so that the retention rate of the user may be improved.
Example two
Fig. 3 is a block diagram of a feature evaluation device for influencing user retention according to a second embodiment of the present application, which may include the following modules:
the specified characteristic value collection module 310 is configured to collect corresponding specified characteristic values of each account in the current platform according to a plurality of specified characteristic variables;
the retention information obtaining module 320 is configured to obtain retention information of each account, where the retention information is used to reflect whether the current account is a retention account;
An intervention variable determining module 330, configured to determine an intervention variable according to the collected specified feature value;
the causal effect determination module 340 is configured to determine a causal effect of the intervention variable on user retention by adopting a causal inference algorithm in combination with the retention information.
In one embodiment, the intervention variable determination module 330 is specifically configured to:
determining a target specified feature variable from the plurality of specified feature variables;
And determining a target specified characteristic value from one or more specified characteristic values corresponding to the target specified characteristic variable, and generating an intervention variable.
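As an illustration of how an intervention variable might be generated from a target specified feature variable and a target specified feature value, the sketch below marks an account as having received the intervention when its value of the target variable equals the target value. The equality rule, the function name, and the account fields are assumptions made for this example.

```python
from typing import Dict, Hashable, List


def build_intervention_variable(accounts: List[Dict[str, Hashable]],
                                target_variable: str,
                                target_value: Hashable) -> List[int]:
    """Return 1 for accounts whose target variable equals the target value, else 0."""
    return [1 if account[target_variable] == target_value else 0
            for account in accounts]


# Hypothetical accounts with two specified feature variables.
accounts = [
    {"account_id": "a1", "joined_group": "yes", "device": "android"},
    {"account_id": "a2", "joined_group": "no", "device": "ios"},
    {"account_id": "a3", "joined_group": "yes", "device": "ios"},
]
T = build_intervention_variable(accounts, "joined_group", "yes")
```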
In one embodiment, the causal effect determination module 340 may further include the following sub-modules:
the observation variable value acquisition sub-module is used for acquiring a first observation variable value corresponding to an account which is subjected to intervention according to the intervention variable, and acquiring a second observation variable value corresponding to an account which is not subjected to intervention;
A prediction variable value determination submodule, configured to determine a first prediction variable value of the account that receives the intervention in a case of not receiving the intervention, and determine a second prediction variable value of the account that does not receive the intervention in a case of receiving the intervention;
A first causal effect determination submodule for determining a first causal effect of the account subject to the intervention based on the first observed variable value and the first predicted variable value;
A second causal effect determination submodule for determining a second causal effect of the account not subject to intervention based on the second observed variable value and the second predicted variable value;
And the user retention causal effect determination submodule is used for determining the causal effect of the intervention variable on the user retention according to the first causal effect or the second causal effect of each account.
In one embodiment, the prediction variable value determination submodule may further include the following units:
a sample set organization unit for organizing all accounts that are subject to intervention into an experimental sample set, and organizing all accounts that are not subject to intervention into a control sample set;
A model training unit for training a first machine model according to the set of experimental samples and a second machine model according to the set of control samples;
The model prediction unit is used for predicting a first prediction variable value of each account in the experimental sample set under the condition of not receiving intervention by adopting the second machine model; and predicting a second predicted variable value of each account in the control sample set under the condition of intervention by adopting the first machine model.
In one embodiment, the model training unit is specifically configured to:
Selecting a target covariate from covariates according to the intervention variable, the covariates comprising: the specified characteristic variables are other specified characteristic variables except the specified characteristic variable corresponding to the intervention variable;
And training a first machine model by adopting a preset machine learning algorithm based on variable values corresponding to target covariates of all accounts in the experiment sample set and the retention information of the accounts.
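The work of the sample set organization unit, the model training unit and the model prediction unit can be sketched as follows. The application leaves the "preset machine learning algorithm" open, so this example substitutes a deliberately simple stand-in, the mean retention per covariate combination; the class name, data layout and toy rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Covariates = Tuple[Hashable, ...]


class StratumMeanModel:
    """Stand-in machine model: mean retention per covariate combination."""

    def fit(self, covariates: Sequence[Covariates],
            retained: Sequence[int]) -> "StratumMeanModel":
        totals: Dict[Covariates, List[int]] = defaultdict(lambda: [0, 0])
        for x, y in zip(covariates, retained):
            totals[x][0] += y   # number of retained accounts in this stratum
            totals[x][1] += 1   # number of accounts in this stratum
        self.means_ = {x: s / n for x, (s, n) in totals.items()}
        self.default_ = sum(retained) / len(retained)  # fallback for unseen strata
        return self

    def predict(self, covariates: Sequence[Covariates]) -> List[float]:
        return [self.means_.get(x, self.default_) for x in covariates]


# Hypothetical rows: (target covariates, intervention flag, retention label).
rows = [
    (("ios",), 1, 1), (("ios",), 1, 1), (("android",), 1, 0), (("android",), 1, 1),
    (("ios",), 0, 1), (("ios",), 0, 0), (("android",), 0, 0), (("android",), 0, 0),
]
experimental = [(x, y) for x, t, y in rows if t == 1]  # accounts that received it
control = [(x, y) for x, t, y in rows if t == 0]       # accounts that did not

first_model = StratumMeanModel().fit(*zip(*experimental))
second_model = StratumMeanModel().fit(*zip(*control))

# Cross-prediction: each model scores the other sample set.
first_predicted = second_model.predict([x for x, _ in experimental])
second_predicted = first_model.predict([x for x, _ in control])
```

The cross-prediction is the essential point: the second machine model, trained on the control sample set, estimates what the intervened accounts would have done without the intervention, and the first machine model does the reverse for the control accounts.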
In one embodiment, the first causal effect determination submodule is specifically configured to:
and calculating a difference value between the first observed variable value and the first predicted variable value as the first causal effect.
In one embodiment, the user retention causal effect determination submodule may further include the following units:
the first average causal effect calculation unit is used for calculating the average value of all the first causal effects and taking the average value as the first average causal effect of the experimental sample set;
a second average causal effect calculation unit, configured to calculate an average value of all second causal effects as a second average causal effect of the control sample set;
A trend value determining unit configured to determine a trend value, the trend value reflecting the probability of the user receiving the intervention given a target covariate;
And the weighted calculation unit is used for carrying out weighted calculation on the first average causal effect and the second average causal effect by taking the trend value as a weight to obtain the causal effect of the intervention variable on the user under the given target covariate.
In one embodiment, the trend value determining unit is specifically configured to:
And inputting the variable value of the target covariate into a trained probability model, and outputting the predicted probability of the user receiving the intervention under the target covariate as a trend value by the probability model.
In one embodiment, if there are a plurality of the target covariates, the user retention causal effect determination submodule is further configured to:
and calculating the average expectation of the causal effect corresponding to each target covariate, and obtaining the causal effect of the intervention variable on the user retention.
In one embodiment, when there are a plurality of intervention variables, the apparatus may further include the following modules:
And the key factor determining module is used for selecting the intervention variable with the largest causal effect as the key factor affecting the user retention according to the causal effect of each intervention variable on the user retention.
It should be noted that, the feature evaluation device for influencing the user retention provided by the embodiment of the present application may execute the method provided by the first embodiment of the present application, and has the corresponding functional modules and beneficial effects of the execution method.
Example three
Fig. 4 is a schematic structural diagram of an electronic device according to a third embodiment of the present application. As shown in fig. 4, the electronic device includes a processor 410, a memory 420, an input device 430 and an output device 440; the number of processors 410 in the electronic device may be one or more, with one processor 410 taken as an example in fig. 4; the processor 410, memory 420, input device 430 and output device 440 in the electronic device may be connected by a bus or other means, with connection by a bus taken as an example in fig. 4.
The memory 420 is a computer readable storage medium that can be used to store software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the methods in embodiments of the present application. The processor 410 executes various functional applications of the electronic device and data processing, i.e., implements the methods described above, by running software programs, instructions, and modules stored in the memory 420.
The memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function; the data storage area may store data created according to the use of the terminal, etc. In addition, memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, memory 420 may further include memory remotely located relative to processor 410, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 440 may include a display device such as a display screen.
Example four
The fourth embodiment of the present application also provides a storage medium containing computer-executable instructions which, when executed by a processor of a server, perform the method of the first embodiment.
From the above description of embodiments, it will be clear to a person skilled in the art that the present application may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk, or an optical disk of a computer, etc., and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present application.
It should be noted that, in the embodiment of the apparatus, each unit and module included are only divided according to the functional logic, but not limited to the above-mentioned division, so long as the corresponding function can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present application.
Note that the above is only a preferred embodiment of the present application and the technical principle applied. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, while the application has been described in connection with the above embodiments, the application is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the application, which is set forth in the following claims.

Claims (12)

1. A method of feature assessment affecting user retention, the method comprising:
acquiring corresponding appointed characteristic values of each account in the current platform according to a plurality of appointed characteristic variables;
acquiring retention information of each account, wherein the retention information is used for reflecting whether the current account is a retention account or not;
according to the collected specified characteristic values, determining intervention variables;
Adopting a causal inference algorithm in combination with the retention information to determine the causal effect of the intervention variable on the retention of the user;
wherein the determining the causal effect of the intervention variable on user retention by adopting a causal inference algorithm in combination with the retention information comprises:
According to the intervention variable, acquiring a first observation variable value corresponding to the account which receives the intervention, and acquiring a second observation variable value corresponding to the account which does not receive the intervention;
Determining a first predicted variable value of the account subject to the intervention in the case of not being subject to the intervention, and determining a second predicted variable value of the account not subject to the intervention in the case of being subject to the intervention;
determining a first causal effect of the account subject to the intervention based on the first observed variable value and the first predicted variable value;
Determining a second causal effect for the account not subject to intervention based on the second observed variable value and the second predicted variable value;
And determining the causal effect of the intervention variable on the retention of the user according to the first causal effect or the second causal effect of each account.
2. The method of claim 1, wherein said determining an intervention variable from the collected specified characteristic values comprises:
determining a target specified feature variable from the plurality of specified feature variables;
And determining a target specified characteristic value from one or more specified characteristic values corresponding to the target specified characteristic variable, and generating an intervention variable.
3. The method of claim 1, wherein the determining a first predicted variable value of the account subject to the intervention in the case of not being subject to the intervention, and determining a second predicted variable value of the account not subject to the intervention in the case of being subject to the intervention, comprises:
Organizing all accounts that received the intervention into a set of experimental samples, and organizing all accounts that did not receive the intervention into a set of control samples;
training a first machine model from the set of experimental samples and training a second machine model from the set of control samples;
predicting a first prediction variable value of each account in the experimental sample set under the condition of not receiving intervention by adopting the second machine model;
And predicting a second prediction variable value of each account in the control sample set under the condition of intervention by adopting the first machine model.
4. A method according to claim 3, wherein said training a first machine model from said set of experimental samples comprises:
Selecting a target covariate from covariates according to the intervention variable, the covariates comprising: the specified characteristic variables are other specified characteristic variables except the specified characteristic variable corresponding to the intervention variable;
And training a first machine model by adopting a preset machine learning algorithm based on variable values corresponding to target covariates of all accounts in the experiment sample set and the retention information of the accounts.
5. The method of claim 4, wherein the determining a first causal effect of the account subject to intervention based on the first observed variable value and the first predicted variable value comprises:
and calculating a difference value between the first observed variable value and the first predicted variable value as the first causal effect.
6. The method of claim 4 or 5, wherein determining the causal effect of the intervention variable on user retention based on the first causal effect or the second causal effect of each account comprises:
Calculating the average value of all first causal effects as the first average causal effect of the experimental sample set;
calculating the average value of all second causal effects as the second average causal effect of the control sample set;
Determining a trend value, wherein the trend value is used for reflecting the probability of the user to accept the intervention given the target covariates;
And taking the trend value as a weight, and carrying out weighted calculation on the first average causal effect and the second average causal effect to obtain the causal effect of the intervention variable on the user retention under the given target covariate.
7. The method of claim 6, wherein the determining the trend value comprises:
And inputting the variable value of the target covariate into a trained probability model, and outputting the predicted probability of the user receiving the intervention under the target covariate as a trend value by the probability model.
8. The method of claim 6, wherein if there are a plurality of the target covariates, the determining the causal effect of the intervention variable on user retention based on the first causal effect or the second causal effect of each account, further comprises:
and calculating the average expectation of the causal effect corresponding to each target covariate, and obtaining the causal effect of the intervention variable on the user retention.
9. The method of claim 1 or 2, wherein when there are a plurality of the intervention variables, after the causal effect of the intervention variables on user retention is determined by adopting a causal inference algorithm in combination with the retention information, the method further comprises:
and selecting the intervention variable with the largest causal effect as a key factor affecting the user retention according to the causal effect of each intervention variable on the user retention.
10. A feature evaluation device for influencing user retention, the device comprising:
the specified characteristic value acquisition module is used for acquiring corresponding specified characteristic values of each account in the current platform according to a plurality of specified characteristic variables;
A retention information acquisition module for acquiring retention information of each account, the retention information is used for reflecting whether the current account is a retention account or not;
the intervention variable determining module is used for determining an intervention variable according to the collected specified characteristic value;
the causal effect determining module is used for determining causal effects of the intervention variables on user retention by adopting a causal inference algorithm in combination with the retention information;
The causal effect determination module comprises the following sub-modules:
the observation variable value acquisition sub-module is used for acquiring a first observation variable value corresponding to an account which is subjected to intervention according to the intervention variable, and acquiring a second observation variable value corresponding to an account which is not subjected to intervention;
A prediction variable value determination submodule, configured to determine a first prediction variable value of the account that receives the intervention in a case of not receiving the intervention, and determine a second prediction variable value of the account that does not receive the intervention in a case of receiving the intervention;
A first causal effect determination submodule for determining a first causal effect of the account subject to the intervention based on the first observed variable value and the first predicted variable value;
A second causal effect determination submodule for determining a second causal effect of the account not subject to intervention based on the second observed variable value and the second predicted variable value;
And the user retention causal effect determination submodule is used for determining the causal effect of the intervention variable on the user retention according to the first causal effect or the second causal effect of each account.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-9 when the program is executed by the processor.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-9.
CN202011613766.9A 2020-12-30 2020-12-30 Feature evaluation method and device for influencing user retention Active CN112685674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011613766.9A CN112685674B (en) 2020-12-30 2020-12-30 Feature evaluation method and device for influencing user retention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011613766.9A CN112685674B (en) 2020-12-30 2020-12-30 Feature evaluation method and device for influencing user retention

Publications (2)

Publication Number Publication Date
CN112685674A CN112685674A (en) 2021-04-20
CN112685674B true CN112685674B (en) 2024-09-24

Family

ID=75455367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011613766.9A Active CN112685674B (en) 2020-12-30 2020-12-30 Feature evaluation method and device for influencing user retention

Country Status (1)

Country Link
CN (1) CN112685674B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112311B (en) * 2021-05-12 2023-07-25 北京百度网讯科技有限公司 Method for training causal inference model and information prompting method and device
CN114186096A (en) * 2021-12-10 2022-03-15 北京达佳互联信息技术有限公司 Information processing method and device
CN115907074A (en) * 2022-09-26 2023-04-04 清华大学 Prediction method and device for platform user loss, electronic equipment and medium
CN115794175B (en) * 2023-02-06 2023-05-02 广东省科学技术情报研究所 Technology research and development evaluation system and method based on big data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364195A (en) * 2018-02-09 2018-08-03 腾讯科技(深圳)有限公司 User retains probability forecasting method, device, predictive server and storage medium
CN112070533A (en) * 2020-08-28 2020-12-11 上海连尚网络科技有限公司 Method and equipment for predicting user retention

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100333134A1 (en) * 2009-06-30 2010-12-30 Mudd Advertising System, method and computer program product for advertising
US8909567B2 (en) * 2012-02-20 2014-12-09 Xerox Corporation Method and system for the dynamic allocation of resources based on fairness, throughput, and user behavior measurement
CN105005909A (en) * 2015-06-17 2015-10-28 深圳市腾讯计算机系统有限公司 Method and device for predicting lost users
CN107230091B (en) * 2016-03-23 2020-10-30 滴滴(中国)科技有限公司 Car pooling request order matching method and device
CN107292150B (en) * 2016-04-13 2020-03-06 平安科技(深圳)有限公司 User identity confirmation method and device in security information processing
CN106294658B (en) * 2016-08-04 2020-09-04 腾讯科技(深圳)有限公司 Webpage quick display method and device
CN106529727B (en) * 2016-11-18 2020-09-25 腾讯科技(深圳)有限公司 User loss prediction model generation method and related device
CN106959916B (en) * 2017-02-28 2022-11-04 腾讯科技(深圳)有限公司 Client retention impact detection method and device
CN107437095A (en) * 2017-07-24 2017-12-05 腾讯科技(深圳)有限公司 Classification determines method and device
CN108170909B (en) * 2017-12-13 2021-08-03 中国平安财产保险股份有限公司 Intelligent modeling model output method, equipment and storage medium
CN108053322A (en) * 2017-12-15 2018-05-18 东峡大通(北京)管理咨询有限公司 The customer investment return evaluation method and system of vehicle
CN108415981B (en) * 2018-02-09 2020-10-09 平安科技(深圳)有限公司 Data dimension generation method, device, equipment and computer readable storage medium
CN108537587A (en) * 2018-04-03 2018-09-14 广州优视网络科技有限公司 It is lost in user's method for early warning, device, computer readable storage medium and server
CN108648000B (en) * 2018-04-24 2022-10-28 腾讯科技(深圳)有限公司 Method and device for evaluating user retention life cycle and electronic equipment
CN109492891B (en) * 2018-10-26 2022-04-29 创新先进技术有限公司 User loss prediction method and device
CN111260382A (en) * 2018-11-30 2020-06-09 北京嘀嘀无限科技发展有限公司 Prediction processing method and device for loss probability
CN109858970B (en) * 2019-02-02 2021-07-02 中国银行股份有限公司 User behavior prediction method, device and storage medium
CN110585726B (en) * 2019-09-16 2023-04-07 腾讯科技(深圳)有限公司 User recall method, device, server and computer readable storage medium
CN111311030B (en) * 2020-03-27 2022-09-06 中国工商银行股份有限公司 User credit risk prediction method and device based on influence factor detection
CN111723973B (en) * 2020-05-15 2022-12-09 西安交通大学 Learning effect optimization method based on user behavior causal relationship in MOOC log data
CN111988634B (en) * 2020-08-14 2022-07-05 广州市百果园信息技术有限公司 Anchor selection method and device, computer readable storage medium and electronic equipment
CN112035549B (en) * 2020-08-31 2023-12-08 中国平安人寿保险股份有限公司 Data mining method, device, computer equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364195A (en) * 2018-02-09 2018-08-03 腾讯科技(深圳)有限公司 User retention probability prediction method, device, prediction server and storage medium
CN112070533A (en) * 2020-08-28 2020-12-11 上海连尚网络科技有限公司 Method and equipment for predicting user retention

Also Published As

Publication number Publication date
CN112685674A (en) 2021-04-20

Similar Documents

Publication Publication Date Title
CN112685674B (en) Feature evaluation method and device for influencing user retention
US10360517B2 (en) Distributed hyperparameter tuning system for machine learning
WO2019172868A1 (en) Systems and method for automatically configuring machine learning models
CN111667010A (en) Sample evaluation method, device and equipment based on artificial intelligence and storage medium
CN110046706A (en) Model generating method, device and server
CN111881023B (en) Software aging prediction method and device based on multi-model comparison
CN111160959B (en) User click conversion prediction method and device
CN113360622A (en) User dialogue information processing method and device and computer equipment
CN116910274B (en) Test question generation method and system based on knowledge graph and prediction model
CN116338502A (en) Fuel cell life prediction method based on random noise enhancement and cyclic neural network
CN108961460B (en) Fault prediction method and device based on sparse ESGP (Enterprise service gateway) and multi-objective optimization
CN116186221A (en) Big data analysis method and system applied to online dialogue platform
CN110602207A (en) Method, device, server and storage medium for predicting push information based on off-network
US11468348B1 (en) Causal analysis system
CN110910241A (en) Cash flow evaluation method, apparatus, server device and storage medium
CN110704614A (en) Information processing method and device for predicting user group type in application
CN111078854A (en) Question-answer prediction model training method and device and question-answer prediction method and device
CN114202110B (en) Service fault prediction method and device based on RF-XGBOOST
CN110852854B (en) Method for generating quantitative gain model and method for evaluating risk control strategy
US11900222B1 (en) Efficient machine learning model architecture selection
CN114358394A (en) Feature index screening method, satisfaction degree prediction model construction method and prediction method
CN113868534A (en) Service processing method, service processing device, electronic device and storage medium
CN112732519A (en) Event monitoring method and device
CN116208513B (en) Gateway health degree prediction method and device
CN114330866B (en) Data processing method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant