CN118379086A - Data prediction method, apparatus, computer device, readable storage medium, and program product
- Publication number
- CN118379086A (CN202410807055.7A)
- Authority
- CN
- China
- Prior art keywords
- sample set
- data
- resource conversion
- user
- experimental
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
- G06Q50/43—Business processes related to the sharing of vehicles, e.g. car sharing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Entrepreneurship & Innovation (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Mathematical Physics (AREA)
- Primary Health Care (AREA)
- Computing Systems (AREA)
- Tourism & Hospitality (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Game Theory and Decision Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present application relates to a data prediction method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product. The method comprises: obtaining user feature data and resource conversion values of users of each stratification type; performing resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of the users of each stratification type according to a data prediction model, to obtain sensitivity parameters and elasticity index parameters; and predicting data penetration index results from the sensitivity parameters, the elasticity index parameters, and each resource conversion value based on the data prediction model, to obtain the data penetration index results of users of each stratification type under each resource conversion value. The method can improve the accuracy of data prediction.
Description
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular to a data prediction method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product.
Background
With the development of artificial intelligence technology, predicting the penetration rate of user consumption behavior has become increasingly common in the shared e-bike industry. The penetration rate helps an operating platform analyze how far it has penetrated the market and thereby evaluate market potential.
In conventional approaches, market analysis uses a Logit regression model (Logit model, i.e., an evaluation model). From data related to user consumption behavior in the shared e-bike industry, the variables that affect the penetration rate of user consumption behavior are selected as independent variables and imported into the Logit regression model for modeling. Based on the Logit regression model, these variables are then analyzed and predicted within the user data of users of a single stratification type, yielding the penetration rate of users of the current stratification type.
However, because the Logit regression model can only learn each user type independently, conventional approaches suffer from data sparsity, which in turn leads to poor accuracy in predicting user penetration rates.
Summary of the Invention
In view of the above technical problems, it is necessary to provide a data prediction method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product.
In a first aspect, the present application provides a data prediction method, comprising:
obtaining user feature data and resource conversion values of users of each stratification type;
performing resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of the users of each stratification type according to a data prediction model, to obtain sensitivity parameters and elasticity index parameters;
predicting data penetration index results from the sensitivity parameters, the elasticity index parameters, and each resource conversion value based on the data prediction model, to obtain the data penetration index results of users of each stratification type under each resource conversion value.
In one embodiment, predicting data penetration index results from the sensitivity parameters, the elasticity index parameters, and each resource conversion value based on the data prediction model, to obtain the data penetration index results of users of each stratification type under each resource conversion value, comprises:
processing the elasticity index parameters according to the data evaluation structure in the data prediction model, to obtain an initial data penetration index result;
correcting and constraining the initial data penetration index result based on the sensitivity parameters, the resource conversion values, and the data evaluation structure, to obtain the data penetration index result of users of each stratification type under each resource conversion value.
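The two-stage prediction described in this embodiment can be illustrated with a minimal Python sketch. The source does not give the formula, so everything here is an assumption made for illustration: the data evaluation structure is taken to be a Logit (sigmoid) link, the elasticity index parameter is treated as the baseline logit, the sensitivity parameter as a slope on the resource conversion value, and the "constraint" as clamping that slope to be non-negative. The names `predict_penetration` and `discount_depths` are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logit link: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_penetration(elasticity: float, sensitivity: float,
                        discount_depths: list[float]) -> tuple[float, list[float]]:
    """Return (initial result, corrected results per resource conversion value).

    Assumed form, not the patent's formula:
        initial      = sigmoid(elasticity)
        corrected(d) = sigmoid(elasticity + max(sensitivity, 0) * d)
    """
    # Stage 1: initial penetration from the elasticity index parameter alone.
    initial = sigmoid(elasticity)
    # Stage 2: correct with sensitivity and each resource conversion value.
    # Clamping the slope keeps the prediction non-decreasing in d, and the
    # sigmoid keeps every result inside (0, 1).
    slope = max(sensitivity, 0.0)
    corrected = [sigmoid(elasticity + slope * d) for d in discount_depths]
    return initial, corrected


if __name__ == "__main__":
    base, curve = predict_penetration(0.0, 2.0, [0.0, 0.1, 0.3])
    print(base, curve)
```

Under these assumptions, a deeper discount never lowers the predicted penetration, which is one way to keep results "within a reasonable range".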
In one embodiment, before obtaining the user feature data and resource conversion values of users of each stratification type, the method further comprises:
obtaining the neural network structure of a data prediction model to be trained and an initial sample set;
screening the initial sample set for experimental sample sets and control sample sets according to a propensity score matching rule, and constructing a target sample set based on the experimental sample sets and the control sample sets;
training the neural network structure on the target sample set and, when the neural network structure satisfies a preset iteration condition, combining the neural network structure with the data evaluation structure to obtain the trained data prediction model.
In one embodiment, screening the initial sample set for experimental sample sets and control sample sets according to a propensity score matching rule, and constructing a target sample set based on the experimental sample sets and the control sample sets, comprises:
performing prediction on the initial sample set according to a propensity score prediction model, to obtain a propensity score for each user sample in the initial sample set;
taking, for users of each stratification type in the initial sample set, the sample set under each resource conversion value as an experimental sample set, and determining, based on the propensity scores, a control sample set matching that experimental sample set from among the other experimental sample sets of users of the same stratification type under other resource conversion values;
constructing the target sample set based on each experimental sample set and its corresponding control sample set.
In one embodiment, taking the sample set under each resource conversion value as an experimental sample set and determining, based on the propensity scores, a matching control sample set from among the other experimental sample sets of users of the same stratification type under other resource conversion values, comprises:
within each stratification type, for the experimental sample set under each resource conversion value, retrieving, according to a nearest-neighbor matching algorithm, target propensity scores that match the propensity scores of the user samples in the experimental sample set;
determining the control sample set matching the experimental sample set based on the target propensity scores.
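The nearest-neighbor matching step above can be sketched as follows. This is a minimal one-dimensional illustration in Python, assuming that propensity scores are already available as plain floats and that matching is done with replacement; the function name `nearest_neighbor_match` is hypothetical, and the source does not specify a caliper or tie-breaking rule.

```python
from bisect import bisect_left


def nearest_neighbor_match(experimental_scores: list[float],
                           candidate_scores: list[float]) -> list[int]:
    """For each experimental propensity score, return the index of the
    candidate whose propensity score is closest (matching with replacement)."""
    if not candidate_scores:
        raise ValueError("no candidate samples to match against")
    # Sort candidates once so that each lookup is a binary search.
    order = sorted(range(len(candidate_scores)), key=candidate_scores.__getitem__)
    sorted_scores = [candidate_scores[i] for i in order]
    matches = []
    for score in experimental_scores:
        pos = bisect_left(sorted_scores, score)
        # The nearest neighbor is either just below or just above `score`.
        neighbors = [p for p in (pos - 1, pos) if 0 <= p < len(sorted_scores)]
        best = min(neighbors, key=lambda p: abs(sorted_scores[p] - score))
        matches.append(order[best])  # map back to the original index
    return matches


if __name__ == "__main__":
    # Candidates would come from the same stratification type under
    # other resource conversion values.
    print(nearest_neighbor_match([0.2, 0.8], [0.75, 0.1, 0.25, 0.9]))
```

The matched indices select the user samples that form the control sample set for the given experimental sample set.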
In one embodiment, constructing the target sample set based on each experimental sample set and its corresponding control sample set comprises:
calculating the standardized mean difference between each experimental sample set and its corresponding control sample set;
when the standardized mean difference satisfies a preset threshold condition, determining each experimental sample set and its corresponding control sample set as the target sample set.
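The balance check in this embodiment can be sketched in Python as below. The source does not state which definition of the standardized mean difference or which threshold is used; this sketch assumes the common form (difference in means divided by the square root of the average of the two sample variances) and a hypothetical threshold of 0.1. The names `standardized_mean_difference` and `is_balanced` are illustrative only.

```python
import math
import statistics


def standardized_mean_difference(experimental: list[float],
                                 control: list[float]) -> float:
    """SMD = (mean_e - mean_c) / sqrt((var_e + var_c) / 2).

    Each list needs at least two values so the sample variance is defined.
    """
    mean_diff = statistics.fmean(experimental) - statistics.fmean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(experimental) + statistics.variance(control)) / 2
    )
    if pooled_sd == 0.0:
        # Both groups are constant: balanced only if the means coincide.
        return 0.0 if mean_diff == 0.0 else math.inf
    return mean_diff / pooled_sd


def is_balanced(experimental: dict[str, list[float]],
                control: dict[str, list[float]],
                threshold: float = 0.1) -> bool:
    """True if every feature's |SMD| is within the (assumed) threshold."""
    return all(
        abs(standardized_mean_difference(experimental[name], control[name])) <= threshold
        for name in experimental
    )


if __name__ == "__main__":
    print(standardized_mean_difference([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))
```

A matched pair of sample sets that passes `is_balanced` would be admitted to the target sample set; a pair that fails would be rematched or dropped, depending on the chosen policy.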
In a second aspect, the present application further provides a data prediction apparatus, comprising:
a first acquisition module, configured to obtain user feature data and resource conversion values of users of each stratification type;
an analysis module, configured to perform resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of the users of each stratification type according to a data prediction model, to obtain sensitivity parameters and elasticity index parameters;
a prediction module, configured to predict data penetration index results from the sensitivity parameters, the elasticity index parameters, and each resource conversion value based on the data prediction model, to obtain the data penetration index results of users of each stratification type under each resource conversion value.
In one embodiment, the prediction module is specifically configured to process the elasticity index parameters according to the data evaluation structure in the data prediction model, to obtain an initial data penetration index result;
and to correct and constrain the initial data penetration index result based on the sensitivity parameters, the resource conversion values, and the data evaluation structure, to obtain the data penetration index result of users of each stratification type under each resource conversion value.
In one embodiment, the apparatus further comprises:
a second acquisition module, configured to obtain the neural network structure of a data prediction model to be trained and an initial sample set;
a screening module, configured to screen the initial sample set for experimental sample sets and control sample sets according to a propensity score matching rule, and to construct a target sample set based on the experimental sample sets and the control sample sets;
a training module, configured to train the neural network structure on the target sample set and, when the neural network structure satisfies a preset iteration condition, to combine the neural network structure with the data evaluation structure to obtain the trained data prediction model.
In one embodiment, the screening module is specifically configured to perform prediction on the initial sample set according to a propensity score prediction model, to obtain a propensity score for each user sample in the initial sample set;
to take, for users of each stratification type in the initial sample set, the sample set under each resource conversion value as an experimental sample set, and to determine, based on the propensity scores, a control sample set matching that experimental sample set from among the other experimental sample sets of users of the same stratification type under other resource conversion values;
and to construct the target sample set based on each experimental sample set and its corresponding control sample set.
In one embodiment, the screening module is specifically configured to retrieve, within each stratification type and for the experimental sample set under each resource conversion value, target propensity scores that match the propensity scores of the user samples in the experimental sample set according to a nearest-neighbor matching algorithm;
and to determine the control sample set matching the experimental sample set based on the target propensity scores.
In one embodiment, the screening module is specifically configured to calculate the standardized mean difference between each experimental sample set and its corresponding control sample set;
and, when the standardized mean difference satisfies a preset threshold condition, to determine each experimental sample set and its corresponding control sample set as the target sample set.
In a third aspect, the present application further provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements the following steps:
obtaining user feature data and resource conversion values of users of each stratification type;
performing resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of the users of each stratification type according to a data prediction model, to obtain sensitivity parameters and elasticity index parameters;
predicting data penetration index results from the sensitivity parameters, the elasticity index parameters, and each resource conversion value based on the data prediction model, to obtain the data penetration index results of users of each stratification type under each resource conversion value.
In a fourth aspect, the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
obtaining user feature data and resource conversion values of users of each stratification type;
performing resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of the users of each stratification type according to a data prediction model, to obtain sensitivity parameters and elasticity index parameters;
predicting data penetration index results from the sensitivity parameters, the elasticity index parameters, and each resource conversion value based on the data prediction model, to obtain the data penetration index results of users of each stratification type under each resource conversion value.
In a fifth aspect, the present application further provides a computer program product comprising a computer program which, when executed by a processor, implements the following steps:
obtaining user feature data and resource conversion values of users of each stratification type;
performing resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of the users of each stratification type according to a data prediction model, to obtain sensitivity parameters and elasticity index parameters;
predicting data penetration index results from the sensitivity parameters, the elasticity index parameters, and each resource conversion value based on the data prediction model, to obtain the data penetration index results of users of each stratification type under each resource conversion value.
The above data prediction method, apparatus, computer device, computer-readable storage medium, and computer program product obtain user feature data and resource conversion values of users of each stratification type; perform resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of the users of each stratification type according to a data prediction model, to obtain sensitivity parameters and elasticity index parameters; and predict data penetration index results from the sensitivity parameters, the elasticity index parameters, and each resource conversion value based on the data prediction model, to obtain the data penetration index results of users of each stratification type under each resource conversion value. With this method, the sensitivity parameters and elasticity index parameters are computed from previously learned combined information about user features across stratification types. This allows information to be shared among the user feature data of different stratification types and avoids the data sparsity problem. The data penetration index results are then predicted from the sensitivity parameters and elasticity index parameters, which keeps the results within a reasonable range and thereby improves the accuracy of data prediction.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings needed to describe the embodiments or the related art are briefly introduced below. The drawings described below show only some embodiments of the present application; those of ordinary skill in the art can derive other related drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a data prediction method in one embodiment;
FIG. 2 is a schematic flowchart of determining a data penetration index result based on an elasticity index parameter and a sensitivity parameter in one embodiment;
FIG. 3 is a schematic flowchart of training the neural network structure of a data prediction model in one embodiment;
FIG. 4 is a schematic flowchart of constructing a target sample set in one embodiment;
FIG. 5 is a schematic flowchart of the step of determining the control sample set corresponding to an experimental sample set in one embodiment;
FIG. 6 is a schematic flowchart of the step of constructing a target sample set from experimental sample sets and control sample sets in one embodiment;
FIG. 7 is a structural block diagram of a data prediction apparatus in one embodiment;
FIG. 8 is an internal structure diagram of a computer device in one embodiment.
具体实施方式Detailed ways
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。In order to make the purpose, technical solution and advantages of the present application more clearly understood, the present application is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not used to limit the present application.
在一个实施例中,如图1所示,提供了一种数据预测方法,在数据分析领域中,数据预测方法依据数据预测模型,根据已存在客观数据的用户特征数据在数据分析和统计学方法中,反映某一变量为不同数值的情况下,数据渗透指标结果受到该变量的影响程度(即敏感度参数),以及,该变量的变化对于数据渗透指标结果的影响程度(即弹性指标参数),进而基于敏感度参数和弹性指标参数预测数据渗透指标结果,数据渗透指标结果用于衡量特定产品、需求或效果在目标用户群体中的普及程度。本实施例以该方法应用于终端进行举例说明,可以理解的是,该方法也可以应用于服务器,还可以应用于包括终端和服务器的系统,并通过终端和服务器的交互实现。本实施例中,该方法包括以下步骤:In one embodiment, as shown in FIG1 , a data prediction method is provided. In the field of data analysis, the data prediction method is based on a data prediction model. According to the user characteristic data of the existing objective data, in the data analysis and statistical methods, it reflects the degree to which a certain variable is affected by the variable when the variable has different values (i.e., the sensitivity parameter), and the degree to which the change of the variable affects the data penetration index result (i.e., the elasticity index parameter), and then predicts the data penetration index result based on the sensitivity parameter and the elasticity index parameter. The data penetration index result is used to measure the popularity of a specific product, demand or effect in the target user group. This embodiment is illustrated by applying the method to a terminal. It can be understood that the method can also be applied to a server, and can also be applied to a system including a terminal and a server, and is implemented through the interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step 102: obtain the user feature data and the resource conversion values of users of each stratification type.
In this embodiment of the present application, the data prediction model may be a semi-black-box model that predicts the penetration rate of riding-card users in the shared e-bike field; its model structure is described in detail in the embodiments below. User stratification types may be defined according to the needs of the role that uses the model; for example, they may be divided into users, city-cluster users, and strategy-segment users. The role that uses the model also presets, according to its needs, the resource conversion values to be predicted, namely the different discount rates applied to the riding-card price.
According to the predefined user stratification types, the terminal obtains the user portraits of users of each stratification type through the shared e-bike operation platform. A user portrait includes features of multiple dimensions, for example: age, gender, date of birth, registration time, time of first ride, login client type, total, average, and maximum number of orders over the last 3, 5, 7, 15, and 30 days, average ride duration, number of days since the most recent ride, competition intensity level of the user's city, vehicle efficiency level of the user's city, and order scale level (volume) of the user's city. The terminal then uses the user data and the preset discount rates as the input data of the data prediction model.
After the terminal obtains the input user data, the data prediction model cleans and preprocesses it. This includes handling missing values, outliers, and duplicate values, and normalizing or standardizing user data of different dimensions, so as to ensure the quality and consistency of the input. On the cleaned and preprocessed user data, the data prediction model performs feature extraction, transformation, and combination to obtain user feature data that is more informative and more predictive. Optionally, non-numeric features must be converted to numeric data before they can be processed; categorical features may be encoded with one-hot encoding, label encoding, or another encoding so that the data prediction model can process them. When the data dimensionality is too high or redundant features are present, the data prediction model may perform feature dimensionality reduction to lower computational complexity and improve the generalization ability of the model.
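The cleaning and encoding steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `preprocess`, the field names `age`, `orders_7d`, and `client`, and the choice of mean imputation with min-max scaling are assumptions made for the example, and outlier handling and dimensionality reduction are omitted.

```python
from statistics import mean

def preprocess(rows, numeric_keys, categorical_keys):
    """Deduplicate, impute, min-max scale, and one-hot encode user rows."""
    # Drop exact duplicate records while keeping first-seen order.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # Fill missing numeric values (None) with the column mean, then scale to [0, 1].
    for k in numeric_keys:
        fill = mean(r[k] for r in unique if r[k] is not None)
        for r in unique:
            if r[k] is None:
                r[k] = fill
        lo, hi = min(r[k] for r in unique), max(r[k] for r in unique)
        span = (hi - lo) or 1.0          # avoid division by zero for constant columns
        for r in unique:
            r[k] = (r[k] - lo) / span

    # Replace each categorical feature with one 0/1 column per observed level.
    for k in categorical_keys:
        levels = sorted({r[k] for r in unique})
        for r in unique:
            value = r.pop(k)
            for level in levels:
                r[f"{k}={level}"] = 1.0 if value == level else 0.0
    return unique

# Hypothetical user records.
users = [
    {"age": 20, "orders_7d": 4, "client": "ios"},
    {"age": 40, "orders_7d": None, "client": "android"},
    {"age": 20, "orders_7d": 4, "client": "ios"},      # duplicate record
    {"age": 30, "orders_7d": 8, "client": "android"},
]
features = preprocess(users, ["age", "orders_7d"], ["client"])
```

In practice a data-frame library would usually replace these loops; the plain-Python form is used here only to keep the example self-contained.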
Step 104: perform resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of users of each stratification type according to the data prediction model, to obtain a sensitivity parameter and elasticity index parameters.
In this embodiment of the present application, the data prediction model includes a neural network structure and a data evaluation structure. Using the mapping between user feature data and the sensitivity and elasticity index parameters that the neural network structure learned during model training, the terminal performs resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of users of each stratification type. This determines how sensitive users of different stratification types are to the discounted price, and how elastic their data penetration index results are with respect to changes in the resource conversion value. The terminal thereby obtains the sensitivity parameter and elasticity index parameter for each user stratification type; both are used to predict the data penetration index result.
In an exemplary embodiment, the data prediction model learns through training the complex mapping between the user feature data of each stratification type, the riding-card discount rate, and the penetration rate. After training, the model uses the learned mapping to predict the penetration rate of users of each stratification type at different discount rates. Sensitivity analysis on changes in the discount rate determines how sensitive different types of users are to the discounted price; for example, individual city users may be more sensitive to the discount rate, while strategy-segment users may be relatively less sensitive. The data prediction model also analyzes how elastic the penetration rate of each stratification type is with respect to changes in the resource conversion value, that is, how sensitive the penetration rate is to changes in the discount rate, and thereby obtains the parameters used to predict the penetration rate of users of each stratification type at different discount rates. Here the penetration rate is the proportion of card-holding riding users among all riding users.
Step 106: based on the data prediction model, predict the data penetration index result from the sensitivity parameter, the elasticity index parameters, and each resource conversion value, to obtain the data penetration index result of users of each stratification type at each resource conversion value.
In this embodiment of the present application, the neural network structure of the data prediction model is connected to the data evaluation structure to form a semi-black-box model. After the terminal obtains the sensitivity parameter and the elasticity index parameters through the neural network structure, it passes them to the data evaluation structure, which is defined as follows:
In this definition, the index ranges over the user stratification types. The output is the data penetration index result of users of a given stratification type at a given resource conversion value, that is, the penetration rate of those users at a given discount rate. The resource conversion value is the discount rate, for example 1, 2, 3, ... zhe (折), i.e., 10%, 20%, 30%, ... of the list price. The sensitivity parameter is usually greater than 0, which means that the smaller the resource conversion value (for example, the deeper the discount), the higher the predicted data penetration index result (the higher the user penetration rate). The elasticity index parameter takes a separate value for each user stratification type.
Given the sensitivity parameter, the elasticity index parameters, and the resource conversion values of users of each stratification type, the terminal predicts the data penetration index results with the data prediction model; that is, it predicts the likely penetration rate of each stratification type at each discount rate. These penetration rate results provide a basis for subsequent decisions on the shared-travel platform.
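The data evaluation structure maps a sensitivity parameter, a per-stratum elasticity index parameter, and a resource conversion value to a penetration rate. A minimal sketch of such a Logit-style mapping follows. The functional form `sigmoid(beta - alpha * value)`, the names `penetration`, `alpha`, and `betas`, and every numeric value are illustrative assumptions; they are not taken from the present application, whose own formula is not reproduced here.

```python
import math

def penetration(alpha, beta, value):
    """Logit-style head: sigmoid(beta - alpha * value).

    alpha: sensitivity parameter (assumed positive)
    beta:  elasticity index parameter of one stratification type
    value: resource conversion value (discount level)
    The functional form is an assumption made for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(beta - alpha * value)))

# Hypothetical parameters for two stratification types.
alpha = 0.6
betas = {"city_cluster": 1.0, "strategy_segment": 0.2}
discount_levels = [3, 4, 5]

# Predicted penetration rate per stratification type and discount level.
predictions = {
    name: [penetration(alpha, beta, d) for d in discount_levels]
    for name, beta in betas.items()
}
```

Under this assumed form, a positive `alpha` guarantees that the predicted penetration rate stays in (0, 1) and does not increase as the resource conversion value grows, which is one way to keep predictions within a reasonable range.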
In an exemplary embodiment in the medical field, the resource conversion value may refer to physiological, pathological, or medical data indicators of a patient after receiving treatment or adopting specific health management measures, for example objective indicators such as blood pressure, blood glucose, and physiological function indicators. In medical data analysis, the sensitivity parameter may represent how strongly changes in a patient's health indicators respond to a treatment plan, drug, or treatment measure; it reflects the degree to which a specific treatment factor affects the patient's physiological state. The elasticity index parameter measures how the patient's physiological state responds to a change in a variable of the medical system or treatment plan. It can cover many aspects, from treatment outcomes to the treatment resources in a treatment plan, and is used to evaluate how the patient's physiological state responds to changes in treatment resources when different treatment strategies are applied or resource allocation is adjusted. Both the sensitivity parameter and the elasticity index parameter are quantified collinear factors derived from actual data and analysis. The final data penetration index result can be used, through the data prediction model, to evaluate the expected outcome of a patient's treatment plan.
In the above data prediction method, resource conversion sensitivity analysis and elastic data penetration index analysis are performed on the user feature data of users of each stratification type according to the data prediction model. The sensitivity parameter and elasticity index parameters are computed from previously learned joint information about user features across stratification types, which allows information to be shared between the user feature data of different stratification types and avoids the data sparsity problem. The data penetration index results are then predicted from the sensitivity parameter and elasticity index parameters, which keeps the results within a reasonable range and thereby improves the accuracy of data prediction.
In an exemplary embodiment, as shown in FIG. 2, step 106 includes steps 202 to 204, wherein:
Step 202: process the elasticity index parameters according to the data evaluation structure in the data prediction model, to obtain an initial data penetration index result.
The data evaluation structure may be a Logit model structure.
In this embodiment of the present application, with reference to the definition of the evaluation structure in the foregoing embodiment, the terminal performs a preliminary analysis based on the user penetration-rate elasticity parameter to obtain the initial data penetration index result. Taking penetration-rate prediction on a shared-travel platform as an example, the penetration-rate elasticity parameter reflects how sensitive users' penetration rate is to change at different discount rates. A larger elasticity parameter means that the penetration rate of users of that type responds more strongly, and changes more sharply, when the discount changes; in other words, the parameter characterizes the willingness of users of that type to buy a riding card when the discount rate is large.
Step 204: correct and constrain the initial data penetration index result based on the sensitivity parameter, the resource conversion values, and the data evaluation structure, to obtain the data penetration index result of users of each stratification type at each resource conversion value.
In this embodiment of the present application, predicting with the elasticity index parameters alone may produce unrealistic results that violate the law of diminishing marginal returns. For this reason, the embodiment introduces the neural network structure together with the data evaluation structure, and uses the sensitivity parameter predicted by the neural network to correct and constrain the initial data penetration index result. The adjusted result reflects more accurately the data penetration index results of users of different stratification types at different resource conversion values, and is consistent with the actual scenario and business requirements.
In this embodiment, the data evaluation structure and the elasticity index parameters in the data prediction model convert the user feature data into an interpretable initial data penetration index result. That initial result is then corrected and constrained based on the sensitivity parameter, the resource conversion values, and the data evaluation structure, so that it reflects more accurately the data penetration of users of different stratification types at different resource conversion values. This keeps the data penetration index results within a reasonable range and thereby improves the accuracy of data prediction.
In an exemplary embodiment, as shown in FIG. 3, before step 102 the method further includes steps 302 to 306, wherein:
Step 302: obtain the neural network structure of the data prediction model to be trained and an initial sample set.
In this embodiment of the present application, the neural network structure used for training is determined first, including the number of layers, the number of neurons per layer, the activation function, and so on. The initial sample set, which is used to train the neural network model, is then collected and organized. The initial sample set includes the user data of users of different stratification types; the user data of each stratification type contains multiple sample sets at different resource conversion values, together with the data penetration index label corresponding to each sample set.
Step 304: screen the initial sample set for experimental sample sets and control sample sets according to a propensity score matching rule, and construct a target sample set from the experimental sample sets and control sample sets.
In this embodiment of the present application, the terminal screens out the experimental sample sets and control sample sets according to the propensity score matching rule, so that the two have consistent distributions on certain key features. A target sample set is then constructed from the experimental sample sets and control sample sets for subsequent training and testing.
In an optional embodiment, the terminal may compare the distributions of the experimental sample set and the control sample set on a number of key covariates, and then select a method (for example, a nearest-neighbor matching algorithm) to match the two sets so that their distributions on the key covariates tend to agree. Key covariates are chosen from the user data as the variables that affect both the assignment of the intervention and the outcome. For example, age may influence whether a user is converted by a given discount and also whether the user rides with a card on a given day, so it is one of the key covariates. Variables that are themselves affected by the intervention should be excluded. For example, the discount affects which riding-card products are purchased, so fine-grained riding-card features are already biased and should be excluded.
Step 306: train the neural network structure on the target sample set and, when the neural network structure satisfies a preset iteration condition, combine it with the data evaluation structure to obtain the trained data prediction model.
The neural network structure may be a multilayer perceptron (MLP).
In this embodiment of the present application, the terminal trains the neural network structure on the target sample set: the samples are fed to the input layer, propagated through the hidden layers, and produce the prediction at the output layer. The loss function is optimized so that the sensitivity parameter and elasticity index parameter output by the neural network structure yield results as close as possible to the data penetration index labels. The terminal may use the backpropagation algorithm together with an optimizer (for example, gradient descent) to adjust the weights and biases of the neural network structure so as to minimize the loss function. When the loss value of the neural network structure meets a preset threshold, or the number of iterations meets the preset iteration condition, the terminal stops training and combines the trained neural network structure with the data evaluation structure to obtain the trained data prediction model.
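The training loop described above can be illustrated with a deliberately simplified stand-in. Instead of a multilayer perceptron, the sketch below fits one shared sensitivity parameter and one elasticity index parameter per stratification type directly by gradient descent, through an assumed Logit head `sigmoid(beta - alpha * d)` with a cross-entropy loss. The head, the loss, the synthetic labels, and all names (`train`, `loss`, `TRUE_ALPHA`, `TRUE_BETA`) are assumptions for illustration, not the training procedure of the present application.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic (stratum, discount level, penetration label) triples, generated
# from hypothetical "true" parameters purely for illustration.
TRUE_ALPHA, TRUE_BETA = 0.5, {"A": 1.0, "B": 0.0}
samples = [(s, d, sigmoid(TRUE_BETA[s] - TRUE_ALPHA * d))
           for s in ("A", "B") for d in (3, 4, 5)]

def loss(alpha, beta):
    """Mean cross-entropy between predicted and labelled penetration rates."""
    total = 0.0
    for s, d, y in samples:
        p = sigmoid(beta[s] - alpha * d)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(samples)

def train(steps=5000, lr=0.05):
    """Plain gradient descent on alpha (shared) and beta (per stratum)."""
    alpha, beta = 0.0, {"A": 0.0, "B": 0.0}
    n = len(samples)
    for _ in range(steps):
        g_alpha, g_beta = 0.0, {"A": 0.0, "B": 0.0}
        for s, d, y in samples:
            err = sigmoid(beta[s] - alpha * d) - y   # d(loss)/d(logit)
            g_alpha += err * (-d)                    # logit depends on -alpha * d
            g_beta[s] += err
        alpha -= lr * g_alpha / n
        for s in beta:
            beta[s] -= lr * g_beta[s] / n
    return alpha, beta

initial_loss = loss(0.0, {"A": 0.0, "B": 0.0})
alpha_hat, beta_hat = train()
final_loss = loss(alpha_hat, beta_hat)
```

In the architecture described in the text, the two parameters would be outputs of the neural network computed from user feature data, and backpropagation would carry the same gradient from the Logit head into the network weights.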
In this embodiment, screening the initial sample set for experimental and control sample sets with the propensity score matching rule makes the comparison between the two sets more meaningful and reduces potential bias. It avoids the cost of running expensive controlled experiments to collect samples while preserving sample quality, and it improves the generalization ability of the model, thereby improving the accuracy of data prediction.
In an exemplary embodiment, as shown in FIG. 4, step 304 includes steps 402 to 406, wherein:
Step 402: run a propensity score prediction model on the initial sample set to obtain the propensity score of each user sample in the initial sample set.
In this embodiment of the present application, the propensity score matching rule comprises the propensity score prediction model and the subsequent strategy for matching experimental and control sample sets based on the propensity scores of user samples. The propensity score prediction model may be a machine learning model, for example a logistic regression model. The terminal uses it to predict a propensity score for each user sample in the initial sample set. The propensity score reflects the similarity of feature distributions between user samples at different resource conversion values.
Step 404: take the sample set of each stratification type at each resource conversion value in the initial sample set as an experimental sample set and, based on the propensity scores, determine a control sample set matching that experimental sample set from the other experimental sample sets of the same stratification type at other resource conversion values.
In this embodiment of the present application, the initial sample set contains users of multiple stratification types, and the users of each stratification type are further grouped by resource conversion value. Multiple groups of experimental and control sample sets must be determined for each stratification type, and the screening process is the same for every stratification type. Taking one stratification type as an example, the terminal treats the sample set at each resource conversion value as an experimental sample set. Based on the propensity scores of the user samples in that experimental sample set, it searches the other experimental sample sets of the same stratification type, grouped under other resource conversion values, for user samples whose propensity scores match, and these form the control sample set. As a result, the experimental sample set at each resource conversion value has one or more corresponding control sample sets.
In a specific embodiment, taking users of one stratification type as an example, on any given day that stratification type contains riding users at discount levels 3, 4, and 5 (3折, 4折, and 5折, that is, 30%, 40%, and 50% of the list price). The number of users and the number of cardholders differ across discount levels, so the relationship between discount and user penetration rate cannot be compared directly. Through the screening of experimental and control sample sets, each discount group therefore matches, from each of the other discount groups, control samples equal in number to its own user samples, forming control sample sets. That is, riding users at discount level 3 are matched against riding users at level 4 to obtain control sample set 4', and against riding users at level 5 to obtain control sample set 5'. Likewise, riding users at level 4 obtain control sample sets 3' and 5', and riding users at level 5 obtain control sample sets 3' and 4'. The resulting sample sequences are [(3, 4', 5'), (3', 4, 5'), (3', 4', 5)].
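The construction of the sample sequences in this example can be sketched in a few lines. The function name `build_sample_sequences` is hypothetical, and the strings only label the sample sets (a trailing prime marks a matched control set); the matching itself is not performed here.

```python
def build_sample_sequences(discount_levels):
    """For each level, pair its own users with matched controls from the others."""
    sequences = []
    for own in discount_levels:
        sequences.append(tuple(
            # The group's own sample set keeps its label; every other
            # level contributes a matched control set, marked with a prime.
            str(level) if level == own else f"{level}'"
            for level in discount_levels
        ))
    return sequences

sequences = build_sample_sequences([3, 4, 5])
# [('3', "4'", "5'"), ("3'", '4', "5'"), ("3'", "4'", '5')]
```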
Step 406: construct the target sample set from each experimental sample set and its corresponding control sample sets.
In this embodiment of the present application, the terminal combines each experimental sample set with its corresponding control sample sets to construct the target sample set. Each target sample set contains multiple experimental sample sets together with their corresponding control sample sets, which ensures that the feature distributions of the experimental and control sample sets are balanced with respect to propensity.
In an optional embodiment, the terminal may apply a secondary screening to the groups of sample sequences constructed each day according to the law of diminishing marginal returns, which states that as the number of samples grows, the benefit contributed by each additional sample gradually decreases. Specifically, the terminal may evaluate the groups of sample sequences constructed each day, compute the expected return or benefit of each group, and sort the sample sequences by expected return from high to low. The terminal then keeps the sample sequences that satisfy a specific condition and discards those with lower returns. The condition may be a threshold or rule, for example keeping only the sample sequences whose expected return exceeds the threshold, or only those whose return falls within a top percentage. The terminal uses the sample sequences, composed of experimental and control sample sets, that pass the secondary screening as the target sample set.
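The secondary screening above amounts to ranking sample sequences by expected benefit and keeping those above a threshold or within a top fraction. A minimal sketch follows; the function name `screen_sequences`, the sequence names, and the benefit values are hypothetical, and how the expected benefit is computed is left open, as in the text.

```python
def screen_sequences(expected_benefit, threshold=None, top_fraction=None):
    """Rank sequences by expected benefit (high to low) and keep the best ones.

    expected_benefit: dict mapping sequence name -> expected benefit.
    threshold:        if given, keep only sequences strictly above it.
    top_fraction:     if given, keep only that leading share of the ranking.
    """
    ranked = sorted(expected_benefit.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [kv for kv in ranked if kv[1] > threshold]
    if top_fraction is not None:
        keep = max(1, int(len(ranked) * top_fraction))
        ranked = ranked[:keep]
    return [name for name, _ in ranked]

# Hypothetical expected benefits for four daily sample sequences.
benefits = {"seq_a": 0.9, "seq_b": 0.4, "seq_c": 0.7, "seq_d": 0.1}
kept_by_threshold = screen_sequences(benefits, threshold=0.5)
kept_by_fraction = screen_sequences(benefits, top_fraction=0.5)
```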
In this embodiment, determining the control sample set that matches each experimental sample set by propensity score makes the experimental and control groups better balanced after matching. This improves the generalization of data prediction model training and the data processing accuracy of the model, and thereby improves the accuracy of data prediction.
In an exemplary embodiment, as shown in FIG. 5, step 404 includes steps 502 to 504, wherein:
Step 502: within each stratification type, for the experimental sample set at each resource conversion value, use a nearest-neighbor matching algorithm to retrieve target propensity scores that match the propensity scores of the user samples in that experimental sample set.
In this embodiment of the present application, the terminal uses the nearest-neighbor matching algorithm (Nearest Neighbour Matching). Within each stratification type, for the experimental sample set at each resource conversion value, it searches the other experimental sample sets at other resource conversion values for target propensity scores that closely match the propensity scores of the user samples in the current experimental sample set. A threshold or criterion may be set on the degree of match; for example, two samples are considered matched if the difference between their nearest-neighbor propensity scores is below a certain threshold. This ensures that the experimental and control groups match in propensity to the expected level.
Step 504: determine the control sample set that matches the experimental sample set based on the target propensity scores.
In this embodiment of the present application, the terminal takes the user samples in the other experimental sample sets that correspond to the target propensity scores as the counterparts of the user samples in the current experimental sample set, and these form the control sample set.
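Steps 502 and 504 can be illustrated with a greedy one-to-one nearest-neighbor match on propensity scores. This is a sketch under stated assumptions: the function name `match_controls`, the caliper value of 0.05, matching without replacement, and the user identifiers and scores are all illustrative choices, not details given in the text.

```python
def match_controls(experimental, candidates, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity score.

    experimental / candidates: dicts mapping user id -> propensity score.
    Returns {experimental_id: control_id} for pairs whose score difference
    is below the caliper (the matching threshold).
    """
    available = dict(candidates)
    matches = {}
    for user_id, score in experimental.items():
        if not available:
            break
        # Candidate whose propensity score is closest to this user's score.
        nearest = min(available, key=lambda c: abs(available[c] - score))
        if abs(available[nearest] - score) < caliper:
            matches[user_id] = nearest
            del available[nearest]        # match without replacement
    return matches

# Hypothetical propensity scores for users at discount levels 3 and 4.
group_3 = {"u1": 0.30, "u2": 0.62, "u3": 0.90}
group_4 = {"v1": 0.33, "v2": 0.60, "v3": 0.10, "v4": 0.70}
control_4_prime = match_controls(group_3, group_4)
```

With a caliper, experimental users that have no sufficiently close counterpart stay unmatched (here `u3`), so the control sample set can be smaller than the experimental one; dropping the caliper check would always yield as many controls as experimental users, as long as enough candidates exist.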
In this embodiment, determining the control sample set that matches each experimental sample set by propensity score makes the experimental and control groups better balanced after matching. This improves the generalization of data prediction model training and the data processing accuracy of the model, and thereby improves the accuracy of data prediction.
In an exemplary embodiment, as shown in FIG. 6, step 406 includes steps 602 to 604, wherein:
Step 602: calculate the standardized mean difference between each experimental sample set and its corresponding control sample set.
In this embodiment of the present application, the terminal uses the standardized mean difference (SMD) to test whether the covariates of the matched user samples in the experimental sample set and in each control sample set are similarly distributed. The SMD of the covariate distribution between user samples is calculated as follows:

$$\mathrm{SMD} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left(s_1^2 + s_2^2\right)/2}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the mean feature values of users at the two different resource conversion values being compared within a given stratification type, and $s_1^2$ and $s_2^2$ are the corresponding feature variances.
Step 604: when the standardized mean difference satisfies a preset threshold condition, determine each experimental sample set and its corresponding control sample set as the target sample set.
本申请实施例中,终端根据预设的阈值条件,判断实验样本集和对照样本集在各个分层类型下的标准化均值差是否满足训练要求,如果标准化均值差满足预设阈值条件,则实验样本集和对照样本集在该分层类型下的特征分布相近,可以作为目标样本集。如果实验样本集和对照样本集在某些分层类型下的标准化均值差未满足预设的阈值条件,可以进行倾向性分数匹配后的用户样本调整或者倾向性得分的重新计算,或者重新选择对照样本,以确保更好地匹配实验样本集。In the embodiment of the present application, the terminal determines whether the standardized mean difference of the experimental sample set and the control sample set under each stratification type meets the training requirements according to the preset threshold conditions. If the standardized mean difference meets the preset threshold conditions, the experimental sample set and the control sample set have similar feature distributions under the stratification type and can be used as the target sample set. If the standardized mean difference of the experimental sample set and the control sample set under certain stratification types does not meet the preset threshold conditions, the user sample adjustment after propensity score matching or the recalculation of the propensity score can be performed, or the control sample can be reselected to ensure better matching of the experimental sample set.
本实施例中,通过计算标准化均值差并根据预设阈值条件确定目标样本集,可以在保持实验样本集和对照样本集在各个分层类型下的均衡性的同时,确保数据预测模型的训练效果,进而提高数据预测的准确性。In this embodiment, by calculating the standardized mean difference and determining the target sample set according to the preset threshold condition, the training effect of the data prediction model can be ensured while maintaining the balance of the experimental sample set and the control sample set under each stratification type, thereby improving the accuracy of data prediction.
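The threshold check in step 604 can be sketched as a simple filter over precomputed SMD values. The function name, the dictionary layout, and the default threshold of 0.1 (a common rule of thumb for covariate balance) are assumptions for illustration; the application only speaks of a preset threshold condition.

```python
def split_by_balance(smd_by_pair, threshold=0.1):
    """Split experimental/control pairs into balanced and unbalanced ones.

    `smd_by_pair` maps a hypothetical key (stratification type, resource
    conversion value) to a dict of covariate name -> SMD. A pair is accepted
    only if every covariate has |SMD| <= threshold (assumed condition).
    """
    accepted, rejected = [], []
    for pair_key, smd_by_covariate in smd_by_pair.items():
        balanced = all(abs(value) <= threshold for value in smd_by_covariate.values())
        (accepted if balanced else rejected).append(pair_key)
    return accepted, rejected


if __name__ == "__main__":
    smd_by_pair = {
        ("new_user", 5): {"age": 0.03, "orders": -0.08},
        ("new_user", 10): {"age": 0.25, "orders": 0.01},
    }
    accepted, rejected = split_by_balance(smd_by_pair)
    print("target sample set:", accepted)
    print("needs re-matching:", rejected)
```

Pairs in the rejected list correspond to the cases the text describes as needing sample adjustment, recalculated propensity scores, or reselected control samples.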
It should be understood that although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in these flowcharts may comprise multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed one after another, and they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the present application also provide a data prediction apparatus for implementing the data prediction method described above. The apparatus solves the problem in a manner similar to that described for the method, so for the specific limitations in the one or more apparatus embodiments below, reference may be made to the limitations on the data prediction method above, which are not repeated here.
In an exemplary embodiment, as shown in FIG. 7, a data prediction apparatus 700 is provided, comprising a first acquisition module 701, an analysis module 702, and a prediction module 703, wherein:
the first acquisition module 701 is configured to acquire user feature data and resource conversion values of users of each stratification type;
the analysis module 702 is configured to perform, according to a data prediction model, resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of the users of each stratification type, to obtain sensitivity parameters and elasticity index parameters;
the prediction module 703 is configured to predict, based on the data prediction model, data penetration index results from the sensitivity parameters, the elasticity index parameters, and each resource conversion value, to obtain the data penetration index results of the users of each stratification type under each resource conversion value.
In one embodiment, the prediction module 703 is specifically configured to process the elasticity index parameters according to the data evaluation structure in the data prediction model to obtain an initial data penetration index result; and
to correct and constrain the initial data penetration index result based on the sensitivity parameters, the resource conversion values, and the data evaluation structure, to obtain the data penetration index result of the users of each stratification type under each resource conversion value.
In one embodiment, the apparatus 700 further includes:
a second acquisition module, configured to acquire the neural network structure of the data prediction model to be trained and an initial sample set;
a screening module, configured to screen experimental sample sets and control sample sets from the initial sample set according to a propensity score matching rule, and to construct a target sample set based on the experimental sample sets and the control sample sets;
a training module, configured to train the neural network structure on the target sample set and, when the neural network structure satisfies a preset iteration condition, to combine the neural network structure with the data evaluation structure to obtain the trained data prediction model.
In one embodiment, the screening module is specifically configured to run a propensity score prediction model on the initial sample set to obtain the propensity score of each user sample in the initial sample set;
to take, for each stratification type in the initial sample set, the samples of its users under each resource conversion value as an experimental sample set and, based on the propensity scores, to determine a control sample set matching that experimental sample set from the other experimental sample sets of the same stratification type under the other resource conversion values; and
to construct the target sample set based on each experimental sample set and its corresponding control sample set.
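The grouping described above can be sketched as follows: samples are bucketed by stratification type and resource conversion value, and the candidate pool for each experimental set is drawn from the same stratification type under the other values. The record fields `stratum` and `value` and the function name are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def build_experimental_and_candidate_sets(samples):
    """Group samples into experimental sets and their control candidate pools.

    Each sample is a dict with hypothetical keys "stratum" (stratification type)
    and "value" (resource conversion value). For every (stratum, value) group,
    the candidate pool is every sample of the same stratum under other values.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[(sample["stratum"], sample["value"])].append(sample)

    result = {}
    for (stratum, value), experimental in groups.items():
        candidates = [
            sample
            for (other_stratum, other_value), members in groups.items()
            if other_stratum == stratum and other_value != value
            for sample in members
        ]
        result[(stratum, value)] = (experimental, candidates)
    return result


if __name__ == "__main__":
    samples = [
        {"id": 1, "stratum": "A", "value": 5},
        {"id": 2, "stratum": "A", "value": 10},
        {"id": 3, "stratum": "B", "value": 5},
        {"id": 4, "stratum": "A", "value": 10},
    ]
    sets = build_experimental_and_candidate_sets(samples)
    experimental, candidates = sets[("A", 5)]
    print([s["id"] for s in experimental], [s["id"] for s in candidates])
```

Propensity score matching would then select the actual control sample set from each candidate pool.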
In one embodiment, the screening module is specifically configured to retrieve, within each stratification type and for the experimental sample set under each resource conversion value, the target propensity scores that match the propensity scores of the user samples in that experimental sample set, using a nearest neighbor matching algorithm; and
to determine, based on the target propensity scores, the control sample set that matches the experimental sample set.
In one embodiment, the screening module is specifically configured to calculate the standardized mean difference between each experimental sample set and its corresponding control sample set; and
when the standardized mean difference satisfies a preset threshold condition, to determine each experimental sample set and its corresponding control sample set as the target sample set.
Each module in the above data prediction apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In an exemplary embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected via a system bus, and the communication interface is connected to the system bus via the input/output interface. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores user feature data. The input/output interface is used to exchange information between the processor and external devices. The communication interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements a data prediction method.
Those skilled in the art will understand that the structure shown in FIG. 8 is merely a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
In an exemplary embodiment, a computer device is provided, including a memory and a processor. A computer program is stored in the memory, and the processor implements the following steps when executing the computer program:
acquiring user feature data and resource conversion values of users of each stratification type;
performing, according to a data prediction model, resource conversion sensitivity analysis and elastic data penetration index analysis on the user feature data of the users of each stratification type, to obtain sensitivity parameters and elasticity index parameters;
predicting, based on the data prediction model, data penetration index results from the sensitivity parameters, the elasticity index parameters, and each resource conversion value, to obtain the data penetration index results of the users of each stratification type under each resource conversion value.
In one embodiment, the processor further implements the following steps when executing the computer program:
processing the elasticity index parameters according to the data evaluation structure in the data prediction model to obtain an initial data penetration index result;
correcting and constraining the initial data penetration index result based on the sensitivity parameters, the resource conversion values, and the data evaluation structure, to obtain the data penetration index result of the users of each stratification type under each resource conversion value.
In one embodiment, the processor further implements the following steps when executing the computer program:
acquiring the neural network structure of the data prediction model to be trained and an initial sample set;
screening experimental sample sets and control sample sets from the initial sample set according to a propensity score matching rule, and constructing a target sample set based on the experimental sample sets and the control sample sets;
training the neural network structure on the target sample set and, when the neural network structure satisfies a preset iteration condition, combining the neural network structure with the data evaluation structure to obtain the trained data prediction model.
In one embodiment, the processor further implements the following steps when executing the computer program:
running a propensity score prediction model on the initial sample set to obtain the propensity score of each user sample in the initial sample set;
taking, for each stratification type in the initial sample set, the samples of its users under each resource conversion value as an experimental sample set and, based on the propensity scores, determining a control sample set matching that experimental sample set from the other experimental sample sets of the same stratification type under the other resource conversion values;
constructing the target sample set based on each experimental sample set and its corresponding control sample set.
In one embodiment, the processor further implements the following steps when executing the computer program:
retrieving, within each stratification type and for the experimental sample set under each resource conversion value, the target propensity scores that match the propensity scores of the user samples in that experimental sample set, using a nearest neighbor matching algorithm;
determining, based on the target propensity scores, the control sample set that matches the experimental sample set.
In one embodiment, the processor further implements the following steps when executing the computer program:
calculating the standardized mean difference between each experimental sample set and its corresponding control sample set;
when the standardized mean difference satisfies a preset threshold condition, determining each experimental sample set and its corresponding control sample set as the target sample set.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps in the above method embodiments.
In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.
It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in the present application are all information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of such data must comply with the relevant regulations.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be carried out by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to a memory, database, or other medium in the embodiments provided in the present application may include at least one of non-volatile memory and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The databases involved in the embodiments provided in the present application may include at least one of a relational database and a non-relational database. Non-relational databases may include, without limitation, blockchain-based distributed databases. The processors involved in the embodiments provided in the present application may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, artificial intelligence (AI) processors, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination that involves no contradiction should be considered within the scope of this application.
The above embodiments express only several implementations of the present application. Their descriptions are relatively specific and detailed, but they should not be construed as limiting the patent scope of the present application. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. The scope of protection of the present application is therefore defined by the appended claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410807055.7A CN118379086A (en) | 2024-06-21 | 2024-06-21 | Data prediction method, apparatus, computer device, readable storage medium, and program product |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118379086A (en) | 2024-07-23 |
Family
ID=91910357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410807055.7A Pending CN118379086A (en) | 2024-06-21 | 2024-06-21 | Data prediction method, apparatus, computer device, readable storage medium, and program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118379086A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060271407A1 (en) * | 1999-06-23 | 2006-11-30 | Rosenfeld Brian A | Using predictive models to continuously update a treatment plan for a patient in a health care location |
CN105468895A (en) * | 2006-05-02 | 2016-04-06 | 普罗透斯数字保健公司 | Patient customized therapeutic regimens |
CN112233810A (en) * | 2020-10-20 | 2021-01-15 | 武汉华大基因科技有限公司 | Treatment scheme comprehensive curative effect evaluation method based on real world clinical data |
CN118155787A (en) * | 2024-01-26 | 2024-06-07 | 云上贵州大数据产业发展有限公司 | A medical data processing method and system based on Internet big data |
-
2024
- 2024-06-21 CN CN202410807055.7A patent/CN118379086A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Country or region after: China Address after: Room 901, 9th Floor, Building 1, Zone 4, Wangjing Dongyuan, Chaoyang District, Beijing 100020 Applicant after: Beijing Apacolan Technology Group Co.,Ltd. Address before: Room 901, 9th Floor, Building 1, Zone 4, Wangjing Dongyuan, Chaoyang District, Beijing 100020 Applicant before: Beijing apoco Blue Technology Co.,Ltd. Country or region before: China |
|