
CN116307352A - Engineering quantity index estimation method and system based on machine learning - Google Patents


Info

Publication number
CN116307352A
Authority
CN
China
Prior art keywords
feature
engineering quantity
model
machine learning
quantity index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211380237.8A
Other languages
Chinese (zh)
Inventor
刘静
刘在田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Nuclear Huawei Engineering Design And Research Co ltd
Original Assignee
China Nuclear Huawei Engineering Design And Research Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Nuclear Huawei Engineering Design And Research Co ltd
Priority to CN202211380237.8A
Publication of CN116307352A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical fields of machine learning and engineering cost, in particular to a machine learning-based method and system for estimating engineering quantity indexes. The method comprises: acquiring project history data from a project management system and constructing an original data set $D_0$ from them; performing feature selection on the original data set with a hybrid feature selection method to obtain an optimal feature subset S; and building base regression models on several machine learning algorithms and fully combining their strengths into an ensemble learning engineering quantity index estimation model. By combining several feature selection methods, the invention improves the prediction performance of the model and addresses the large volatility of the data and the insensitivity of any single feature selection method to certain feature data; by integrating several machine learning algorithms, it improves the robustness and accuracy of the model.

Description

Engineering quantity index estimation method and system based on machine learning
Technical Field
The invention relates to the technical field of machine learning and engineering cost, in particular to a method and a system for estimating engineering quantity indexes based on machine learning.
Background
As the real estate industry slows down, competition in the construction market is intensifying and bidding periods are becoming ever shorter. Estimating construction engineering quantity indexes provides an important basis for enterprises' budget quotations, and the accuracy of the estimate directly affects their investment decisions. Estimating engineering quantity indexes quickly and efficiently is therefore particularly important for improving the technical level and core competitiveness of construction enterprises.
Traditionally, engineering quantity indexes are predicted from manual experience and project similarity matching: the index of a new project is estimated by searching for historical projects whose profiles resemble that of the project being estimated, and the prediction mainly relies on simple statistical analysis and linear regression.
With the development of big data and artificial intelligence, engineering cost prediction has gradually shifted from traditional methods toward information technology, and in China engineering cost has also been predicted with methods such as artificial neural networks (ANN) and BP neural networks (BPNN).
Regarding the current state of research, application CN114331221A was previously filed to address the above problems; to exclude interference from external factors, it studies the estimation of construction quantity indexes rather than prices. Engineering quantity index estimation still faces several problems: 1) many features affect engineering quantity index estimation but are not effectively analyzed and used, and existing results rely mostly on human experience without data support; 2) most existing engineering quantity prediction methods are based on a single simple method or model, yet the relationship between the engineering profile and the engineering quantity index is strongly nonlinear, so the errors of existing methods are large. Building on this earlier research, the present application therefore provides a machine learning-based engineering quantity index estimation method that combines several feature selection methods to improve the prediction performance of the model, addresses large data volatility and the insensitivity of any single feature selection method to certain feature data, and integrates several machine learning algorithms to improve the robustness and accuracy of the model.
Disclosure of Invention
To solve these problems, the invention provides a machine learning-based engineering quantity index estimation method and system that combine several feature selection methods to improve the prediction performance of the model, address large data volatility and the insensitivity of any single feature selection method to certain feature data, and integrate several machine learning algorithms to improve the robustness and accuracy of the model.
In order to achieve the above purpose, the present invention provides the following technical solutions:
In a first aspect, the invention provides a machine learning-based engineering quantity index estimation method comprising the following steps:
(1) Acquiring project history data from a project management system and constructing an original data set $D_0$ from the project history data;
(2) Performing feature selection on the original data set with a hybrid feature selection method to obtain an optimal feature subset S, as follows:
(201) Constructing a feature selection data set $D_1$ from the original data set, $D_1=\{(X_{ij}, y_{ij})\}$, $i,j=1,2,\dots,n$, where $X_{ij}$ is the engineering profile difference between monomer (individual building) i and monomer j, and $y_{ij}$ represents the relative error of the engineering quantity index between monomer i and monomer j;
(202) Removing linearly correlated feature variables with the PCA algorithm to obtain feature subset $S_1$;
(203) Calculating the maximal information coefficient (MIC) between every pair of feature variables in the original feature set;
(204) Removing redundant features from the original feature variables according to a threshold to obtain feature subset $S_2$, as follows:
(2041) Constructing a random forest regression model from the number of features in feature subset $S_2$ and the number of decision trees;
(2042) Evaluating the importance of each individual feature with the random forest regression model; the importance of the j-th feature is:
Figure SMS_1
where $e_i$ is the error of the i-th decision tree of the random forest regression model evaluated on out-of-bag data, and $e_{ji}$ is the error of the i-th decision tree after noise is introduced into the j-th feature;
(2043) Sorting the features by importance and determining a feature screening threshold:
$\delta = \min(M) + \alpha$, where M is the set of feature importances in feature subset $S_2$ and $\alpha$ is the threshold tolerance;
(205) Computing the importance of each feature in feature subset $S_2$ with the random forest algorithm;
(206) Further screening the features according to the threshold value to obtain an optimal feature subset S;
(3) Building base regression models on several machine learning algorithms and fully combining their strengths to construct an ensemble learning engineering quantity index estimation model;
the process for constructing the integrated learning engineering quantity index estimation model is as follows:
(301) Constructing a machine learning data set based on the optimal feature subset S obtained in the step (2), and dividing the data set into a training set and a testing set;
(302) Building a first-layer machine learning model, wherein the first-layer machine learning model comprises a BPNN model, an RFR model and a PSO-GRNN model;
(303) Training each base learner with 4-fold cross-validation, i.e., fitting 4 models per learner, and vertically stacking the predictions of these 4 fold models to obtain new features, thereby generating a new training set and a new test set;
(304) Constructing a second-layer machine learning model based on Ridge regression, training this second-layer meta-regression model on the new training set, and outputting the final prediction result.
Further, in step (201),
Figure SMS_2
where ρ represents the engineering quantity index fluctuation threshold.
In a second aspect, the invention provides a machine learning-based engineering quantity index estimation system comprising an optimal feature subset acquisition unit and an engineering quantity index estimation unit, wherein:
the optimal feature subset obtaining unit is used for interfacing with the project management system and obtaining optimal feature subset data;
the engineering quantity index estimation unit takes the optimal feature subset as input and computes the engineering quantity index with the constructed engineering quantity index estimation model; the input end of the engineering quantity index estimation unit is connected to the output end of the optimal feature subset acquisition unit.
Advantageous effects
Compared with the prior art, the technical solution provided by the invention has the following beneficial effects:
(1) Based on the particular characteristics of engineering quantity index data, the invention provides a hybrid feature selection method combining several feature selection techniques, which effectively improves the prediction performance of the model and addresses the large volatility of the data and the insensitivity of any single feature selection method to certain feature data.
(2) The ensemble learning engineering quantity index estimation method and system combine several machine learning algorithms and use a two-layer model for comprehensive analysis and prediction, improving the robustness and accuracy of the model. The engineering quantity index prediction error was verified to be within 5%, providing accurate and effective data support for cost estimation in the early stage of a project.
Drawings
FIG. 1 is a flow chart of a machine learning-based engineering quantity index estimation method of the present invention;
FIG. 2 is a flow chart of hybrid feature selection in the present invention;
FIG. 3 is a schematic diagram of an integrated learning engineering quantity index estimation model according to the present invention;
FIG. 4 is a schematic diagram of the BPNN base learner model in the present invention;
FIG. 5 is a flow chart of the PSO-GRNN base learner model in the present invention;
FIG. 6 is a system diagram of the machine learning-based engineering quantity index estimation system according to the present invention;
FIG. 7 is a table comparing the feature selection methods in the present invention;
FIG. 8 is a table comparing the GRNN and PSO-GRNN models in the present invention;
FIG. 9 is a table comparing the prediction models in the present invention;
FIG. 10 is a table comparing the prediction results of the ensemble learning model in the present invention;
FIG. 11 is a table of the relevant factors recorded in the present invention;
FIG. 12 is a table of engineering quantity index records in the present invention.
The reference numerals in the figures illustrate:
100. optimal feature subset acquisition unit; 200. engineering quantity index estimation unit.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention; it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments, and that all other embodiments obtained by persons of ordinary skill in the art without making creative efforts based on the embodiments in the present invention are within the protection scope of the present invention.
In the description of the present invention, it should be noted that the orientations or positional relationships indicated by terms such as "upper", "lower", "inner", "outer" and "top/bottom" are based on those shown in the drawings, are used only to facilitate and simplify the description, and do not indicate or imply that the referenced apparatus or elements must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "configured to," "engaged with," "connected to," and the like are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be the communication between the two elements; the specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Examples:
As shown in FIGS. 1-12, the invention provides a machine learning-based engineering quantity index estimation method comprising the following steps:
(1) Acquiring project history data from a project management system and constructing an original data set $D_0$ from the project history data;
In this embodiment, the acquired project history data, comprising the engineering profile and the engineering quantity index of each monomer, are cleaned and preprocessed using existing methods. Data cleaning handles duplicate, missing and abnormal values; preprocessing converts the feature data types and normalizes the converted data. The original data set $D_0$ is then constructed.
In addition, to fully mine and use historical project data, this embodiment takes the steel bar engineering quantity index as the prediction target and collects 28 items of monomer profile information, including the project region, the standard floor height and the like, as the initial feature set. Data preprocessing in this embodiment mainly comprises binarizing and dummy-coding qualitative features (such as foundation type and project region) and making the data dimensionless with min-max scaling.
(2) Performing feature selection on the original data set with a hybrid feature selection method to obtain an optimal feature subset S, as follows:
(201) Constructing a feature selection data set $D_1$ from the original data set, $D_1=\{(X_{ij}, y_{ij})\}$, $i,j=1,2,\dots,n$, where $X_{ij}$ is the engineering profile difference between monomer i and monomer j, and $y_{ij}$ represents the relative error of the engineering quantity index between monomer i and monomer j,
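A minimal sketch of the two preprocessing operations named above (dummy coding of qualitative features and min-max scaling of numeric ones), in plain Python. The column names and sample values are hypothetical; a binary feature such as above-ground/underground simply becomes a pair of 0/1 columns in this sketch.

```python
from typing import Dict, List, Sequence


def dummy_encode(values: Sequence[str]) -> Dict[str, List[int]]:
    """Dummy (one-hot) coding: one 0/1 column per category of a qualitative feature."""
    return {c: [1 if v == c else 0 for v in values] for c in sorted(set(values))}


def min_max(values: Sequence[float]) -> List[float]:
    """Min-max scaling to [0, 1]; a constant column is mapped to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def preprocess(rows: List[dict], qualitative: List[str], numeric: List[str]) -> List[dict]:
    """Return new rows with qualitative columns dummy-coded and numeric columns scaled."""
    columns: Dict[str, List[float]] = {}
    for name in qualitative:
        for category, col in dummy_encode([r[name] for r in rows]).items():
            columns[f"{name}={category}"] = col
    for name in numeric:
        columns[name] = min_max([float(r[name]) for r in rows])
    return [{key: col[i] for key, col in columns.items()} for i in range(len(rows))]


# Hypothetical monomer records (illustrative values only).
rows = [
    {"foundation_type": "pile", "standard_floor_height": 2.9},
    {"foundation_type": "raft", "standard_floor_height": 3.0},
    {"foundation_type": "pile", "standard_floor_height": 3.1},
]
encoded = preprocess(rows, ["foundation_type"], ["standard_floor_height"])
```

Scaling is fitted on the data passed in; in practice the minimum and maximum would be taken from the training data and reused for new projects.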
Figure SMS_3
where ρ represents an engineering quantity index fluctuation threshold value, and in this embodiment, ρ=0.05.
In this embodiment, the particular nature of engineering quantity index data is taken into account: the engineering quantity indexes of monomers with the same engineering profile still fluctuate within a certain range, so performing feature selection directly on the original data set gives large errors and poor results. The invention therefore constructs the feature selection data set $D_1$ from the original data set.
(202) Removing linearly correlated feature variables with the PCA algorithm to obtain feature subset $S_1$;
The PCA algorithm itself is standard and is not detailed here; in this embodiment, after PCA dimensionality reduction, the first 23 feature variables are taken as feature subset $S_1$.
(203) Calculating the maximal information coefficient (MIC) between every pair of feature variables in the original feature set;
In this embodiment, the feature subset $S_2$ obtained by MIC screening contains 18 feature variables.
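The construction of $D_1$ can be sketched as follows. The formula image for $y_{ij}$ is not reproduced in the text, so two assumptions are made and marked in the code: $X_{ij}$ is taken as the element-wise absolute difference of the two monomers' (already normalized) profiles, and $y_{ij}$ is the relative error $|q_i - q_j|/q_i$, set to zero when it does not exceed ρ so that ordinary fluctuation between similar monomers is not counted as a difference. Only unordered pairs i < j are generated.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

RHO = 0.05  # engineering quantity index fluctuation threshold used in this embodiment


def build_pairwise_dataset(
    profiles: Sequence[Sequence[float]],
    indexes: Sequence[float],
    rho: float = RHO,
) -> List[Tuple[List[float], float]]:
    """Build D1 = {(X_ij, y_ij)} over all unordered monomer pairs i < j.

    ASSUMPTIONS (the exact formula is not given in the text):
      * X_ij is the element-wise absolute difference of the two profiles;
      * y_ij is |q_i - q_j| / q_i, set to 0 when it does not exceed rho.
    """
    data = []
    for i, j in combinations(range(len(profiles)), 2):
        x_ij = [abs(a - b) for a, b in zip(profiles[i], profiles[j])]
        rel_err = abs(indexes[i] - indexes[j]) / indexes[i]
        y_ij = 0.0 if rel_err <= rho else rel_err
        data.append((x_ij, y_ij))
    return data


# Hypothetical normalized profiles and steel bar quantity indexes of three monomers.
profiles = [[1.0, 0.0], [1.0, 0.0], [3.0, 1.0]]
indexes = [100.0, 103.0, 130.0]
d1 = build_pairwise_dataset(profiles, indexes)
```

The first two monomers share a profile and differ by 3%, which lies within ρ, so their pair is labeled 0.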
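The text does not say how the PCA result is mapped back to original variables. One common PCA-based way to discard linearly dependent variables, used here purely as an assumption, is to take the principal components with the smallest variance (which describe near-linear dependencies) and, for each of them, drop the still-retained variable with the largest absolute loading. Keeping k = 23 of 28 features would then mean dropping one variable for each of the 5 smallest components. NumPy is assumed to be available.

```python
import numpy as np


def pca_discard(X: np.ndarray, k: int) -> list:
    """Return the indices of the k columns of X kept after PCA-based discarding.

    ASSUMED heuristic: for each of the (p - k) smallest-variance principal
    components of the standardised data, drop the retained variable with the
    largest absolute loading on that component.
    """
    p = X.shape[1]
    std = X.std(axis=0)
    Z = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)  # standardise columns
    _, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))    # eigenvalues in ascending order
    kept = set(range(p))
    for comp in range(p - k):                                 # smallest-variance components first
        loadings = np.abs(eigvecs[:, comp])
        kept.remove(max(kept, key=lambda j: loadings[j]))
    return sorted(kept)


# Column 2 is an exact linear combination of columns 0 and 1.
a = np.array([1, 2, 3, 4, 5, 6], dtype=float)
b = np.array([2, 1, 4, 3, 6, 5], dtype=float)
X = np.column_stack([a, b, a + b])
kept = pca_discard(X, k=2)
```

Here the zero-variance component is dominated by the summed column, so that column is the one discarded.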
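Exact MIC (as computed by the MINE algorithm) optimizes the grid partitions by dynamic programming. The sketch below is a simplified stand-in, not that algorithm: it uses equal-frequency bins and maximizes the normalized mutual information $I/\log\min(n_x, n_y)$ over grids with $n_x n_y \le n^{0.6}$, then greedily drops any feature whose score with an already-kept feature exceeds a threshold. The threshold of 0.9 and the feature values are hypothetical; the text does not state the threshold used.

```python
import math
from collections import Counter
from typing import List, Sequence


def _equal_freq_bins(values: Sequence[float], k: int) -> List[int]:
    """Assign each value to one of k equal-frequency bins by rank (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def _mutual_info(a: List[int], b: List[int]) -> float:
    """Plug-in mutual information (natural log) of two discrete label lists."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())


def approx_mic(x: Sequence[float], y: Sequence[float]) -> float:
    """Simplified MIC: best normalised MI over equal-frequency grids with nx*ny <= n**0.6."""
    budget = max(4, int(len(x) ** 0.6))
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        for ny in range(2, budget // nx + 1):
            mi = _mutual_info(_equal_freq_bins(x, nx), _equal_freq_bins(y, ny))
            best = max(best, mi / math.log(min(nx, ny)))
    return best


def drop_redundant(columns: List[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Greedily keep a column unless its approximate MIC with a kept column exceeds threshold."""
    kept: List[int] = []
    for j, col in enumerate(columns):
        if all(approx_mic(columns[k], col) <= threshold for k in kept):
            kept.append(j)
    return kept


area = [float(v) for v in range(100)]          # hypothetical feature
area_scaled = [2.0 * v + 1.0 for v in area]    # redundant monotone copy of it
kept = drop_redundant([area, area_scaled])
```

Because MIC captures any functional dependence, not only linear correlation, a monotone transform of a feature is flagged as redundant just like an exact copy.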
(204) Removing redundant features from the original feature variables according to a threshold to obtain feature subset $S_2$, as follows:
(2041) Constructing a random forest regression model from the number of features in feature subset $S_2$ and the number of decision trees;
(2042) Evaluating the importance of each individual feature with the random forest regression model; the importance of the j-th feature is:
Figure SMS_4
where $e_i$ is the error of the i-th decision tree of the random forest regression model evaluated on out-of-bag data, and $e_{ji}$ is the error of the i-th decision tree after noise is introduced into the j-th feature;
As one implementation:
based on feature subset $S_2$ of data set $D_1$, a data subset $D_2$ is constructed as the input of the random forest regression model. The performance of each decision tree in the random forest is evaluated on out-of-bag data to obtain its error, recorded as $e_i$, $i=1,2,3,\dots,n$. Noise is then added to the j-th feature while the remaining features are kept unchanged, and the error of each decision tree is computed again and recorded as $e_{ji}$. The importance of the j-th feature, $i,j=1,2,3,\dots,n$, is:
Figure SMS_5
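The importance computation described above can be sketched in plain Python. The formula image is not reproduced, so the sketch assumes the importance of feature j is the average increase in out-of-bag error across trees, $\frac{1}{n}\sum_i (e_{ji} - e_i)$, and that the added noise is Gaussian with the standard deviation of feature j. Each "tree" is any callable mapping a feature row to a prediction, standing in for a fitted decision tree.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Tree = Callable[[Row], float]


def oob_mse(tree: Tree, X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean squared error of one tree on its out-of-bag rows."""
    return statistics.fmean([(tree(row) - t) ** 2 for row, t in zip(X, y)])


def noise_importance(trees: Sequence[Tree], oob: Sequence[Tuple[List[Row], List[float]]],
                     j: int, rng: random.Random) -> float:
    """ASSUMED form: mean over trees of (e_ji - e_i), with noise ~ N(0, std of feature j)."""
    increases = []
    for tree, (X, y) in zip(trees, oob):
        e_i = oob_mse(tree, X, y)
        sd = statistics.pstdev([row[j] for row in X]) or 1.0
        # Perturb feature j only; every other feature is left unchanged.
        X_noisy = [row[:j] + [row[j] + rng.gauss(0.0, sd)] + row[j + 1:] for row in X]
        e_ji = oob_mse(tree, X_noisy, y)
        increases.append(e_ji - e_i)
    return statistics.fmean(increases)


# Toy check: both "trees" use only feature 0, so feature 1 should score zero.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [row[0] for row in X]
trees = [lambda r: r[0], lambda r: r[0]]
oob = [(X, y), (X, y)]
rng = random.Random(0)
scores = [noise_importance(trees, oob, j, rng) for j in range(2)]
```

Breiman's original random forest importance permutes the feature's values instead of adding noise; either perturbation fits the same "error increase" pattern.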
(2043) Sorting the features by importance and determining a feature screening threshold:
$\delta = \min(M) + \alpha$, where M is the set of feature importances in feature subset $S_2$ and $\alpha$ is the threshold tolerance; in this embodiment, $\alpha = 0.01$.
(205) Computing the importance of each feature in feature subset $S_2$ with the random forest algorithm;
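Once the importances are known, steps (2043) and (206) reduce to a few lines. The sketch assumes the features whose importance falls below $\delta$ are the ones discarded; the feature names and importance values are hypothetical.

```python
from typing import Dict, List


def screen_features(importance: Dict[str, float], alpha: float = 0.01) -> List[str]:
    """Sort features by importance and keep those with importance >= delta = min(M) + alpha."""
    delta = min(importance.values()) + alpha
    ranked = sorted(importance, key=importance.get, reverse=True)
    return [name for name in ranked if importance[name] >= delta]


# Hypothetical importances M for four features of S_2.
M = {"building_area": 0.30, "fire_resistance_rating": 0.005,
     "project_region": 0.012, "standard_floor_height": 0.20}
kept = screen_features(M, alpha=0.01)  # delta = 0.005 + 0.01 = 0.015
```

With this threshold, a feature survives only if it beats the least important feature by at least the tolerance α.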
(206) Further screening the features according to the threshold value to obtain an optimal feature subset S;
In this embodiment, for the steel bar engineering quantity prediction problem, the optimal feature subset is S = {above-ground/underground, standard floor height, number of floors of the monomer, seismic fortification intensity, project region, fire resistance rating, total floor height, structure type, seismic grade, building area, foundation type, civil air defense ratio}.
To further illustrate the advantages of the present invention, the effectiveness of five feature selection methods (PCA, MIC+PCA, MIC+RF, RF+PCA and MIC+RF+PCA) was compared using the same prediction model.
The comparison was carried out as follows:
1) Randomly selecting project data of 50 monomers as test data;
2) For five different feature selection methods, feature factors determined by the different feature selection methods are used as input respectively, an engineering quantity prediction model is constructed based on the same prediction method (BPNN algorithm is selected in the test example), training is carried out on the model by using training data, and then engineering quantity of the test data is predicted to obtain engineering quantity prediction values under the different feature selection methods.
3) Three evaluation indexes, MSE (mean squared error), MAE (mean absolute error) and R2_score (coefficient of determination), are computed between the predicted and true values for each feature selection method to evaluate and compare model performance comprehensively; the results are shown in FIG. 7. MSE, MAE and R2_score are standard metrics, and their formulas are not repeated here.
For MSE and MAE, smaller values indicate higher prediction accuracy; for R2_score, a value closer to 1 indicates a better fit and higher accuracy. The comparison in FIG. 7 shows that, with the same prediction model, the hybrid PCA+MIC+RF feature selection method yields clearly smaller MSE and MAE values and a higher R2_score than the other four methods, demonstrating that the hybrid feature selection method of the invention can significantly improve the prediction performance of the prediction model.
(3) Building base regression models on several machine learning algorithms and fully combining their strengths to construct an ensemble learning engineering quantity index estimation model;
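For reference, the usual definitions of the three metrics can be written in plain Python and checked on a small hypothetical example:

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: (1/n) * sum (y - y_hat)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: (1/n) * sum |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted engineering quantity indexes.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
scores = (mse(y_true, y_pred), mae(y_true, y_pred), r2_score(y_true, y_pred))
```

MSE penalizes large errors more heavily than MAE, which is why both are reported alongside R2_score.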
the process for constructing the integrated learning engineering quantity index estimation model is as follows:
(301) Constructing a machine learning data set based on the optimal feature subset S obtained in the step (2), and dividing the data set into a training set and a testing set;
(302) Building a first-layer machine learning model comprising three parallel base learners: a BPNN model (back-propagation neural network), an RFR model (random forest regression) and a PSO-GRNN model (particle swarm optimization-generalized regression neural network);
(303) Training each base learner with 4-fold cross-validation, i.e., fitting 4 models per learner, and vertically stacking the predictions of these 4 fold models to obtain new features, thereby generating a new training set and a new test set;
(304) Constructing a second-layer machine learning model based on Ridge regression, training this second-layer meta-regression model on the new training set, and outputting the final prediction result.
In this embodiment, the machine learning data set is $D=\{(X_i, y_i)\}$, $i=1,2,\dots,n$, where $X_i$ contains the values of the features in S for the i-th monomer and $y_i$ is the engineering quantity index of the i-th monomer. 80% of the data set is used as the training set and 20% as the test set.
The base learners of step (302) are constructed as follows:
For the BPNN base learner in this embodiment, the hidden-layer configuration is tuned with grid search and cross-validation (both established techniques) to obtain the optimal hyperparameters: three hidden layers with 64, 128 and 32 nodes, respectively; the model architecture is shown in FIG. 4. Training uses MSE as the loss function and gradient descent: the network weights are updated with a given learning rate along the direction of steepest descent to find the parameters that minimize the network error.
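The two-layer scheme of steps (301)-(304) can be sketched with NumPy. Since BPNN, RFR and PSO-GRNN implementations would not fit here, two simple stand-in base learners (least squares and k-nearest neighbours) are used; any object with fit/predict methods could be substituted. The out-of-fold predictions of the 4 fold models fill one new training feature per base learner. For the new test set, the sketch averages the 4 fold models' test predictions, a common stacking choice but an assumption here. The Ridge penalty lam=1.0 and the synthetic data are hypothetical.

```python
import numpy as np


class LeastSquares:
    """Stand-in base learner: ordinary least squares with intercept."""
    def fit(self, X, y):
        self.w, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.w


class KNN:
    """Stand-in base learner: mean target of the k nearest training rows."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        return self.y[np.argsort(d, axis=1)[:, : self.k]].mean(axis=1)


def ridge_fit(Z, y, lam=1.0):
    """Closed-form ridge regression on centred data; the intercept is not penalised."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    w = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ (y - y_mean))
    return w, y_mean - z_mean @ w


def stack(base_factories, X_tr, y_tr, X_te, n_folds=4, lam=1.0, seed=0):
    """Layer 1: out-of-fold base predictions; layer 2: ridge meta-model on them."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X_tr)), n_folds)
    Z_tr = np.zeros((len(X_tr), len(base_factories)))
    Z_te = np.zeros((len(X_te), len(base_factories)))
    for m, make in enumerate(base_factories):
        for k, val_idx in enumerate(folds):
            fit_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
            model = make().fit(X_tr[fit_idx], y_tr[fit_idx])
            Z_tr[val_idx, m] = model.predict(X_tr[val_idx])  # out-of-fold rows of the new feature
            Z_te[:, m] += model.predict(X_te) / n_folds       # ASSUMED: fold-averaged test feature
    w, b = ridge_fit(Z_tr, y_tr, lam)
    return Z_te @ w + b


rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(100, 3))                   # synthetic, min-max scaled features
y = 2 * X[:, 0] + 3 * X[:, 1] + 0.01 * rng.normal(size=100)
cut = int(0.8 * len(X))                                      # 80/20 split as in this embodiment
pred = stack([LeastSquares, lambda: KNN(k=5)], X[:cut], y[:cut], X[cut:])
```

Training the meta-model only on out-of-fold predictions keeps it from simply trusting whichever base learner overfits the training data most.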
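The BPNN base learner's training loop can be sketched in NumPy with the 64-128-32 hidden-layer layout given above. The ReLU activations, He initialization, full-batch updates, learning rate and synthetic data are assumptions for illustration (the text does not specify them); the sketch only demonstrates back-propagation with an MSE loss and plain gradient descent, not a tuned model.

```python
import numpy as np


def init_mlp(sizes, rng):
    """He-initialised (W, b) pairs for layer sizes such as [n_in, 64, 128, 32, 1]."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X):
    """Return every layer's activations: ReLU on hidden layers, linear output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.maximum(z, 0.0))
    return acts


def train_step(params, X, y, lr):
    """One full-batch gradient-descent step on the MSE loss; returns (new_params, loss)."""
    acts = forward(params, X)
    err = acts[-1][:, 0] - y
    loss = float(np.mean(err ** 2))
    grad = (2.0 / len(y)) * err[:, None]          # dLoss / d(output pre-activation)
    updated = []
    for i in range(len(params) - 1, -1, -1):      # back-propagate from the output layer
        W, b = params[i]
        gW, gb = acts[i].T @ grad, grad.sum(axis=0)
        if i > 0:
            grad = (grad @ W.T) * (acts[i] > 0)   # through the ReLU that produced acts[i]
        updated.append((W - lr * gW, b - lr * gb))
    return updated[::-1], loss


rng = np.random.default_rng(0)
X = rng.uniform(size=(64, 5))                     # synthetic, already min-max scaled
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0])
params = init_mlp([5, 64, 128, 32, 1], rng)
losses = []
for _ in range(300):
    params, loss = train_step(params, X, y, lr=1e-3)
    losses.append(loss)
```

The gradient of every layer is computed with the pre-update weights before any layer is changed, which is what standard back-propagation requires.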
For the RFR base learner in this embodiment, the hyperparameters are optimized with grid search and cross-validation: the number of base decision trees is 200 and the maximum depth of each tree is 50. The RFR base learner is then built on the training data with the optimal parameter combination found.
Grid search and cross-validation are both mature hyperparameter tuning techniques and are not described in detail here; in this embodiment, the hyperparameters of the base learners are selected mainly by combining the two.
For each base learner, the general procedure is as follows:
1) Preset several candidate hyperparameter combinations for the base learner;
2) Loop over all candidate combinations and evaluate the model performance of each with cross-validation.
Specifically, for cross-validation in this embodiment the training data are split into 4 equal parts. Each time, one part serves as the validation set and the other 3 as the training set; the model is trained and tested, and the mean squared error on the validation part is computed. After 4 rounds of training and testing, the 4 errors are averaged to give the final test error of that hyperparameter combination.
Based on the final test error of each combination, the best-performing combination, i.e., the one with the smallest error, is selected as the optimal hyperparameter combination.
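The grid search with 4-fold cross-validation described above can be sketched generically in plain Python. make_trainer, the interleaved fold assignment and the toy one-parameter learner (a slope through the origin shrunk by a penalty lam) are hypothetical illustrations; in the embodiment the candidates would instead be, e.g., the number of trees and maximum depth of the RFR learner.

```python
from itertools import product
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[float], float]
Trainer = Callable[[List[float], List[float]], Predictor]


def cv_mse(train: Trainer, X: Sequence[float], y: Sequence[float], n_folds: int = 4) -> float:
    """Average validation MSE over n_folds folds (each fold validates once)."""
    n = len(X)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]  # interleaved equal split
    errors = []
    for val in folds:
        val_set = set(val)
        fit = [i for i in range(n) if i not in val_set]
        predict = train([X[i] for i in fit], [y[i] for i in fit])
        errors.append(fmean([(predict(X[i]) - y[i]) ** 2 for i in val]))
    return fmean(errors)


def grid_search(make_trainer: Callable[..., Trainer], grid: Dict[str, Sequence],
                X: Sequence[float], y: Sequence[float], n_folds: int = 4) -> Tuple[dict, float]:
    """Try every hyperparameter combination; return (best_params, best_cv_mse)."""
    best_params, best_score = {}, float("inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = cv_mse(make_trainer(**params), X, y, n_folds)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def make_shrunk_slope(lam: float) -> Trainer:
    """Hypothetical toy learner: slope through the origin, sum(xy) / (sum(x^2) + lam)."""
    def train(X: List[float], y: List[float]) -> Predictor:
        slope = sum(a * b for a, b in zip(X, y)) / (sum(a * a for a in X) + lam)
        return lambda x: slope * x
    return train


X = [float(i) for i in range(1, 41)]
y = [2.0 * x for x in X]
best = grid_search(make_shrunk_slope, {"lam": [0.0, 10.0, 100.0]}, X, y)
```

On this noise-free toy data the unpenalized slope is exact, so the search selects lam = 0.0 with zero cross-validated error.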
For the PSO-GRNN base learner in this embodiment, a three-layer GRNN network structure is first constructed, and the smoothing factor of the GRNN model is then optimized with the PSO algorithm; the optimization flow is shown in FIG. 5. GRNN and PSO are mature algorithms and are not described in detail here; only the effect of PSO optimization on the GRNN network parameter is described. To illustrate the effectiveness of the method, this embodiment compares the model accuracy of GRNN and PSO-GRNN; the results are shown in FIG. 8.
As can be seen from the comparison result of FIG. 8, compared with the basic GRNN method, the MSE and MAE index values of the PSO-GRNN method provided by the invention are smaller, and the R2_score is slightly higher, so that the effectiveness of the method for optimizing the GRNN model by using the PSO algorithm provided by the invention is proved.
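The PSO search over the GRNN smoothing factor can be sketched in plain Python. The GRNN output is written as its usual Gaussian-kernel weighted average of training targets. The one-dimensional swarm, its inertia and acceleration coefficients (w = 0.7, c1 = c2 = 1.5), the search bounds for σ, the synthetic data and the use of a held-out validation MSE as the fitness are all assumptions, since the optimization flow of FIG. 5 is not reproduced in the text.

```python
import math
import random
from typing import List, Sequence, Tuple


def grnn_predict(X_tr: Sequence[List[float]], y_tr: Sequence[float],
                 x: List[float], sigma: float) -> float:
    """GRNN output: Gaussian-kernel weighted average of training targets."""
    d2 = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_tr]
    d_min = min(d2)  # shift exponents so the nearest pattern always has weight 1
    w = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in d2]
    return sum(wi * yi for wi, yi in zip(w, y_tr)) / sum(w)


def val_mse(sigma, X_tr, y_tr, X_val, y_val) -> float:
    """Fitness of a smoothing factor: validation mean squared error."""
    return sum((grnn_predict(X_tr, y_tr, x, sigma) - t) ** 2
               for x, t in zip(X_val, y_val)) / len(y_val)


def pso_sigma(X_tr, y_tr, X_val, y_val, lo=0.01, hi=2.0, n_particles=10, iters=30,
              w=0.7, c1=1.5, c2=1.5, seed=0) -> Tuple[float, float]:
    """1-D particle swarm search for the smoothing factor; returns (best_sigma, best_mse)."""
    rng = random.Random(seed)

    def cost(s: float) -> float:
        return val_mse(s, X_tr, y_tr, X_val, y_val)

    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest, pcost = pos[:], [cost(p) for p in pos]
    g = min(range(n_particles), key=pcost.__getitem__)
    gbest, gcost = pbest[g], pcost[g]
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (gbest - pos[i])
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # keep sigma inside [lo, hi]
            c = cost(pos[i])
            if c < pcost[i]:
                pbest[i], pcost[i] = pos[i], c
                if c < gcost:
                    gbest, gcost = pos[i], c
    return gbest, gcost


X_tr = [[i / 10] for i in range(0, 40, 2)]
y_tr = [math.sin(x[0]) for x in X_tr]
X_val = [[i / 10] for i in range(1, 40, 2)]
y_val = [math.sin(x[0]) for x in X_val]
sigma, best_mse = pso_sigma(X_tr, y_tr, X_val, y_val)
```

A very small σ makes the GRNN reproduce the nearest training target, while a large σ flattens predictions toward the global mean; PSO searches between these extremes.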
Further, this embodiment compares the accuracy of the ensemble learning model of the present invention with that of the BPNN, SVR, RFR and PSO-GRNN models; the results are shown in FIG. 9.
As can be seen from fig. 9, the performance of the integrated learning model according to the present invention is superior to that of the single base learner model in three evaluation indexes, namely MSE, MAE and r2_score.
The model predictions were verified on the test data set, and part of the test results are shown in FIG. 10. The experimental results show that the method of the invention predicts the engineering quantity index stably and accurately, with a relative prediction error below 5%, meeting the accuracy requirements of early-stage project estimation.
As shown in FIG. 6, the present invention further provides a machine-learning-based engineering quantity index estimation system, comprising an optimal feature subset obtaining unit 100 and an engineering quantity index estimation unit 200, wherein:
the optimal feature subset obtaining unit 100 is configured to interface with the project management system and obtain the optimal feature subset data; and
the engineering quantity index estimation unit 200 is configured to take the optimal feature subset as input and compute the corresponding engineering quantity index using the constructed engineering quantity index estimation model, the input end of the engineering quantity index estimation unit 200 being connected to the output end of the optimal feature subset obtaining unit 100.
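The wiring of the two units can be pictured with the following minimal sketch. The class names, the `fetch_project` callable standing in for the project management system interface, and the `predict` method expected of the trained estimation model are all hypothetical; the patent does not specify a programming interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Protocol, Sequence


class Regressor(Protocol):
    """Any trained estimation model exposing a batch predict method (assumed)."""
    def predict(self, rows: Sequence[Sequence[float]]) -> Sequence[float]: ...


@dataclass
class OptimalFeatureSubsetUnit:  # unit 100 (hypothetical name)
    fetch_project: Callable[[str], Mapping[str, float]]  # stub for the PM-system interface
    selected_features: Sequence[str]  # names in the optimal feature subset S

    def get_features(self, project_id: str) -> List[float]:
        record = self.fetch_project(project_id)
        # Keep only the selected features, in a fixed order
        return [float(record[name]) for name in self.selected_features]


@dataclass
class EstimationUnit:  # unit 200 (hypothetical name)
    model: Regressor

    def estimate(self, features: Sequence[float]) -> float:
        return float(self.model.predict([features])[0])


@dataclass
class EstimationSystem:
    unit100: OptimalFeatureSubsetUnit
    unit200: EstimationUnit

    def run(self, project_id: str) -> float:
        # The output end of unit 100 feeds the input end of unit 200
        return self.unit200.estimate(self.unit100.get_features(project_id))
```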
The workflow is illustrated below with a residential building in Changzhou, Jiangsu, as a specific implementation case. At the initial stage of the project, the project characteristic factors of the residential building are input: above-ground/underground, standard floor height, number of floors of the individual building, seismic fortification intensity, project location, fire resistance rating, total floor height, structure type, seismic grade, building area, foundation type and civil air defense ratio; the specific factors are shown in FIG. 11. From these input values, the engineering quantity index estimation model produces the predicted engineering quantity index, which is then compared with the actual engineering quantity index. The prediction error of the engineering quantity index is within 5% (as shown in FIG. 12), providing accurate and effective data support for cost estimation in the early stage of the project.
The engineering quantity index estimation system is used for estimating engineering quantity indexes.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (3)

1. A machine-learning-based engineering quantity index estimation method, characterized by comprising the following steps:
(1) Acquiring project history data from a project management system, and constructing an original data set $D_0$ from the project history data;
(2) Performing feature selection on the original data set by a hybrid feature selection method to obtain an optimal feature subset S, the specific process being as follows:
(201) Constructing a feature selection data set $D_1$ on the basis of the original data set, $D_1 = \{(X_{ij}, y_{ij})\}$, $i, j = 1, 2, \ldots, n$, where $X_{ij}$ is the engineering profile difference between individual building $i$ and individual building $j$, and $y_{ij}$ is the relative error of the engineering quantity index between individual building $i$ and individual building $j$;
(202) Removing linearly correlated feature variables based on the PCA algorithm to obtain a feature subset $S_1$;
(203) Calculating the maximal information coefficient (MIC) between every pair of features in the original feature set;
(204) Removing redundant features from the original feature variables according to a threshold to obtain a feature subset $S_2$, the specific process being as follows:
(2041) Constructing a random forest regression model according to the number of features in the feature subset $S_2$ and the number of decision trees;
(2042) Performing single-feature importance assessment with the random forest regression model, the importance of the j-th feature being:
Figure QLYQS_1
where $e_i$ is the error value obtained by evaluating the j-th decision tree of the random forest regression model with out-of-bag data, and $e_{ji}$ is the error value of the j-th decision tree after noise interference is introduced;
(2043) Sorting the features by importance and determining a feature screening threshold by the formula $\delta = \min(M) + \alpha$, where $M$ is the set of feature importances of the feature subset $S_2$ and $\alpha$ is the threshold tolerance;
(205) Computing the importance of each feature in the feature subset $S_2$ using the random forest algorithm;
(206) Further screening the features according to the threshold to obtain the optimal feature subset S;
(3) Building base regression models with a plurality of machine learning algorithms and fully fusing the advantages of the multiple models to construct an ensemble learning engineering quantity index estimation model, the construction process being as follows:
(301) Constructing a machine learning data set based on the optimal feature subset S obtained in step (2), and dividing the data set into a training set and a test set;
(302) Building a first-layer machine learning model comprising a BPNN model, an RFR model and a PSO-GRNN model;
(303) Training the 4 base learners separately with 4-fold cross-validation, and longitudinally stacking the predicted values of the 4 base learners to obtain new features, thereby generating a new training set and a new test set;
(304) Constructing a second-layer machine learning model based on Ridge regression, training the second-layer meta-regression model with the new training set, and outputting the final prediction result.
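Steps (301) to (304) describe a two-layer stacking ensemble. The sketch below is a minimal illustration under stated assumptions: for each base learner, the out-of-fold predictions of the K fold models are written back row by row (the longitudinal stacking) to form one new feature column; the new test-set feature is the average of the K fold models' test predictions (one common convention; the patent does not say which variant it uses); and the meta-learner is a closed-form ridge regression with an unpenalized intercept. To stay self-contained, simple stand-ins (least squares, k-nearest neighbours and a fixed-σ GRNN) replace the BPNN, RFR and PSO-GRNN models; the function accepts any number of base learners. All helper names and hyperparameters are hypothetical.

```python
import numpy as np


# --- Hypothetical stand-in base learners: fit(X, y) returns a predict(Q) function ---
def fit_ols(X, y):
    A = np.c_[np.ones(len(X)), X]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Q: np.c_[np.ones(len(Q)), Q] @ coef


def fit_knn(X, y, k=5):
    def predict(Q):
        d2 = ((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        idx = np.argsort(d2, axis=1)[:, :k]      # k nearest training rows
        return y[idx].mean(axis=1)
    return predict


def fit_grnn(X, y, sigma=0.2):
    def predict(Q):
        d2 = ((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        logits = -d2 / (2 * sigma ** 2)
        w = np.exp(logits - logits.max(axis=1, keepdims=True))
        return w @ y / w.sum(axis=1)
    return predict


def fit_ridge(Z, y, lam=1.0):
    """Closed-form ridge regression; the intercept is not penalised."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    beta = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ (y - y_mean))
    return lambda Q: (Q - z_mean) @ beta + y_mean


def stack(X_tr, y_tr, X_te, base_fitters, n_folds=4, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X_tr)), n_folds)
    Z_tr = np.zeros((len(X_tr), len(base_fitters)))  # new training features
    Z_te = np.zeros((len(X_te), len(base_fitters)))  # new test features
    for m, fit in enumerate(base_fitters):
        for k in range(n_folds):
            val = folds[k]
            trn = np.concatenate([folds[i] for i in range(n_folds) if i != k])
            model = fit(X_tr[trn], y_tr[trn])
            Z_tr[val, m] = model(X_tr[val])          # out-of-fold rows of column m
            Z_te[:, m] += model(X_te) / n_folds      # average of the K fold models
    meta = fit_ridge(Z_tr, y_tr, lam)                # second-layer Ridge meta-learner
    return meta(Z_te), Z_tr, Z_te
```

Using out-of-fold predictions rather than in-sample predictions keeps the meta-learner from simply trusting whichever base learner overfits the training set most.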
2. The machine-learning-based engineering quantity index estimation method according to claim 1, wherein, in step (201),
Figure QLYQS_2
where ρ represents the engineering quantity index fluctuation threshold.
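Step (201) turns the n individual buildings into pairwise samples. The following minimal sketch assumes numeric profile vectors (categorical factors would need encoding first), takes $X_{ij}$ as the element-wise difference $x_i - x_j$, and takes $y_{ij}$ as $|q_i - q_j| / |q_j|$, where $q$ is the engineering quantity index; both choices are assumptions. The exact role of ρ in the formula of claim 2 (Figure QLYQS_2) cannot be recovered from the text, so ρ is not used here. The helper name `build_pairwise_dataset` is hypothetical.

```python
import numpy as np


def build_pairwise_dataset(profiles, indexes):
    """Build D1 = {(X_ij, y_ij)} over all ordered pairs i != j (hypothetical helper)."""
    profiles = np.asarray(profiles, dtype=float)  # shape (n, p): one row per building
    indexes = np.asarray(indexes, dtype=float)    # shape (n,): engineering quantity index
    n = len(indexes)
    i, j = np.where(~np.eye(n, dtype=bool))       # all ordered pairs with i != j
    X = profiles[i] - profiles[j]                 # assumed: profile difference
    # Assumed: relative error of the index with respect to building j
    y = np.abs(indexes[i] - indexes[j]) / np.abs(indexes[j])
    return X, y, np.c_[i, j]
```

With n buildings this yields n(n − 1) ordered pairs, so the size of $D_1$ grows quadratically with the project history.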
3. A machine-learning-based engineering quantity index estimation system, characterized by comprising an optimal feature subset acquisition unit (100) and an engineering quantity index estimation unit (200), wherein:
the optimal feature subset acquisition unit (100) is configured to interface with the project management system and acquire optimal feature subset data; the engineering quantity index estimation unit (200) is configured to take the optimal feature subset as input and compute the engineering quantity index using the constructed engineering quantity index estimation model; and the input end of the engineering quantity index estimation unit (200) is connected to the output end of the optimal feature subset acquisition unit (100).
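Steps (2042), (2043) and (206) of claim 1 score each feature by how much the out-of-bag error rises when noise is introduced, then screen with $\delta = \min(M) + \alpha$. Since the importance formula (Figure QLYQS_1) is not reproduced in the text, the sketch assumes the usual form: the mean over ensemble members of $(e_{ji} - e_i)$, with the noise introduced by permuting feature j within each member's out-of-bag rows. To stay dependency-free, bootstrap-fitted least-squares models replace the decision trees (a bagging stand-in, not a random forest), and features with importance at or below δ are assumed to be discarded. The function names are hypothetical.

```python
import numpy as np


def bagged_permutation_importance(X, y, n_models=50, seed=0):
    """OOB permutation importance for a bagged ensemble (stand-in for a random forest)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    imp = np.zeros(p)
    used = 0
    for _ in range(n_models):
        boot = rng.integers(0, n, n)                 # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), boot)       # out-of-bag rows
        if len(oob) == 0:
            continue
        A = np.c_[np.ones(n), X[boot]]
        coef, *_ = np.linalg.lstsq(A, y[boot], rcond=None)

        def predict(Q, coef=coef):
            return np.c_[np.ones(len(Q)), Q] @ coef

        e_i = np.mean((predict(X[oob]) - y[oob]) ** 2)   # OOB error of this member
        for j in range(p):
            Xp = X[oob].copy()
            Xp[:, j] = rng.permutation(Xp[:, j])          # noise: shuffle feature j
            e_ji = np.mean((predict(Xp) - y[oob]) ** 2)   # OOB error after the noise
            imp[j] += e_ji - e_i
        used += 1
    return imp / used


def screen_features(importance, alpha):
    """delta = min(M) + alpha; keep features whose importance exceeds delta (assumed rule)."""
    delta = importance.min() + alpha
    return np.flatnonzero(importance > delta), delta
```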
CN202211380237.8A 2022-11-05 2022-11-05 Engineering quantity index estimation method and system based on machine learning Pending CN116307352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211380237.8A CN116307352A (en) 2022-11-05 2022-11-05 Engineering quantity index estimation method and system based on machine learning

Publications (1)

Publication Number Publication Date
CN116307352A true CN116307352A (en) 2023-06-23

Family

ID=86776720

Country Status (1)

Country Link
CN (1) CN116307352A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117010274A (en) * 2023-07-11 2023-11-07 中国地质科学院水文地质环境地质研究所 Intelligent early warning method for harmful elements in underground water based on integrated incremental learning
CN117010274B (en) * 2023-07-11 2024-05-10 中国地质科学院水文地质环境地质研究所 Intelligent early warning method for harmful elements in underground water based on integrated incremental learning
CN117935966A (en) * 2024-01-16 2024-04-26 重庆科技大学 Deep saline water CO2 solubility prediction method based on machine learning
CN117935966B (en) * 2024-01-16 2024-10-25 重庆科技大学 Deep saline water CO2 solubility prediction method based on machine learning

Similar Documents

Publication Publication Date Title
CN111639237B (en) Electric power communication network risk assessment system based on clustering and association rule mining
WO2023142424A1 (en) Power financial service risk control method and system based on gru-lstm neural network
CN105548764B (en) Electrical equipment fault diagnosis method
CN105469196A (en) Comprehensive evaluation method and comprehensive evaluation system for evaluating mine construction project process
CN116307352A (en) Engineering quantity index estimation method and system based on machine learning
CN105243255A (en) Evaluation method for soft foundation treatment scheme
CN114118588B (en) Method for predicting peak-to-peak power failure in summer based on game feature extraction under clustering undersampling
CN110659814A (en) Power grid operation risk evaluation method and system based on entropy weight method
CN113627735B (en) Early warning method and system for engineering construction project security risk
CN114444910A (en) Electric power Internet of things-oriented edge network system health degree evaluation method
CN115099450A (en) Family carbon emission monitoring and accounting platform based on fusion model
CN117371207A (en) Extra-high voltage converter valve state evaluation method, medium and system
CN113706328A (en) Intelligent manufacturing capability maturity evaluation method based on FASSA-BP algorithm
CN118246744A (en) Risk assessment method and system for construction site of extra-long tunnel
CN114548494B (en) Visual cost data prediction intelligent analysis system
CN117934035B (en) Method, device and storage medium for predicting construction cost of building construction
CN113505818A (en) Aluminum melting furnace energy consumption abnormity diagnosis method, system and equipment with improved decision tree algorithm
CN113379326A (en) Power grid disaster emergency drilling management system based on deep neural network
CN109635008B (en) Equipment fault detection method based on machine learning
CN117743803A (en) Workload perception instant defect prediction method based on evolutionary feature construction
CN118336678A (en) Electric vehicle charging station medium-term load prediction method based on machine learning
CN111680268A (en) Multi-granularity coal mine gas risk prediction method based on cloud model
CN117114501A (en) Bridge and tunnel health state monitoring method based on fuzzy theory
CN114118688A (en) Power grid engineering cost risk early warning method based on sequence relation analysis
CN114444925A (en) Method for evaluating safety performance management index of controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination