Leveraging Machine Learning for Sophisticated Rental Value Predictions: A Case Study from Munich, Germany
Figure 1. The shape of the target.
Figure 2. Feature importance in two sets.
Figure 3. Experiment design map.
Figure 4. Comparison of model performance across feature set 1.
Figure 5. Comparison of model performance across feature set 2.
Abstract
1. Introduction
- A variety of machine learning (ML) models were selected for this regression task, including linear regression (LR), random forest regression (RFR), gradient boosting regression (GBR), and extreme gradient boosting (XGB) regression.
- Two neural networks, a fully connected neural network (FNN) and a PyTorch neural network (PNN), were evaluated to assess how effectively deep learning (DL) methods predict rental values.
- The PyTorch network was stacked with each of the individual ML models (RFR, GBR, XGB, and LR) to enhance prediction accuracy.
- Each model was assessed using the root mean square error (RMSE) and the coefficient of determination (R²).
- To verify the adequacy and reliability of the PyTorch framework, the models were evaluated on two different feature sets, one selected by CatBoost and one by particle swarm optimisation (PSO).
- This research aimed to enhance the effectiveness of AI techniques within the housing industry, bridging the gap between computer science and property investment, and expanding the scope of current research on the German property market.
2. Related Work
2.1. Existing Techniques for Predicting the Housing Market
2.2. Methods of Feature Selection for Predicting Housing Prices
2.3. Justification for Model Selection and Performance Metric
- LR is a widely used approach for analysing the relationship between predictor variables and the target in regression problems [31]. It assumes a linear relationship, in which a change in an independent variable produces a proportional change in the dependent variable [32], and it can be used for both prediction and estimation. Its use in many studies highlights its role as a core method in housing price prediction research. The studies in [14,15] employed LR as a benchmark to evaluate the effectiveness of various ML models in predicting housing prices. In addition, Sakri and Ali [16] and Disha and Saxena [19] used LR as a component of their approaches to estimating house values in regional real estate markets, assuming a direct linear relationship between the predictor variables and housing prices. Together, these studies illustrate the enduring significance of LR for forecasting property values. Because it provides a simple and easily interpreted mechanism, LR is especially valuable as a baseline in regression tasks.
- RFR is an ensemble ML model built from many regression trees, each consisting of a hierarchy of splitting rules [33]. Aggregating the predictions of many trees, rather than relying on a single decision tree, improves predictive capability [34]. The approach handles large numbers of variables effectively and offers several advantages: high precision and efficiency, and resistance to overfitting in high-dimensional data without requiring prior feature selection, whether the data are numerical or categorical [35]. Consequently, RF has become a prevalent subject in housing price forecasting research, with several studies examining its effectiveness and adaptability. Ming et al. [12] applied RF to the Chengdu housing market and obtained an R² score of 0.83, performing on par with the best models in their study. Xu [15] included RF in a comparative analysis on the Boston housing dataset and obtained an R² score of 0.8329, indicating that it estimated prices accurately. In [14], RF was chosen because it retains the predictive performance of decision trees while reducing the risk of overfitting, reflecting a balance between model complexity and generalisation. Sakri and Ali [16] combined RF with PSO-based feature selection and achieved notable gains in accuracy, which may be attributed to the ability of RF to handle many input variables efficiently.
According to [19], RF performed very well in predicting residential property values in Bangalore, achieving an R² score of 0.8512, the second highest among all the models evaluated. Additionally, Yang et al. [18] used RF as a core component of a stacking ensemble, combining it with other methods to improve predictions because it handles high-dimensional data well. Taken together, these results show that RF has a wide range of applications in home value prediction and contributes effectively both as a standalone model and within ensembles. Given its continued importance across housing markets and research approaches, RF was chosen as one of the models in this study.
- GBR is an ensemble approach that builds models sequentially, with each new model fitted to correct the errors made by the preceding ones. It is often successful on complex datasets with intricate structures, which made it a suitable candidate for this dataset.
- XGBoost (Extreme Gradient Boosting) improves predictive accuracy by combining many tree models [36]. It mitigates overfitting by adding a regularisation term to the objective, which penalises complex models and reduces the sensitivity of predictions to individual data points [37]. Its tree-based architecture also makes it robust against outliers and helps prevent overfitting [38]. Because it combines regularisation terms with second-order derivative information, XGBoost learns efficiently and quickly, which makes it well suited to regression tasks [39,40]. Its efficacy and versatility have attracted considerable attention in the literature on ML-based house price prediction. Ming et al. [12] demonstrated the effectiveness of XGBoost on the Chengdu housing market, reporting high accuracy, a lower MSE, and broad applicability. Xu [15] included XGBoost in a comparative study of ML methods on the Boston housing dataset: although the ANN achieved the highest R² (0.8776), XGBoost remained competitive with an R² of 0.8495, only 0.0281 lower. In [19], XGBoost was the most effective of the seven models evaluated, achieving accuracies of 0.9110 and 0.6465 for estimating flat and independent house prices, respectively. Sheng and Yu [17] integrated XGBoost with PSO to significantly improve accuracy on the Ames housing dataset.
The PSO-XGBoost hybrid outperformed the individual models, reaching a precision of 0.9887, and demonstrated the benefit of combining XGBoost with optimisation techniques to capture nonlinear data patterns and improve predictive reliability. Similarly, Yang et al. [18] departed from single-algorithm approaches by proposing D-Stacking, a diversity-based stacking ensemble method that incorporates XGBoost into a more complex prediction framework; on Chinese and American housing datasets, it outperformed standalone models. Together, these studies demonstrate the robust performance of XGBoost in AI-based property price prediction, both as a standalone solution and in combination with other methods such as feature selection optimisation, hybrid models, and ensemble techniques. They also informed the methodology design of the experiments in this study, supporting the use of XGBoost for real estate market analysis.
- An FNN, also referred to as a multilayer perceptron, is the most fundamental type of DL network, typically comprising an input layer, one or more hidden layers, and an output layer [41]. Each neuron in the input layer receives one input value; each neuron in a hidden layer computes the weighted sum of the previous layer's outputs plus a bias and passes it through an activation function, so that every layer is fully connected to the next [42,43]. Activations are computed layer by layer until they reach the output layer, whose activation is the predicted value [8]. FNNs can perform regression tasks and learn intricate characteristics of the input data; they can approximate most functions and have strong nonlinear fitting capabilities, while being simple in principle and easy to implement. They can therefore be applied to forecasting rental values in the German housing sector [43]. Because it can learn nonlinear representations of the relationships between input and output variables, an FNN is well suited to capturing complex patterns in housing market data.
- PNN: Neural network (NN) models based on PyTorch are defined, trained, and run for inference in Python, with data represented as tensors [44]. PyTorch can make this process fast and efficient, and it allows the efficiency of DL models to be optimised while maintaining full flexibility throughout model development and execution [45].
- Stacked models: stacking merges the predictions of multiple models to improve overall performance and accuracy. This study combines the PNN with each of RFR, GBR, XGB, and LR, aiming for higher accuracy than the standalone models.
- The R² score evaluates regression models by measuring the proportion of variance in the dependent variable that is explained by the independent variables [46]. For a least-squares model with an intercept, evaluated on its training data, R² ranges from 0 to 1; on held-out data it can be negative when the model performs worse than simply predicting the mean. A score of 0 indicates that the model explains none of the variability of the response variable around its mean, while a score of 1 indicates that the model explains all of it [47]. A higher R² score therefore indicates a better-fitting model [48].
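As a minimal illustration of the layer-by-layer computation described above for the FNN, the NumPy sketch below implements a forward pass in which each layer computes a weighted sum of the previous layer's outputs plus a bias, hidden layers apply ReLU, and the output layer is linear. The layer sizes follow the architecture reported in Section 4.2 (two hidden layers of 256 neurons); the number of input features, the weight initialisation, and the function names are hypothetical.

```python
import numpy as np


def relu(z):
    # ReLU activation: max(0, z), applied element-wise
    return np.maximum(0.0, z)


def fnn_forward(x, layers):
    """Forward pass of a fully connected network.

    `layers` is a list of (W, b) pairs; hidden layers use ReLU and the
    final layer is left linear, as is usual for a regression output.
    """
    a = x
    for i, (W, b) in enumerate(layers):
        z = a @ W + b  # weighted sum of the previous layer's outputs plus bias
        a = z if i == len(layers) - 1 else relu(z)
    return a


rng = np.random.default_rng(0)
n_features = 10                         # hypothetical number of selected features
sizes = [n_features, 256, 256, 1]       # two hidden layers of 256 neurons, one output
layers = [(rng.normal(0.0, 0.05, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(4, n_features))    # a batch of four hypothetical listings
y_hat = fnn_forward(x, layers)
print(y_hat.shape)
```

Keeping the last layer linear lets the network output any real-valued rent rather than only non-negative ReLU activations.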
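The text does not detail how the PNN is combined with each base model, so the following is only a generic sketch of stacking as described in the list above: base models produce out-of-fold predictions, and a meta-model learns how to combine them. It assumes scikit-learn base models (RFR, GBR, LR), a linear meta-model, five folds, and synthetic data; it is not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in data (hypothetical; not the Munich dataset)
X, y = make_regression(n_samples=600, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

base_models = [
    RandomForestRegressor(n_estimators=100, random_state=0),
    GradientBoostingRegressor(random_state=0),
    LinearRegression(),
]

# Level 0: out-of-fold predictions, so the meta-model never sees a prediction
# that a base model made on rows it was trained on
oof = np.column_stack([cross_val_predict(m, X_tr, y_tr, cv=5) for m in base_models])

# Refit each base model on the full training set to build test-time inputs
test_feats = np.column_stack([m.fit(X_tr, y_tr).predict(X_te) for m in base_models])

# Level 1: a linear meta-model learns how to weight the base predictions
meta = LinearRegression().fit(oof, y_tr)
y_pred = meta.predict(test_feats)
print(f"stacked R2 on the test split: {r2_score(y_te, y_pred):.3f}")
```

scikit-learn also packages this pattern as `sklearn.ensemble.StackingRegressor`; the explicit version is shown here to make the out-of-fold step visible.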
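To make the evaluation metrics concrete, the sketch below computes R² as one minus the ratio of the residual sum of squares to the total sum of squares about the mean, together with RMSE, on four hypothetical monthly rents. It also shows that predicting the mean for every record gives R² = 0, the reference point against which the score is defined.

```python
import numpy as np


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    # Root mean square error, in the same unit as the target (here EUR)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


y_true = np.array([800.0, 1200.0, 1500.0, 2000.0])   # hypothetical rents (EUR)
y_pred = np.array([850.0, 1150.0, 1600.0, 1900.0])
print(round(r2(y_true, y_pred), 4), round(rmse(y_true, y_pred), 2))

# Predicting the mean for every record explains none of the variance
print(r2(y_true, np.full(4, y_true.mean())))
```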
3. Dataset
- The Munich property market dataset contains 23 distinct features for each residential unit, including living space, rent, and number of rooms.
- The dataset contains a total of 4369 records, 20% of which are reserved for testing.
- The data comprise four data types: objects (strings), floats, integers, and Booleans.
- The data were collected on a single fixed date, so frequent updates are needed to keep them reliable over time.
- Duplicated and irrelevant entries may compromise the quality and accuracy of the models.
- An uneven distribution of the target values may lead to biased outcomes.
- The dataset contains outliers or extreme observations, which are common in the real estate market; if they are not handled properly, they may distort the results of the models.
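A sketch of duplicate removal followed by the 80:20 split, assuming pandas and scikit-learn and a hypothetical three-column frame with the same number of records (4369); the column names and the choice of `totalRent` as the target are illustrative assumptions. With a fractional `test_size`, scikit-learn rounds the test set size up, which yields the 3495/874 split reported in Section 4.2.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in frame with as many records as the dataset
rng = np.random.default_rng(42)
n = 4369
df = pd.DataFrame({
    "livingSpace": rng.uniform(20, 200, n),
    "noRooms": rng.integers(1, 6, n),
    "totalRent": rng.uniform(400, 4000, n),
})

df = pd.concat([df, df.iloc[:10]])                   # inject 10 exact duplicate rows
df = df.drop_duplicates().reset_index(drop=True)     # remove duplicated entries

X, y = df.drop(columns="totalRent"), df["totalRent"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(df), len(X_train), len(X_test))
```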
4. Experiment
4.1. Experiment Design
4.2. Model Training and Evaluation
- Data preparation
- Data cleaning: To improve data quality, irrelevant observations, outliers, and inconsistencies were removed. Missing numerical values were imputed with K-nearest neighbours (KNN) imputation, while missing categorical values were filled in using a fill-in approach.
- Feature engineering: Feature engineering is a critical stage in building predictive models, especially in the real estate industry, where the quality of features can significantly affect model performance. Two feature engineering techniques were used. One-hot encoding transformed categorical variables such as “newlyConst”, “balcony”, “Kitchen”, “cellar”, “lift”, and “garden” into numerical values suitable for ML models, and normalisation was applied to features such as rental price, living space, and the range variables so that they lie on a comparable scale.
- Feature selection: Recent research has highlighted hybrid models and metaheuristic algorithms for this task; here, PSO (a metaheuristic) and CatBoost (a gradient-boosting model) were applied to choose the most pertinent features. CatBoost reduced dimensionality by converting categorical data into target statistics (TSs), whereas PSO optimised feature selection by searching for the feature subset that best satisfied an objective function.
- Model architecture
- A variety of models were investigated, based on their demonstrated performance in comparable prediction tasks and their capacity to capture intricate relationships within the data. The algorithms comprised LR, tree-based algorithms, neural networks, and stacked models. To prevent overfitting and allow a fair comparison, dropout regularisation was applied to the hidden layers of both the FNN and the PNN. Both networks shared the same architecture: two hidden layers of 256 neurons each, with ReLU activation functions. The only difference was the framework (TensorFlow for the FNN and PyTorch for the PNN); the overall model structure and hyperparameters were consistent across both. This design isolates the effect of the framework and allows a direct comparison between models built with TensorFlow and those built with PyTorch.
- Training process
- Training–validation split: An 80:20 ratio was used to divide the dataset into training and validation sets. This ratio leaves enough data both for learning and for evaluating the predictive accuracy of the models on unseen instances, with the goal of reducing overfitting and improving generalisation. After the split, the training set contained 3495 records and the validation set 874 records.
- Hyperparameter tuning: Key hyperparameters, namely the learning rate, batch size, number of layers, and number of neurons per layer, were systematically optimised over multiple rounds to improve model performance. Learning rates between 0.00001 and 0.1, batch sizes from 8 to 152, one to four layers, and 8 to 520 neurons per layer were assessed. The best configuration was a learning rate of 0.001, a batch size of 64, two layers, and 256 neurons per layer.
- Epochs and iterations: The models were trained for between 10 and 4000 epochs. In each epoch, the model parameters were updated based on the error between the actual and predicted values. The best performance was obtained at 410 epochs.
- Loss function and optimisation: The Adam optimiser was used to minimise the loss, with MSE as the loss function. The ReLU activation function was employed in the neural networks; it helps the model learn intricate patterns, preserves gradients, and mitigates the vanishing gradient problem. ReLU is also computationally inexpensive, which allows faster training and more efficient learning.
- Regularisation: Dropout was applied in the PNN to address overfitting by randomly disabling a subset of neurons during training. The dropped neurons, together with their connections, are temporarily removed from the network, which forces it to learn more robust features that do not depend on any specific group of neurons. This reduces the effective complexity of the model and avoids over-reliance on particular features or patterns in the training set, improving its ability to generalise to new data.
- Post-training analysis
- Model comparison: All models were compared after training to determine which performed best. Stacking with the PyTorch network improved the accuracy of the lowest-performing standalone models.
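The data-preparation steps listed above (KNN imputation of numerical gaps, filling of categorical gaps, one-hot encoding, and normalisation) can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions: the toy frame and its column names are hypothetical, the constant placeholder `"missing"` stands in for the unspecified fill-in approach, and min-max scaling is assumed as the normalisation method.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy listings with missing values (np.nan marks a gap)
df = pd.DataFrame({
    "livingSpace": [45.0, np.nan, 80.0, 120.0, 60.0],
    "noRooms": [1.5, 2.0, np.nan, 4.0, 2.0],
    "balcony": ["yes", "no", np.nan, "yes", "no"],
    "garden": ["no", "no", "yes", np.nan, "no"],
})
numeric = ["livingSpace", "noRooms"]
categorical = ["balcony", "garden"]

preprocess = ColumnTransformer(
    [
        # Numeric: KNN imputation, then min-max normalisation to [0, 1]
        ("num", Pipeline([("impute", KNNImputer(n_neighbors=2)),
                          ("scale", MinMaxScaler())]), numeric),
        # Categorical: constant fill (assumed placeholder), then one-hot encoding
        ("cat", Pipeline([("impute", SimpleImputer(strategy="constant", fill_value="missing")),
                          ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
    ],
    sparse_threshold=0,  # always return a dense array
)
X = preprocess.fit_transform(df)
print(X.shape)  # 2 scaled numeric columns + 3 + 3 one-hot columns
```

Wrapping the steps in a `ColumnTransformer` means the imputer and scaler statistics are learned from the training split only and then reused unchanged on validation data.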
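The PSO-based feature selection described above can be illustrated with a minimal binary PSO. The paper does not state its objective function or swarm settings, so everything here is an assumption: each particle is a Boolean feature mask, velocities are mapped to selection probabilities with a sigmoid transfer function, and the objective is the cross-validated RMSE of a linear regression plus a small per-feature penalty. CatBoost-based selection is not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def fitness(mask, X, y, penalty=0.01):
    """Objective to minimise (hypothetical): CV RMSE plus a small cost per selected feature."""
    if not mask.any():
        return np.inf  # an empty feature subset is invalid
    scores = cross_val_score(LinearRegression(), X[:, mask], y, cv=3,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean() + penalty * mask.sum()


def binary_pso(X, y, n_particles=12, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.random((n_particles, d)) < 0.5          # each particle is a feature mask
    vel = rng.normal(0.0, 1.0, (n_particles, d))
    pbest = pos.copy()                                 # best mask found by each particle
    pbest_f = np.array([fitness(p, X, y) for p in pos])
    g = pbest[pbest_f.argmin()].copy()                 # best mask found by the swarm
    g_f = pbest_f.min()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, d))
        x = pos.astype(float)
        # Velocity update: inertia + pull towards personal best + pull towards global best
        vel = (w * vel
               + c1 * r1 * (pbest.astype(float) - x)
               + c2 * r2 * (g.astype(float) - x))
        vel = np.clip(vel, -6.0, 6.0)
        # Sigmoid transfer: velocity -> probability that each feature is selected
        pos = rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))
        f = np.array([fitness(p, X, y) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        if pbest_f.min() < g_f:
            g, g_f = pbest[pbest_f.argmin()].copy(), pbest_f.min()
    return g, g_f


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
# Synthetic target: only the first three features carry signal
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=200)
mask, score = binary_pso(X, y)
print("selected feature indices:", np.flatnonzero(mask))
```

The per-feature penalty is what discourages the swarm from keeping uninformative columns whose inclusion barely changes the RMSE.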
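A minimal PyTorch sketch of a network and training loop consistent with the configuration reported above: two hidden layers of 256 ReLU units with dropout, the Adam optimiser with a learning rate of 0.001, a batch size of 64, and MSE loss. The dropout rate of 0.2, the class name `RentNet`, and the synthetic data are assumptions, and the demo runs only 30 epochs instead of the reported 410.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class RentNet(nn.Module):
    """Two hidden layers of 256 ReLU units with dropout (dropout rate is an assumption)."""

    def __init__(self, n_features, hidden=256, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),  # linear output for regression
        )

    def forward(self, x):
        return self.net(x)


def train(model, X, y, epochs=410, batch_size=64, lr=1e-3):
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()  # enables dropout during training
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
    return model


torch.manual_seed(0)
X = torch.randn(512, 8)                                   # hypothetical scaled features
y = X @ torch.randn(8, 1) + 0.1 * torch.randn(512, 1)    # synthetic target
model = train(RentNet(8), X, y, epochs=30)                # shortened for the demo
model.eval()  # disables dropout for inference
with torch.no_grad():
    mse = nn.functional.mse_loss(model(X), y).item()
print(f"training MSE: {mse:.4f}")
```

Switching between `model.train()` and `model.eval()` matters here: dropout is active only in training mode, so predictions must be made in evaluation mode.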
4.3. Methodology of Stacking Models
4.4. Predicted vs. Actual Plot
5. Investigation Assessment
- The PyTorch framework proved highly robust and adaptable when combined with all the ML algorithms, even with different feature sets. With an average R² of about 0.90, the statistical significance tests on the R² score and RMSE indicate that the PyTorch-based models are minimally sensitive to changes in the feature set. In contrast, standalone ML models such as RF, XGB, GBR, and LR are more affected by changes in features, while the neural networks remain largely unaffected.
- Among the models, the XGB model shows the most significant variability across feature sets, resulting in greater performance differences.
- Of the neural network models, the PNN achieved higher accuracy than the FNN on both feature sets, as confirmed by statistically significant improvements in both the R² and RMSE scores. These DL models use stacked layers of nonlinear transformations to capture intricate characteristics of the data, leading to high R² scores.
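The t-statistics in the significance table are consistent with a paired (related-samples) t-test over the six runs listed in the per-run tables; the sketch below reproduces them with SciPy's `ttest_rel`. That the authors used exactly this test is an inference from the matching numbers, not something the text states.

```python
from scipy import stats

# Per-run scores for the six runs reported in the run tables
r2_fnn = [0.905, 0.904, 0.906, 0.907, 0.903, 0.905]
r2_pnn = [0.914, 0.913, 0.912, 0.916, 0.915, 0.914]
rmse_fnn = [315.12, 314.21, 313.45, 312.87, 314.90, 313.67]
rmse_pnn = [300.45, 299.32, 300.67, 299.89, 301.23, 300.12]

# Paired test: run i of the FNN is compared with run i of the PNN (FNN minus PNN)
r2_test = stats.ttest_rel(r2_fnn, r2_pnn)
rmse_test = stats.ttest_rel(rmse_fnn, rmse_pnn)
print(f"R2:   t = {r2_test.statistic:.4f}, p = {r2_test.pvalue:.2e}")
print(f"RMSE: t = {rmse_test.statistic:.4f}, p = {rmse_test.pvalue:.2e}")
```

The negative t for R² and the positive t for RMSE both favour the PNN: its R² is higher and its RMSE lower in every run.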
6. Summary and Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhantileuov, E.; Smaiyl, A.; Aibatbek, A.; Kassymkhanov, S. A case study of machine learning comparisons for predicting apartment prices in Astana. In Proceedings of the 2023 IEEE International Conference on Smart Information Systems and Technologies (SIST), Astana, Kazakhstan, 4–6 May 2023; pp. 305–309.
- Khandaskar, S.; Panjwani, C.; Patil, V.; Fernandes, D.; Bajaj, P. House and rent price prediction system using regression. In Proceedings of the 2023 International Conference on Sustainable Computing and Smart Systems (ICSCSS), Coimbatore, India, 14–16 June 2023; pp. 1733–1739.
- Kindermann, F.; Le Blanc, J.; Piazzesi, M.; Schneider, M. Learning about Housing Cost: Survey Evidence from the German House Price Boom; Technical report; National Bureau of Economic Research: Cambridge, MA, USA, 2021.
- Truong, Q.; Nguyen, M.; Dang, H.; Mei, B. Housing price prediction via improved machine learning techniques. Procedia Comput. Sci. 2020, 174, 433–442.
- Yoshida, T.; Murakami, D.; Seya, H. Spatial prediction of apartment rent using regression-based and machine learning-based approaches with a large dataset. J. Real Estate Financ. Econ. 2022, 69, 1–28.
- Sharma, S.; Arora, D.; Shankar, G.; Sharma, P.; Motwani, V. House price prediction using machine learning algorithm. In Proceedings of the 2023 7th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 23–25 February 2023; pp. 982–986.
- Cekic, M.; Korkmaz, K.N.; Müküs, H.; Hameed, A.A.; Jamil, A.; Soleimani, F. Artificial intelligence approach for modeling house price prediction. In Proceedings of the 2022 2nd International Conference on Computing and Machine Intelligence (ICMI), Istanbul, Turkey, 15–16 July 2022; pp. 1–5.
- Samek, W.; Montavon, G.; Lapuschkin, S.; Anders, C.J.; Müller, K.-R. Explaining deep neural networks and beyond: A review of methods and applications. Proc. IEEE 2021, 109, 247–278.
- d’Errico, A.; Michalski, N.; Brainard, J.; Manz, K.M.; Manz, K.; Schwettmann, L.; Mansmann, U.; Maier, W. World Health Day 2022: Impact of COVID-19 on Health and Socioeconomic Inequities; Frontiers Media SA: Munich, Germany, 2023; p. 42.
- Zhan, C.; Wu, Z.; Liu, Y.; Xie, Z.; Chen, W. Housing prices prediction with deep learning: An application for the real estate market in Taiwan. In Proceedings of the 2020 IEEE 18th International Conference on Industrial Informatics (INDIN), Warwick, UK, 20–23 July 2020; Volume 1, pp. 719–724.
- Pai, P.-F.; Wang, W.-C. Using machine learning models and actual transaction data for predicting real estate prices. Appl. Sci. 2020, 10, 5832.
- Ming, Y.; Zhang, J.; Qi, J.; Liao, T.; Wang, M.; Zhang, L. Prediction and analysis of Chengdu housing rent based on XGBoost algorithm. In Proceedings of the 3rd International Conference on Big Data Technologies, New York, NY, USA, 18–20 September 2020; pp. 1–5.
- Lv, C.; Liu, Y.; Wang, L. Analysis and forecast of influencing factors on house prices based on machine learning. In Proceedings of the 2022 Global Conference on Robotics, Artificial Intelligence and Information Technology (GCRAIT), Chicago, IL, USA, 30–31 July 2022; pp. 97–101.
- Wang, Y. The comparison of six prediction models in machine learning: Based on the house prices prediction. In Proceedings of the 2022 International Conference on Machine Learning and Intelligent Systems Engineering (MLISE), Guangzhou, China, 5–7 August 2022; pp. 446–451.
- Xu, J. A novel deep neural network-based method for house price prediction. In Proceedings of the 2021 International Conference of Social Computing and Digital Economy (ICSCDE), Chongqing, China, 28–29 August 2021; pp. 12–16.
- Sakri, S.B.; Ali, Z. Analysis of the dimensionality issues in house price forecasting modeling. In Proceedings of the 2022 Fifth International Conference of Women in Data Science at Prince Sultan University (WiDS PSU), Riyadh, Saudi Arabia, 28–29 March 2022; pp. 13–19.
- Sheng, C.; Yu, H. An optimized prediction algorithm based on XGBoost. In Proceedings of the 2022 International Conference on Networking and Network Applications (NaNA), Urumqi, China, 3–5 December 2022; pp. 1–6.
- Yang, Z.; Zhu, X.; Zhang, Y.; Nie, P.; Liu, X. A housing price prediction method based on stacking ensemble learning optimization method. In Proceedings of the 2023 IEEE 10th International Conference on Cyber Security and Cloud Computing (CSCloud)/2023 IEEE 9th International Conference on Edge Computing and Scalable Cloud (EdgeCom), Xiangtan, China, 1–3 July 2023; pp. 96–101.
- Disha, U.B.; Saxena, S. Real estate property price estimator using machine learning. In Proceedings of the 2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES), Greater Noida, India, 28–30 April 2023; pp. 895–900.
- Almohimeed, A.; Saad, R.M.A.; Mostafa, S.; El-Rashidy, N.; Farag, S.; Gaballah, A.; Elaziz, M.A.; El-Sappagh, S.; Saleh, H. Explainable artificial intelligence of multi-level stacking ensemble for detection of Alzheimer’s disease based on particle swarm optimization and the sub-scores of cognitive biomarkers. IEEE Access 2023, 11, 123173–123193.
- Gad, A.G. Particle swarm optimization algorithm and its applications: A systematic review. Arch. Comput. Methods Eng. 2022, 29, 2531–2561.
- Joodaki, N.Z.; Bagher Dowlatshahi, M.; Joodaki, M. A novel ensemble feature selection method through Type I fuzzy. In Proceedings of the 2022 9th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS), Bam, Iran, 2–4 March 2022; pp. 1–6.
- Hassan, R.; Hamid, O.; Brahim, E. Induction motor current control with torque ripples optimization combining a neural predictive current and particle swarm optimization. In Proceedings of the 2023 9th International Conference on Control, Decision and Information Technologies (CoDIT), Rome, Italy, 3–6 July 2023; pp. 2067–2072.
- Shami, T.M.; El-Saleh, A.A.; Alswaitti, M.; Al-Tashi, Q.; Summakieh, M.A.; Mirjalili, S. Particle swarm optimization: A comprehensive survey. IEEE Access 2022, 10, 10031–10061.
- Wu, X.; Li, C.; Jiang, J.; Sun, A.; Zhang, Q. Distribution network reconfiguration based on improved particle swarm optimization algorithm. In Proceedings of the 2023 IEEE 7th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 15–17 September 2023; Volume 7, pp. 971–975.
- El Hammedi, H.; Chrouta, J.; Khaterchi, H.; Zaafouri, A. Comparative study of MPPT algorithms: PO, INC, and PSO for PV system optimization. In Proceedings of the 2023 9th International Conference on Control, Decision and Information Technologies (CoDIT), Rome, Italy, 3–6 July 2023; pp. 2683–2688.
- Wang, Z.; Ren, H.; Lu, R.; Huang, L. Stacking based LightGBM-CatBoost-RandomForest algorithm and its application in big data modeling. In Proceedings of the 2022 4th International Conference on Data-driven Optimization of Complex Systems (DOCS), Chengdu, China, 28–30 October 2022; pp. 1–6.
- Zhong, C.; Geng, F.; Zhang, X.; Zhang, Z.; Wu, Z.; Jiang, Y. Shear wave velocity prediction of carbonate reservoirs based on CatBoost. In Proceedings of the 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 28–31 May 2021; pp. 622–626.
- Ye, X.; Li, Y.; Feng, X.; Heng, C. A crypto market forecasting method based on CatBoost model and bigdata. In Proceedings of the 2022 7th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 15–17 April 2022; pp. 686–689.
- Zhang, C.; Chen, Z.; Zhou, J. Research on short-term load forecasting using k-means clustering and CatBoost integrating time series features. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 6099–6104.
- Chen, Y.; Xue, R.; Zhang, Y. House price prediction based on machine learning and deep learning methods. In Proceedings of the 2021 International Conference on Electronic Information Engineering and Computer Science (EIECS), Changchun, China, 23–26 September 2021; pp. 699–702.
- Kalaivani, K.; Kanimozhiselvi, C.; Bilal, Z.M.; Sukesh, G.; Yokeswaran, S. A comparative study of regression algorithms on house sales price prediction. In Proceedings of the 2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), Trichy, India, 23–25 August 2023; pp. 826–831.
- Alshammari, T. Evaluating machine learning algorithms for predicting house prices in Saudi Arabia. In Proceedings of the 2023 International Conference on Smart Computing and Application (ICSCA), Hail, Saudi Arabia, 5–6 February 2023; pp. 1–5.
- Zhou, Q.; Zhu, P.; Huang, Z.; Zhao, Q. Pest bird density forecast of transmission lines by random forest regression model and line transect method. In Proceedings of the 2020 7th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS), Guangzhou, China, 13–15 November 2020; pp. 527–530.
- Kurniawati, N.; Novita Nurmala Putri, D.; Kurnia Ningsih, Y. Random forest regression for predicting metamaterial antenna parameters. In Proceedings of the 2020 2nd International Conference on Industrial Electrical and Electronics (ICIEE), Lombok, Indonesia, 20–21 October 2020; pp. 174–178.
- Zhu, R.; Yang, Y.; Chen, J. XGBoost and CNN-LSTM hybrid model with attention-based stock prediction. In Proceedings of the 2023 IEEE 3rd International Conference on Electronic Technology, Communication and Information (ICETCI), Changchun, China, 26–28 May 2023; pp. 359–365.
- El Houda, B.N.; Lakhdar, L.; Abdallah, M. Time series analysis of household electric consumption with XGBoost model. In Proceedings of the 2022 4th International Conference on Pattern Analysis and Intelligent Systems (PAIS), Oum El Bouaghi, Algeria, 12–13 October 2022; pp. 1–6.
- Zhang, X.; Yan, C.; Gao, C.; Malin, B.A.; Chen, Y. Predicting missing values in medical data via XGBoost regression. J. Healthc. Inform. Res. 2020, 4, 383–394.
- Ge, J.; Zhao, L.; Yu, Z.; Liu, H.; Zhang, L.; Gong, X.; Sun, H. Prediction of greenhouse tomato crop evapotranspiration using XGBoost machine learning model. Plants 2022, 11, 1923.
- Qiu, Y.; Zhou, J.; Khandelwal, M.; Yang, H.; Yang, P.; Li, C. Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration. Eng. Comput. 2021, 38, 4145–4162.
- Wang, W.; Dong, W.; Yu, T.; Du, Y. Research on PRS/IRS time registration based on fully connected neural network. In Proceedings of the 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 11–13 December 2020; Volume 9, pp. 942–947.
- Jia, B.; Zhang, Y. Spectrum analysis for fully connected neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 10091–10104.
- Li, Q.; Zhai, Z.; Li, Q.; Wu, L.; Bao, L.; Sun, H. Improved bathymetry in the South China Sea from multisource gravity field elements using fully connected neural network. J. Mar. Sci. Eng. 2023, 11, 1345.
- Lee, K.H.; Park, J.; Kim, S.-T.; Kwak, J.Y.; Cho, C.S. Design of NNEF-PyTorch neural network model converter. In Proceedings of the 2021 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 20–22 October 2021; pp. 1710–1712.
- Sawarkar, K. Deep Learning with PyTorch Lightning: Swiftly Build High-Performance Artificial Intelligence (AI) Models Using Python; Packt Publishing Ltd.: Birmingham, UK, 2022.
- Rustam, F.; Reshi, A.A.; Mehmood, A.; Ullah, S.; On, B.-W.; Aslam, W.; Choi, G.S. COVID-19 future forecasting using supervised machine learning models. IEEE Access 2020, 8, 101489–101499.
- Sahoo, A.; Ghose, D.K. Imputation of missing precipitation data using KNN, SOM, RF, and FNN. Soft Comput. 2022, 26, 5919–5936.
- Almaslukh, B. A gradient boosting method for effective prediction of housing prices in complex real estate systems. In Proceedings of the 2020 International Conference on Technologies and Applications of Artificial Intelligence (TAAI), Taipei, Taiwan, 3–5 December 2020; pp. 217–222.
- Guang, W.; Zubao, S. Research on the application of integrated RG-LSTM model in house price prediction. In Proceedings of the 2023 IEEE 5th International Conference on Power, Intelligent Computing and Systems (ICPICS), Shenyang, China, 14–16 July 2023; pp. 348–353.
- Karamti, H.; Alharthi, R.; Anizi, A.A.; Alhebshi, R.M.; Eshmawi, A.; Alsubai, S.; Umer, M. Improving prediction of cervical cancer using KNN imputed SMOTE features and multi-model ensemble learning approach. Cancers 2023, 15, 4412.
Feature | Description |
---|---|
ServiceCharge | Auxiliary costs (electricity, water, etc.) |
NewlyConst | Is the building newly constructed? |
Balcony | Does it have a balcony? |
Picturecount | How many photos are on the listing? |
Pricetrend | The price trend as calculated by Immospot |
TelekomUploadSpeed | How fast is the internet upload speed? |
TotalRent | Total rent (sum of the base rent and other costs) |
NoParkSpaces | Number of parking spaces |
HasKitchen | Has a kitchen |
Cellar | Has a cellar |
YearConstructedRange | Binned construction year, 1 to 9 |
BaseRent | Base rent without electricity and heating |
LivingSpace | Living space in sqm |
Lift | Is a lift (elevator) available? |
BaseRentRange | Binned base rent, 1 to 9 |
Geo plz | ZIP code |
NoRooms | Number of rooms |
ThermalChar | Energy efficiency characteristic in kWh/(m²·a) |
Floor | Which floor is the flat on? |
NumberOfFloors | Number of floors in the building |
NoRoomsRange | Binned number of rooms, 1 to 5 |
Garden | Has a garden |
LivingSpaceRange | Binned living space, 1 to 7 |
Model | R² Score (Feature Set 1) | RMSE (Feature Set 1) | R² Score (Feature Set 2) | RMSE (Feature Set 2) |
---|---|---|---|---|
RF | 0.9187 | 289.87 | 0.8866 | 342.33 |
GBR | 0.9158 | 294.89 | 0.8946 | 330.00 |
XGB | 0.8903 | 336.68 | 0.8717 | 364.23 |
LR | 0.8928 | 375.57 | 0.9045 | 314.21 |
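Both metrics in these tables can be computed directly from test-set predictions. As a minimal standard-library sketch, assuming plain lists of observed and predicted rents: the R² score is 1 − SS_res/SS_tot, the share of rent variance the model explains, while RMSE is the square root of the mean squared error and is expressed in the same currency units as the rent itself. The helper names `rmse` and `r2_score` and the sample numbers below are hypothetical illustrations, not values or code from the study.

```python
import math


def _check(y_true, y_pred):
    """Reject empty or mismatched inputs before computing a metric."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (e.g. monthly rent)."""
    _check(y_true, y_pred)
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        # R² is undefined when every observed value is identical.
        raise ValueError("R² is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative rents only (hypothetical), not data from the study.
    actual = [1000.0, 1500.0, 2000.0, 2500.0]
    predicted = [1100.0, 1400.0, 2100.0, 2400.0]
    print(f"RMSE = {rmse(actual, predicted):.2f}")  # every error is ±100 -> 100.00
    print(f"R²   = {r2_score(actual, predicted):.3f}")  # 1 - 40000/1250000 -> 0.968
```

Because RMSE is in currency units and R² is unitless, the two are reported side by side: R² allows comparison across feature sets, while RMSE conveys the typical error size of a rent estimate.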
Model | R² Score (Feature Set 1) | RMSE (Feature Set 1) | R² Score (Feature Set 2) | RMSE (Feature Set 2) |
---|---|---|---|---|
FNN | 0.9000 | 321.47 | 0.9015 | 319.08 |
PyTorch NN | 0.9041 | 314.83 | 0.9054 | 312.61 |
Run | FNN R² Score | PNN R² Score |
---|---|---|
1 | 0.905 | 0.914 |
2 | 0.904 | 0.913 |
3 | 0.906 | 0.912 |
4 | 0.907 | 0.916 |
5 | 0.903 | 0.915 |
6 | 0.905 | 0.914 |
Run | FNN RMSE | PNN RMSE |
---|---|---|
1 | 315.12 | 300.45 |
2 | 314.21 | 299.32 |
3 | 313.45 | 300.67 |
4 | 312.87 | 299.89 |
5 | 314.90 | 301.23 |
6 | 313.67 | 300.12 |
Metric | t-Statistic | p-Value |
---|---|---|
R² Score | −11.619 | 8.29 × 10⁻⁵ |
RMSE | 39.035 | 2.08 × 10⁻⁷ |
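The t-statistics in this table are consistent with a paired (related-samples) t-test on the six run-wise differences FNN − PNN, with 5 degrees of freedom; the negative sign for R² and positive sign for RMSE follow from subtracting the PNN scores from the FNN scores. The source does not name the test variant, so the sketch below is an assumption that happens to reproduce the tabulated values. It uses only the Python standard library and computes the two-sided p-value from the closed-form Student-t expressions for integer degrees of freedom (Abramowitz and Stegun, Section 26.7); where SciPy is installed, `scipy.stats.ttest_rel(fnn, pnn)` should return the same statistic and p-value. The function names `paired_t` and `two_sided_p` are illustrative.

```python
import math
import statistics

# Per-run scores transcribed from the two per-run tables (six runs per model).
FNN_R2 = [0.905, 0.904, 0.906, 0.907, 0.903, 0.905]
PNN_R2 = [0.914, 0.913, 0.912, 0.916, 0.915, 0.914]
FNN_RMSE = [315.12, 314.21, 313.45, 312.87, 314.90, 313.67]
PNN_RMSE = [300.45, 299.32, 300.67, 299.89, 301.23, 300.12]


def paired_t(a, b):
    """Paired t-statistic for the run-wise differences a[i] - b[i], plus its degrees of freedom."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two runs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return statistics.mean(diffs) / std_err, n - 1


def two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with integer df, using the closed forms
    for odd and even df from Abramowitz and Stegun, Section 26.7."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # A = s * [1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ... up to c^(df-2)]
        term = total = 1.0
        for k in range(2, df - 1, 2):
            term *= (k - 1) / k * c * c
            total += term
        a = s * total
    else:
        # A = (2/pi) * [theta + s * (c + (2/3)c^3 + ... up to c^(df-2))]; no sum when df == 1
        term = total = c if df > 1 else 0.0
        for k in range(3, df - 1, 2):
            term *= (k - 1) / k * c * c
            total += term
        a = (2.0 / math.pi) * (theta + s * total)
    # A is P(|T| < t), so the two-sided tail probability is its complement.
    return 1.0 - a


if __name__ == "__main__":
    for metric, fnn, pnn in [("R² score", FNN_R2, PNN_R2), ("RMSE", FNN_RMSE, PNN_RMSE)]:
        t, df = paired_t(fnn, pnn)
        print(f"{metric}: t = {t:.3f}, df = {df}, p = {two_sided_p(t, df):.2e}")
```

Run as a script, it should print t ≈ −11.619 (p ≈ 8.29 × 10⁻⁵) for the R² score and t ≈ 39.035 (p ≈ 2.08 × 10⁻⁷) for RMSE, matching the table to the displayed precision. Pairing by run index treats run i of both networks as matched (for example, sharing a random seed or data split); that pairing is itself an assumption of this sketch.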
Model | R² Score (Feature Set 1) | RMSE (Feature Set 1) | R² Score (Feature Set 2) | RMSE (Feature Set 2) |
---|---|---|---|---|
PyTorch+RF | 0.9181 | 290.87 | 0.9040 | 314.95 |
PyTorch+GBR | 0.9150 | 296.35 | 0.9055 | 312.49 |
PyTorch+XGB | 0.9097 | 305.51 | 0.9022 | 317.86 |
PyTorch+LR | 0.9043 | 314.52 | 0.9071 | 309.88 |
Reference | Dataset | Methods Used | Preferred Model | Result | Limitations | Improvement from Proposed Study |
---|---|---|---|---|---|---|
[10] | Taiwan | CNN, BPNN | CNN | R² > 0.945 | In existing research, single models are frequently employed to estimate real estate prices; nevertheless, comparative analyses consistently show that neural networks perform better. | Unlike other research, this study derived two feature sets with distinct relevant features from the dataset, using two feature selection techniques: PSO and CatBoost. Ten different models, incorporating both machine learning (ML) and deep learning (DL) techniques, were applied for comparative analysis. The study also employs PyTorch, a versatile framework that optimizes the training process to enhance model performance. Additionally, further steps were taken to increase accuracy, particularly for models with lower performance. |
[11] | Taichung, Taiwan | LSSVR, CART, GRNN, BPNN | LSSVR | MAPE = 0.228 | ||
[12] | Chengdu, China | RF, XGBoost, LightGBM | XGBoost | Accuracy = 0.85; MSE = 0.04 | ||
[13] | Shenzhen, China | SVM | SVM | R² = 1 | ||
[15] | Boston, the US | LR, RF, SVM, ANN, XGBoost | ANN | R² = 0.8776 | ||
[14] | Boston, the US | LR, Ridge, Lasso, RF, SVM with RBF, Polynomial Regression | SVM with RBF | Score = 0.799 | ||
[19] | Bangalore, India | LR, Lasso, Ridge, DT, RF, XGBoost, XGRFBoost | XGBoost | Accuracy = 0.6465 | ||
[16] | Ames, the US | Compare models with and without PSO: RF, NB, LR, MLP, KNN, REPTree | RF with PSO | Accuracy = 0.844; RMSE = 0.0699; MAE = 0.019 | PSO was used to reduce dimensionality. The models performed differently, with some achieving higher prediction accuracy than others. Nevertheless, this approach remains a reliable reference for the proposed study. | |
[17] | Ames, the US | Ridge, Lasso, XGBoost, LightGBM, PSO-XGBoost | PSO-XGBoost | R² = 0.9887; RMSE = 0.1015 | ||
[18] | CS House; US House | KNN, SVM, RF, GBDT, XGBoost, BPNet | Stacking models | RMSE = 0.869 (CS House); RMSE = 1.029 (US House) | A single stacking strategy integrating multiple models was used to predict prices; its effectiveness is difficult to assess without comparison against alternative approaches. | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, W.; Farag, S.; Butt, U.; Al-Khateeb, H. Leveraging Machine Learning for Sophisticated Rental Value Predictions: A Case Study from Munich, Germany. Appl. Sci. 2024, 14, 9528. https://doi.org/10.3390/app14209528