Abstract
Prediction models are increasingly used across many sectors, and FinTech (Financial Technology) is no exception. Many FinTech problems can be framed as prediction problems, for example estimating the probability that a transaction is fraudulent or identifying the most suitable company to invest in under given constraints. This research focuses on customer spending prediction: estimating how much a customer may spend in a given period based on their past purchases. Such information is crucial for business planning and budgeting. As a first step in tackling this problem, we explore how feasible it is for different statistical methods and machine learning algorithms to predict customer spending accurately. The methods we investigate are Beta Geometric/Negative Binomial Distribution (BG/NBD), Gamma–Gamma, Linear Regression, Random Forest, and Light Gradient Boosting Machine (LightGBM). To make the prediction models and their results more accessible to average users, we use information visualization as the primary means of communicating with them. We hope this bridges the gap between prediction performance and users' insight into the reasons behind that performance. With better insight, users can make more appropriate decisions when selecting a method or algorithm to build a prediction model for a specific circumstance. The results of this research can also serve as a foundation for more in-depth future work on the same problem.
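To make the problem statement concrete, the following is a minimal sketch of how spending prediction can be set up from a transaction log: purchases before a cutoff date form the history, and spending after the cutoff is the target. The data, the function names, and the naive rate-based baseline are all assumptions made for illustration; they are not the models or the dataset studied in this research.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("a", date(2023, 1, 5), 20.0),
    ("a", date(2023, 2, 10), 30.0),
    ("a", date(2023, 4, 2), 40.0),
    ("b", date(2023, 1, 20), 100.0),
    ("b", date(2023, 4, 15), 50.0),
    ("c", date(2023, 3, 1), 10.0),
]


def split_spending(rows, cutoff):
    """Return {customer: (spending before cutoff, spending from cutoff on)}."""
    past = defaultdict(float)
    future = defaultdict(float)
    for customer, day, amount in rows:
        if day < cutoff:
            past[customer] += amount
        else:
            future[customer] += amount
    # Only customers with some history can be predicted from past purchases.
    return {c: (past[c], future.get(c, 0.0)) for c in past}


def mean_absolute_error(pairs):
    """Average absolute difference over (predicted, actual) pairs."""
    return sum(abs(pred - actual) for pred, actual in pairs) / len(pairs)


cutoff = date(2023, 4, 1)
spending = split_spending(transactions, cutoff)

# Naive baseline (assumed for illustration): past daily spending rate
# multiplied by the length of the prediction period.
history_days = (cutoff - date(2023, 1, 1)).days
horizon_days = 30
predictions = {
    c: past * horizon_days / history_days for c, (past, _) in spending.items()
}

mae = mean_absolute_error(
    [(predictions[c], actual) for c, (_, actual) in spending.items()]
)
print(f"MAE of the naive baseline: {mae:.2f}")
```

Any of the methods listed above could replace the naive baseline in this setup; the split into history and target, and the error metric, would stay the same.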
Funding
The authors did not receive support from any organization for the submitted work.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
This article is part of the topical collection “Future Data and Security Engineering 2022” guest edited by Tran Khanh Dang.
About this article
Cite this article
Dang, T.T., Hoang, K.N., Thanh, L.B. et al. Constructing and Understanding Customer Spending Prediction Models. SN COMPUT. SCI. 4, 852 (2023). https://doi.org/10.1007/s42979-023-02284-0
DOI: https://doi.org/10.1007/s42979-023-02284-0