Abstract
Gradient boosting decision tree (GBDT) is an iterative algorithm that combines multiple decision trees and is widely used for classification and regression. Its ensemble of trees achieves predictive performance by automatically filtering and combining features into new feature vectors, which helps to discover effective feature combinations. However, the gradient boosting tree (GBT) is a complex model. Each tree carries its own weight and has its own structure, so the principle behind the model's predictions is difficult to interpret. This is a challenge in fields that demand high interpretability, such as financial risk control. In this paper, we design an interactive visual analytics system that addresses this problem by explaining the structure and prediction process of the gradient boosting tree model and by helping domain experts to analyze it efficiently. We design a graphical representation of the feature information and a visual model of the boosting tree to show the basic mechanism of the GBT algorithm comprehensively. A case study on a dataset from a Kaggle competition demonstrates the effectiveness of the system.
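To make the additive structure mentioned in the abstract concrete, the following is a minimal sketch of gradient boosting, not the system or the model described in this paper. It assumes squared-error regression on a single numeric feature, depth-1 trees (stumps), and a fixed learning rate. The function names and the toy data are hypothetical and exist only for illustration. It shows that a prediction is a base value plus one weighted contribution per tree, which is the quantity an interpretation tool would need to expose.

```python
from typing import List, Tuple

# A stump is (threshold, value_if_x_le_threshold, value_if_x_gt_threshold).
Stump = Tuple[float, float, float]


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Fit a depth-1 regression tree that minimises squared error."""
    best = None
    best_err = float("inf")
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if err < best_err:
            best_err, best = err, (t, lm, rm)
    if best is None:  # all feature values identical: predict the mean
        m = sum(residuals) / len(residuals)
        best = (xs[0], m, m)
    return best


def stump_predict(stump: Stump, x: float) -> float:
    t, left_value, right_value = stump
    return left_value if x <= t else right_value


def fit_gbt(xs, ys, n_trees=20, learning_rate=0.5):
    """Each new stump is fitted to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps


def contributions(base, stumps, x, learning_rate=0.5):
    """Base value followed by the weighted output of every tree for input x."""
    return [base] + [learning_rate * stump_predict(s, x) for s in stumps]


def predict(base, stumps, x, learning_rate=0.5):
    return sum(contributions(base, stumps, x, learning_rate))


# Toy data invented for illustration: a step from 1 to 5 between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
base, stumps = fit_gbt(xs, ys)
```

Calling `contributions(base, stumps, 2.0)` returns the per-tree terms whose sum is the prediction for `x = 2.0`. Real GBDT libraries use deeper trees, many features, and other loss functions, so the per-tree terms there are far harder to read directly.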
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cheng, Z., Cheng, K., Xia, Y., Pu, J., Rao, Y. (2022). A Visual Analytics Approach to Understanding Gradient Boosting Tree via Click Prediction on Ads. In: Luo, Y. (eds) Cooperative Design, Visualization, and Engineering. CDVE 2022. Lecture Notes in Computer Science, vol 13492. Springer, Cham. https://doi.org/10.1007/978-3-031-16538-2_3
DOI: https://doi.org/10.1007/978-3-031-16538-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16537-5
Online ISBN: 978-3-031-16538-2
eBook Packages: Computer Science (R0)