Abstract
This paper presents an analysis of the behaviour of Consolidated Trees, CT (classification trees induced from multiple subsamples but without loss of explaining capacity). We analyse how CT trees behave when used to solve a fraud detection problem in a car insurance company. This domain has two important characteristics: the explanation given to the classification made is critical to help investigating the received reports or claims, and besides, this is a typical example of class imbalance problem due to its skewed class distribution. In the results presented in the paper CT and C4.5 trees have been compared, from the accuracy and structural stability (explaining capacity) point of view and, for both algorithms, the best class distribution has been searched.. Due to the different associated costs of different error types (costs of investigating suspicious reports, etc.) a wider analysis of the error has also been done: precision/recall, ROC curve, etc.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Chan, P.K., Stolfo, S.J.: Toward Scalable Learning with Non-uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection. In: Proc. of the 4th Int. Conference on Knowledge Discovery and Data Mining, pp. 164–168 (1998)
Domingos, P.: Knowledge acquisition from examples via multiple models. In: Proc. 14th International Conference on Machine Learning Nashville, TN, pp. 98–106 (1997)
Japkowicz, N.: Learning from Imbalanced Data Sets: A Comparison of Various Strategies. In: Proceedings of the AAAI Workshop on Learning from Imbalanced Data Sets, Menlo Park, CA (2000)
Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I.: A New Algorithm to Build Consolidated Trees: Study of the Error Rate and Steadiness. Advances in Soft Computing. In: Proceedings of the International Intelligent Information Processing and Web Mining Conference (IIS: IIPWM 2004), Zakopane, Poland, pp. 79–88 (2004)
Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., Martín, J.I.: Analysis of structural convergence of Consolidated Trees when resampling is required. In: Proc. of the 3rd Australasian Data Mining Conf (AusDM 2004), Australia, pp. 9–21 (2004)
Quinlan, J.R. (ed.): C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Mateo (1993)
Sebastiani, F.: Machine Learning in Automated Document Categorisation. In: Tutorial of the 18th Int. Conference on Computational Linguistics, Nancy, Francia (2000)
Weiss, G.M., Provost, F.: Learning when Training Data are Costly: The Effect of Class Distribution on Tree Induction. Journal of Artificial Intelligence Research 19, 315–354 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., Martín, J.I. (2005). Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_41
Download citation
DOI: https://doi.org/10.1007/11551188_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28757-5
Online ISBN: 978-3-540-28758-2
eBook Packages: Computer ScienceComputer Science (R0)