Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance

Jesús M. Pérez,
Javier Muguerza,
Olatz Arbelaitz,
Ibai Gurrutxaga &
José I. Martín

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3686)

Included in the following conference series:

International Conference on Pattern Recognition and Image Analysis

Abstract

This paper analyses the behaviour of Consolidated Trees (CT): classification trees induced from multiple subsamples without loss of explaining capacity. We study how CT trees behave when applied to a fraud detection problem in a car insurance company. This domain has two important characteristics: the explanation given for each classification is critical for investigating the received reports or claims, and the skewed class distribution makes it a typical class imbalance problem. The results compare CT and C4.5 trees in terms of accuracy and structural stability (explaining capacity), and the best class distribution has been searched for both algorithms. Because different error types carry different costs (for example, the cost of investigating suspicious reports), a wider analysis of the error has also been carried out, including precision/recall and ROC curves.
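
As an illustration of why accuracy alone can mislead under class imbalance, the following minimal sketch computes precision, recall and accuracy from predicted and true labels. The data (5 fraudulent claims among 100) and the function names are hypothetical and are not taken from the paper; the sketch only shows the kind of wider error analysis the abstract mentions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy); undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


# Hypothetical imbalanced data: 1 = fraud (5 claims), 0 = legitimate (95 claims).
y_true = [1] * 5 + [0] * 95

# A classifier that never flags fraud: high accuracy, but it finds no fraud at all.
always_legit = [0] * 100

# A classifier that flags 3 of the 5 frauds and raises 4 false alarms.
flags_some = [1, 1, 1, 0, 0] + [1] * 4 + [0] * 91

print(precision_recall_accuracy(y_true, always_legit))
print(precision_recall_accuracy(y_true, flags_some))
```

In this made-up example the trivial classifier has the higher accuracy even though its recall on the fraud class is zero, which is why precision and recall are reported alongside accuracy when error costs differ.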

Author information

Authors and Affiliations

Dept. of Computer Architecture and Technology, University of the Basque Country, M. Lardizabal, 1, 20018, Donostia, Spain
Jesús M. Pérez, Javier Muguerza, Olatz Arbelaitz, Ibai Gurrutxaga & José I. Martín

Editor information

Editors and Affiliations

Research School of Informatics, Loughborough, UK
Sameer Singh
ATR Lab, Research School of Informatics, University of Loughborough, Loughborough, UK
Maneesha Singh
IBM Corporation, 1133 Westchester Avenue, White Plains, 10604, New York, United States
Chid Apte
Institute of Computer Vision and applied Computer Sciences, IBaI, Germany
Petra Perner

About this paper

Cite this paper

Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., Martín, J.I. (2005). Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_41

DOI: https://doi.org/10.1007/11551188_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28757-5
Online ISBN: 978-3-540-28758-2
eBook Packages: Computer Science, Computer Science (R0)