DOI: 10.1007/978-3-030-04491-6_21
Article

An Approach Based on Contrast Patterns for Bot Detection on Web Log Files

Published: 03 January 2019

Abstract

Nowadays, companies invest resources in detecting non-human accesses in their web traffic. Non-human accesses are usually few compared with human accesses, which makes this a class imbalance problem; as a consequence, classifiers bias their results toward the human accesses and overlook the non-human ones. In some classification problems, such as non-human traffic detection, high accuracy is not the only desired quality: the model produced by the classifier should also be understandable by experts. For that reason, in this paper we study the use of contrast pattern-based classifiers for building an understandable and accurate model for detecting non-human traffic in web log files. Our experiments over five databases show that the contrast pattern-based approach obtains significantly better AUC results than other state-of-the-art classifiers.
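
To make the notion of a contrast pattern concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes the usual definition: a conjunction of feature conditions whose support (the fraction of records it covers) differs markedly between classes. The feature names `requests_per_min` and `image_ratio`, the thresholds, and the toy sessions are all hypothetical and were invented for illustration only.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Pattern = List[Callable[[Record], bool]]


def support(pattern: Pattern, records: List[Record]) -> float:
    """Fraction of records that satisfy every condition in the pattern."""
    if not records:
        return 0.0
    covered = sum(all(cond(r) for cond in pattern) for r in records)
    return covered / len(records)


# Hypothetical per-session features; not taken from the paper's databases.
human = [
    {"requests_per_min": 4, "image_ratio": 0.6},
    {"requests_per_min": 7, "image_ratio": 0.5},
    {"requests_per_min": 3, "image_ratio": 0.7},
    {"requests_per_min": 55, "image_ratio": 0.1},
]
bot = [
    {"requests_per_min": 80, "image_ratio": 0.0},
    {"requests_per_min": 120, "image_ratio": 0.1},
]

# Candidate pattern: many requests per minute AND almost no image requests.
pattern: Pattern = [
    lambda r: r["requests_per_min"] > 50,
    lambda r: r["image_ratio"] <= 0.1,
]

bot_support = support(pattern, bot)
human_support = support(pattern, human)

# Growth rate: ratio of supports, one common measure of how strongly a
# pattern contrasts two classes (infinite if the other class is never covered).
growth_rate = bot_support / human_support if human_support > 0 else float("inf")

print(f"bot support={bot_support}, human support={human_support}, "
      f"growth rate={growth_rate}")
```

Because such a pattern reads as a plain rule over log features, an expert can inspect it directly, which is the kind of understandability the abstract refers to.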


Published In

Advances in Soft Computing: 17th Mexican International Conference on Artificial Intelligence, MICAI 2018, Guadalajara, Mexico, October 22–27, 2018, Proceedings, Part I
Oct 2018
453 pages
ISBN:978-3-030-04490-9
DOI:10.1007/978-3-030-04491-6

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Bot detection
  2. Contrast pattern
  3. Supervised classification

Qualifiers

  • Article
