A novel framework for concept drift detection using autoencoders for classification problems in data streams

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

In streaming data environments, data characteristics and probability distributions are likely to change over time, a phenomenon called concept drift that makes it hard for machine learning models to predict accurately. In such non-stationary environments, concept drift must be detected and the model updated to maintain acceptable predictive performance. Existing approaches to drift detection have inherent problems: supervised detection methods require ground-truth labels, and unsupervised methods suffer from a high false positive rate. In this paper, we propose a semi-supervised Autoencoder-based Drift Detection Method (AEDDM) aimed at detecting drift without the need for ground-truth labels, yet with high confidence that the detected drift is real. In a binary classification setting, AEDDM uses two autoencoders in a layered architecture, trained on labelled data, and applies a thresholding mechanism based on reconstruction error to signal the presence of drift. The proposed method has been evaluated on four synthetic and four real-world datasets with different drifting scenarios. For the real-world datasets, the induced and detected drifts have been evaluated from the viewpoint of classifier performance using seven commonly used batch classifiers, as well as from an adaptation perspective in an online learning environment using a Hoeffding Tree classifier. The results show that AEDDM effectively detects the distributional changes in data that are most likely to impact the classifier's performance (real drift) while ignoring virtual drift, thus considerably reducing false alarms while retaining the ability to adapt in terms of classification performance.
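The general idea of signalling drift from reconstruction error can be illustrated with a minimal sketch. This is not the authors' AEDDM: the abstract does not give the architecture, the way the two autoencoders are combined, or the threshold rule. As assumptions for illustration only, the sketch uses a single stand-in reconstructor (a one-dimensional linear bottleneck fitted by power iteration rather than a neural autoencoder), a threshold of mean plus three standard deviations of the reference errors, and hypothetical names such as `fit_reconstructor` and `batch_drifts`.

```python
import math
import random
from statistics import mean, stdev


def fit_reconstructor(rows, iters=50):
    """Stand-in for autoencoder training (assumption): learn the data mean
    and the leading principal direction via power iteration."""
    dim = len(rows[0])
    mu = [mean(r[j] for r in rows) for j in range(dim)]
    centered = [[r[j] - mu[j] for j in range(dim)] for r in rows]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # One power-iteration step on the (unnormalised) covariance matrix.
        w = [0.0] * dim
        for x in centered:
            dot = sum(a * b for a, b in zip(x, v))
            for j in range(dim):
                w[j] += dot * x[j]
        norm = math.sqrt(sum(a * a for a in w)) or 1.0
        v = [a / norm for a in w]
    return mu, v


def reconstruction_error(row, mu, v):
    """Squared error after encoding to a 1-D code and decoding back."""
    x = [a - m for a, m in zip(row, mu)]
    code = sum(a * b for a, b in zip(x, v))  # encoder: project onto v
    return sum((a - code * b) ** 2 for a, b in zip(x, v))  # decoder residual


def fit_threshold(errors, k=3.0):
    """Hypothetical rule: mean + k standard deviations of reference errors."""
    return mean(errors) + k * stdev(errors)


def batch_drifts(batch, mu, v, threshold):
    """Signal drift when the batch's mean reconstruction error is too high."""
    return mean(reconstruction_error(r, mu, v) for r in batch) > threshold


def make_batch(rng, n, slope):
    """Synthetic 2-D data lying near the line y = slope * x."""
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        rows.append([x, slope * x + rng.gauss(0.0, 0.05)])
    return rows


rng = random.Random(0)
reference = make_batch(rng, 200, slope=2.0)
mu, v = fit_reconstructor(reference)
threshold = fit_threshold([reconstruction_error(r, mu, v) for r in reference])

# A batch from the reference concept versus one whose feature relationship changed.
stable_flag = batch_drifts(make_batch(rng, 50, slope=2.0), mu, v, threshold)
drift_flag = batch_drifts(make_batch(rng, 50, slope=-2.0), mu, v, threshold)
print("stable batch flagged:", stable_flag)
print("changed batch flagged:", drift_flag)
```

The detector needs no labels at detection time: only the incoming features are passed through the reconstructor. Separating distributional changes that affect the classifier (real drift) from those that do not (virtual drift) is the part the paper attributes to its two-autoencoder design, and this sketch does not attempt it.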

Funding

The authors have no relevant financial or non-financial interests to disclose.

Author information

Corresponding author

Correspondence to Usman Ali.

Ethics declarations

Conflict of interest

The authors have no competing interests relevant to this article's content. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ali, U., Mahmood, T. A novel framework for concept drift detection using autoencoders for classification problems in data streams. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02223-2

  • DOI: https://doi.org/10.1007/s13042-024-02223-2
