
Handling concept drift via model reuse

Published: 01 March 2020

Abstract

In many real-world applications, data are collected in the form of a stream, so the underlying distribution often changes over time; this phenomenon is referred to as concept drift in the literature. We propose a novel and effective approach that handles concept drift via model reuse, that is, by reusing models trained on previous data to tackle such changes. Each model is associated with a weight representing its reusability for the current data, and the weight is adaptively adjusted according to the model's performance. We provide both generalization and regret analyses to justify the superiority of our approach. Experimental results on both synthetic and real-world datasets validate its efficacy.
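To make the mechanism concrete, the sketch below shows one natural way such reusability weights could be maintained: a Hedge-style exponential update that shrinks the weight of any previous model that performs poorly on current data. This is a minimal illustration assuming regression models exposing a scalar `predict` method, a squared loss, and a fixed step size `eta`; it is not the paper's exact algorithm.

```python
import numpy as np

class ModelReusePool:
    """Reuse previously trained models, each weighted by its reusability."""

    def __init__(self, models, eta=1.0):
        self.models = models                           # models trained on earlier data
        self.weights = np.ones(len(models)) / len(models)
        self.eta = eta                                 # step size of the weight update

    def predict(self, x):
        # Combine the previous models' predictions, weighted by reusability.
        preds = np.array([m.predict(x) for m in self.models])
        return float(self.weights @ preds)

    def update(self, x, y):
        # Exponentially down-weight models that suffer large loss on (x, y),
        # so the weights track whichever models match the current concept.
        losses = np.array([(m.predict(x) - y) ** 2 for m in self.models])
        self.weights *= np.exp(-self.eta * losses)
        self.weights /= self.weights.sum()             # renormalize to a distribution
```

On a stream, one would call `pool.predict(x)` for each arriving instance and then `pool.update(x, y)` once the label is revealed, so models that fit the post-drift distribution regain weight quickly.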




Published In

Machine Learning, Volume 109, Issue 3
March 2020
172 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 March 2020
Accepted: 06 September 2019
Revision received: 16 July 2019
Received: 02 May 2019

Author Tags

  1. Concept drift
  2. Model reuse
  3. Non-stationary environments

Qualifiers

  • Research-article

Funding Sources

  • NSFC


Cited By

  • (2024) Learning with Asynchronous Labels. ACM Transactions on Knowledge Discovery from Data, 18(8), 1–27. https://doi.org/10.1145/3662186 (online: 3-May-2024)
  • (2024) Beimingwu: A Learnware Dock System. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 5773–5782. https://doi.org/10.1145/3637528.3671617 (online: 25-Aug-2024)
  • (2024) Towards enabling learnware to handle heterogeneous feature spaces. Machine Learning, 113(4), 1839–1860. https://doi.org/10.1007/s10994-022-06245-1 (online: 1-Apr-2024)
  • (2023) Handling learnwares developed from heterogeneous feature spaces without auxiliary data. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 4235–4243. https://doi.org/10.24963/ijcai.2023/471 (online: 19-Aug-2023)
  • (2023) Model Selection for Continuous Operation of Automated Vulnerability Assessment System. Proceedings of the 2023 Workshop on Recent Advances in Resilient and Trustworthy ML Systems in Autonomous Networks, pp. 11–15. https://doi.org/10.1145/3605772.3624006 (online: 30-Nov-2023)
  • (2023) MemDA: Forecasting Urban Time Series with Memory-based Drift Adaptation. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 193–202. https://doi.org/10.1145/3583780.3614962 (online: 21-Oct-2023)
  • (2023) From multi-label learning to cross-domain transfer: a model-agnostic approach. Applied Intelligence, 53(21), 25135–25153. https://doi.org/10.1007/s10489-023-04841-9 (online: 1-Nov-2023)
  • (2023) Graph-Guided Latent Variable Target Inference for Mitigating Concept Drift in Time Series Forecasting. PRICAI 2023: Trends in Artificial Intelligence, pp. 358–369. https://doi.org/10.1007/978-981-99-7025-4_31 (online: 15-Nov-2023)
  • (2022) Adapting to online label shift with provable guarantees. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 29960–29974. https://doi.org/10.5555/3600270.3602442 (online: 28-Nov-2022)
  • (2022) Hypothesis Transfer in Bandits by Weighted Models. Machine Learning and Knowledge Discovery in Databases, pp. 284–299. https://doi.org/10.1007/978-3-031-26412-2_18 (online: 19-Sep-2022)
