Abstract
One of the challenges in machine learning, especially in the Big Data era, is obtaining labeled data sets. Indeed, the difficulty of labeling large amounts of data has led to an increasing reliance on unsupervised classifiers, such as deep autoencoders. In this paper, we study the problem of involving a human expert in the training of a classifier instead of relying on a pre-labeled data set. We use anomaly detection in network monitoring as a field of application. We demonstrate how using crude, already existing monitoring software as a heuristic to choose which points to label can improve the classification rate compared with both the monitoring software alone and a classifier trained on a fully labeled data set, at a very low computational cost. We introduce the Artificial Immune Ecosystem meta-algorithm as a generic framework integrating the expert, the heuristic and the classifier.
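The abstract describes the labeling loop only at a high level. The following is a minimal Python sketch of the general idea under stated assumptions, not the Artificial Immune Ecosystem itself: a hypothetical fixed-threshold rule (`heuristic_alert`) stands in for the existing monitoring software, a simulated `expert` function stands in for the human, and a 1-nearest-neighbour classifier stands in for whatever classifier is actually trained. The features, names and toy data are all illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-point features: (value, absolute change from previous sample)
Feature = Tuple[float, float]


def featurize(series: Sequence[float]) -> List[Feature]:
    """Map each point of a univariate series to a small feature vector."""
    feats: List[Feature] = []
    for i, v in enumerate(series):
        delta = abs(v - series[i - 1]) if i > 0 else 0.0
        feats.append((v, delta))
    return feats


def heuristic_alert(feat: Feature, threshold: float) -> bool:
    """Stand-in for crude monitoring software: a fixed upper threshold."""
    return feat[0] > threshold


def select_queries(feats: List[Feature], threshold: float,
                   n_random: int, rng: random.Random) -> List[int]:
    """Points shown to the expert: every heuristic alert plus a few random others."""
    alerted = [i for i, f in enumerate(feats) if heuristic_alert(f, threshold)]
    alerted_set = set(alerted)
    others = [i for i in range(len(feats)) if i not in alerted_set]
    return alerted + rng.sample(others, min(n_random, len(others)))


def train_1nn(examples: List[Tuple[Feature, int]]) -> Callable[[Feature], int]:
    """1-nearest-neighbour classifier over the expert-labelled points."""
    def predict(x: Feature) -> int:
        nearest = min(examples,
                      key=lambda e: (e[0][0] - x[0]) ** 2 + (e[0][1] - x[1]) ** 2)
        return nearest[1]
    return predict


def heuristic_guided_labeling(series: Sequence[float], expert: Callable[[int], int],
                              threshold: float, n_random: int = 5,
                              seed: int = 0) -> Tuple[List[int], int]:
    """Label only the queried points, train on them, then classify every point."""
    rng = random.Random(seed)
    feats = featurize(series)
    queries = select_queries(feats, threshold, n_random, rng)
    labelled = [(feats[i], expert(i)) for i in queries]
    predict = train_1nn(labelled)
    return [predict(f) for f in feats], len(queries)


# Toy data: a flat signal with two true anomalies (50, 120) and one
# benign burst (150) that the threshold rule also alerts on.
series = [10.0 + (i % 3) * 0.1 for i in range(200)]
series[50] = series[120] = 30.0
series[150] = 25.0
true_anomalies = {50, 120}


def expert(i: int) -> int:
    """Simulated human expert: returns the ground-truth label of point i."""
    return int(i in true_anomalies)


preds, n_queried = heuristic_guided_labeling(series, expert, threshold=20.0)
print([i for i, p in enumerate(preds) if p == 1], n_queried)  # prints: [50, 120] 8
```

On this toy series the threshold rule raises three alerts, including the benign burst at index 150. After the expert labels 8 of the 200 points (the 3 alerts plus 5 random ones), the classifier flags only indices 50 and 120. The extra random queries matter: the alerts alone would give the classifier almost no examples of normal behaviour.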
The work presented here has been funded by IPLine SAS, by the French ANRT in the frame of CIFRE contract 2015/0079, and by the French Banque Publique d’Investissement (BPI) under program FUI-AAP-19 in the frame of the HuMa project.
Notes
1. Personal experience of the author as a software engineer for a Cloud service provider.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Guigou, F., Collet, P., Parrend, P. (2017). The Artificial Immune Ecosystem: A Bio-Inspired Meta-Algorithm for Boosting Time Series Anomaly Detection with Expert Input. In: Squillero, G., Sim, K. (eds) Applications of Evolutionary Computation. EvoApplications 2017. Lecture Notes in Computer Science, vol. 10199. Springer, Cham. https://doi.org/10.1007/978-3-319-55849-3_37
DOI: https://doi.org/10.1007/978-3-319-55849-3_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55848-6
Online ISBN: 978-3-319-55849-3
eBook Packages: Computer Science, Computer Science (R0)