Abstract
Anomaly (novelty) detection in time series is a widely studied research area and is essentially a one-class classification problem. For univariate time series, a large number of training samples is not the only source of computational overhead: a long sample span (observation length) also adds cost and brings the curse of dimensionality, since the sample length is treated as the dimension. This work proposes an approach that reduces both the sample span and the number of training samples of univariate time series under the supervision of target-class samples. Because data representation strongly influences the performance of a machine learning approach, the time series are represented using dissimilarity-based representation (DBR) techniques. To reduce the sample length, a knowledge grid is then computed via eigenspace analysis of the variance-covariance matrix of the target-class samples, and this grid is used to transform the original sample length to a reduced one. Afterwards, the training samples are selected using prototype methods. The experiments use 16 DBR measures and 11 prototype techniques. Finally, a one-class support vector machine (OCSVM) and a 1-nearest neighbour (1-NN) classifier are used to validate the proposed approach on 85 UCR/UEA univariate datasets.
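The steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the ordering of the steps, Euclidean distance standing in for the 16 DBR measures, farthest-first traversal standing in for the 11 prototype techniques, a 1-NN distance score with a quantile threshold as the one-class decision rule, the synthetic data, and all function and parameter names (`knowledge_grid`, `select_prototypes`, `k`, `m`, and so on).

```python
import numpy as np

def knowledge_grid(X_target, k):
    """Mean and top-k eigenvectors (d x k) of the target-class covariance."""
    mean = X_target.mean(axis=0)
    cov = np.cov(X_target, rowvar=False)      # d x d variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    return mean, eigvecs[:, order]

def reduce_length(X, mean, grid):
    """Project series of length d onto the k retained eigen-directions."""
    return (X - mean) @ grid

def select_prototypes(Z, m):
    """Farthest-first traversal (assumed stand-in for a prototype method)."""
    centre = Z.mean(axis=0)
    chosen = [int(np.argmin(np.linalg.norm(Z - centre, axis=1)))]
    dist = np.linalg.norm(Z - Z[chosen[0]], axis=1)
    while len(chosen) < m:
        nxt = int(np.argmax(dist))            # sample farthest from chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(Z - Z[nxt], axis=1))
    return Z[chosen]

def dissimilarity_representation(Z, prototypes):
    """n x m matrix of Euclidean dissimilarities to the prototypes."""
    diff = Z[:, None, :] - prototypes[None, :, :]
    return np.linalg.norm(diff, axis=2)

def novelty_scores(D):
    """1-NN score: dissimilarity to the nearest prototype."""
    return D.min(axis=1)

# Synthetic data (assumption): target class = noisy sinusoids with varying
# amplitude and phase; novel class = noisy square waves of higher frequency.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 64)

def make_targets(n):
    a = rng.uniform(0.8, 1.2, size=(n, 1))
    phi = rng.uniform(-0.3, 0.3, size=(n, 1))
    return a * np.sin(t + phi) + 0.1 * rng.standard_normal((n, t.size))

target_train = make_targets(200)
target_test = make_targets(50)
novel_test = np.sign(np.sin(3 * t)) + 0.1 * rng.standard_normal((50, t.size))

# Sample-length reduction (64 -> 8), then training-sample reduction (200 -> 20).
mean, grid = knowledge_grid(target_train, k=8)
Z_train = reduce_length(target_train, mean, grid)
prototypes = select_prototypes(Z_train, m=20)

# The threshold is set from target-class training scores only.
train_scores = novelty_scores(dissimilarity_representation(Z_train, prototypes))
threshold = np.quantile(train_scores, 0.95)

def score(X):
    Z = reduce_length(X, mean, grid)
    return novelty_scores(dissimilarity_representation(Z, prototypes))

score_target = score(target_test)
score_novel = score(novel_test)
print("flagged targets:", float((score_target > threshold).mean()))
print("flagged novel:  ", float((score_novel > threshold).mean()))
```

Only target-class samples are used to fit the projection, choose the prototypes, and set the threshold, which matches the one-class setting described above. An OCSVM could be trained on the reduced representation in place of the 1-NN rule.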
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sonbhadra, S.K., Agarwal, S., Nagabhushan, P. (2021). Target Class Supervised Sample Length and Training Sample Reduction of Univariate Time Series. In: Fujita, H., Selamat, A., Lin, J.CW., Ali, M. (eds) Advances and Trends in Artificial Intelligence. From Theory to Practice. IEA/AIE 2021. Lecture Notes in Computer Science(), vol 12799. Springer, Cham. https://doi.org/10.1007/978-3-030-79463-7_51
DOI: https://doi.org/10.1007/978-3-030-79463-7_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79462-0
Online ISBN: 978-3-030-79463-7
eBook Packages: Computer Science, Computer Science (R0)