Abstract
Novelty (anomaly) detection in time-series (TS) data is an active research area. It is commonly framed as a one-class classification (OCC) task, in which only target-class samples are available during training and samples from other classes are not. The performance of OCC algorithms depends on the quality and quantity of features and training samples, because not all features or samples are equally important for representing the target class. The present research focuses on OCC of univariate time series (UTS) and proposes a novel way to acquire knowledge of the target class so that it is strongly separated from samples of other classes. Apart from a large number of training samples, a large sample length (span) increases computational complexity and brings with it the curse of dimensionality (here, the sample span is treated as the dimension of the time series). In this context, the present article offers a concurrent approach to target-class guided sample span reduction and training sample selection for UTS data. First, a vector representation is obtained using state-of-the-art dissimilarity-based representation (DBR) techniques; then, a novel target-class supervised sample span reduction algorithm based on eigenspace analysis is applied to obtain the minimal sample span. Furthermore, state-of-the-art prototype methods are used to select the most promising training samples as target-class representatives. Finally, one-class support vector machine (OCSVM), 1-nearest neighbour (1-NN) and isolation forest (IF) classifiers are used to evaluate the performance of the proposed approach. Extensive experiments are performed on the archive of 85 univariate datasets provided by the University of California, Riverside (UCR) and the University of East Anglia (UEA), also known as the UCR/UEA archive.
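The stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes Euclidean distance as the dissimilarity measure, a random subset of training series as prototypes, a 95% retained-variance cut-off for the eigenspace step, and a 1-NN distance score with a quantile threshold. All function names, parameters and the synthetic sine/square-wave data are hypothetical and chosen only for illustration.

```python
import numpy as np

def dissimilarity_representation(series, prototypes):
    """Represent each series by its Euclidean distances to the prototypes."""
    diff = series[:, None, :] - prototypes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def fit_eigenspace(X, variance_kept=0.95):
    """Eigen-decompose the target-class covariance; keep the leading directions."""
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    order = np.argsort(eigvals)[::-1]              # eigh returns ascending order
    eigvals = np.clip(eigvals[order], 0.0, None)   # guard against tiny negatives
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, variance_kept)) + 1, len(eigvals))
    return mean, eigvecs[:, :k]

def nearest_distance(train, query, leave_one_out=False):
    """1-NN novelty score: distance from each query to its closest training point."""
    d = np.sqrt(((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=2))
    if leave_one_out:
        np.fill_diagonal(d, np.inf)                # ignore each point's self-match
    return d.min(axis=1)

def demo(seed=0, span=64, n_train=60, n_test=20, n_prototypes=10):
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2.0 * np.pi, span)
    # Assumed toy data: noisy sine waves are the target class, square waves are not.
    make_target = lambda n: np.sin(t) + 0.1 * rng.standard_normal((n, span))
    make_outlier = lambda n: np.sign(np.sin(3.0 * t)) + 0.1 * rng.standard_normal((n, span))
    train, test_in, test_out = make_target(n_train), make_target(n_test), make_outlier(n_test)

    # Prototype selection (assumption: a random subset of the training series).
    prototypes = train[rng.choice(n_train, size=n_prototypes, replace=False)]
    # Span reduction: fit the eigenspace on target-class data only.
    mean, basis = fit_eigenspace(dissimilarity_representation(train, prototypes))
    project = lambda s: (dissimilarity_representation(s, prototypes) - mean) @ basis

    z_train = project(train)
    # Threshold taken from leave-one-out 1-NN distances within the training set.
    threshold = np.quantile(nearest_distance(z_train, z_train, leave_one_out=True), 0.95)
    return (nearest_distance(z_train, project(test_in)),
            nearest_distance(z_train, project(test_out)),
            threshold, basis.shape[1])

scores_target, scores_outlier, threshold, reduced_dim = demo()
```

In this sketch a series whose score exceeds `threshold` would be flagged as novel. An elastic measure such as dynamic time warping, a clustering-based prototype method, or an OCSVM or isolation forest scorer could replace the corresponding stage without changing the overall structure.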
About this article
Cite this article
Sonbhadra, S.K., Agarwal, S. & Nagabhushan, P. Target-class guided sample length reduction and training set selection of univariate time-series. Appl Intell 53, 7056–7073 (2023). https://doi.org/10.1007/s10489-022-03761-4
DOI: https://doi.org/10.1007/s10489-022-03761-4