
Target-class guided sample length reduction and training set selection of univariate time-series

Published in: Applied Intelligence

Abstract

Novelty/anomaly detection in time-series (TS) data is an active research domain and is essentially a one-class classification (OCC) task, where only target-class samples are present during training and samples from other classes are unavailable. The performance of OCC algorithms depends on the quality and quantity of both features and training samples, because not all features and samples contribute equally to the target-class representation. The present research focuses on OCC of univariate time-series (UTS) and proposes a novel way to acquire knowledge of the target class so as to ensure its strong separation from other-class samples. Besides a large number of training samples, a large sample length (span) increases computational complexity and brings the inherent problem of the curse of dimensionality (here, the sample span is treated as the dimensionality of the time-series). In this context, the present article offers a concurrent approach to target-class guided sample span reduction and training sample selection for UTS data. Initially, a vector representation is obtained using state-of-the-art dissimilarity-based representation (DBR) techniques; subsequently, a novel target-class supervised sample span reduction algorithm based on Eigenspace analysis is proposed to obtain the minimal sample span. Furthermore, to select the most promising training samples as target-class representatives, state-of-the-art prototype selection methods are utilized. Finally, the one-class support vector machine (OCSVM), 1-nearest neighbour (1-NN) and isolation forest (IF) are used to evaluate the performance of the proposed approach. Extensive experiments are performed on the archive of 85 univariate datasets provided by the University of California, Riverside (UCR) and the University of East Anglia (UEA), also known as the UCR/UEA archive.
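To make the pipeline concrete, the minimal sketch below (Python with NumPy, SciPy and scikit-learn) walks through the four stages the abstract outlines. It is only an illustration under explicit assumptions: synthetic sinusoidal series stand in for the UCR/UEA datasets, plain Euclidean distance is used as the dissimilarity for the DBR step, PCA with a 95% explained-variance threshold stands in for the proposed target-class Eigenspace-based span reduction, and nearest-to-centroid k-means picking stands in for the prototype selection step. None of these choices reproduce the authors' exact algorithms.

```python
# Hedged sketch of the DBR -> span reduction -> prototype selection -> OCC pipeline.
# Synthetic data, Euclidean DBR, PCA and k-means are illustrative stand-ins only.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def make_series(n, length, target=True):
    """Synthetic univariate series: target class = noisy sinusoids, others = noise."""
    t = np.linspace(0, 2 * np.pi, length)
    if target:
        return np.sin(t) + 0.1 * rng.standard_normal((n, length))
    return rng.standard_normal((n, length))

X_train = make_series(100, 150, target=True)                  # target-class-only training set
X_test = np.vstack([make_series(50, 150, True), make_series(50, 150, False)])
y_test = np.r_[np.ones(50), -np.ones(50)]                     # +1 target, -1 other class

# 1) Dissimilarity-based representation (DBR): each series becomes its vector of
#    Euclidean distances to the training series.
D_train = cdist(X_train, X_train)
D_test = cdist(X_test, X_train)

# 2) Span reduction via eigenspace analysis of the target class: here, PCA fitted on
#    the target-class DBR vectors, keeping components covering 95% of the variance.
pca = PCA(n_components=0.95).fit(D_train)
Z_train, Z_test = pca.transform(D_train), pca.transform(D_test)

# 3) Training-sample selection: keep the training vectors closest to k-means centroids
#    as target-class prototypes.
centers = KMeans(n_clusters=20, n_init=10, random_state=0).fit(Z_train).cluster_centers_
proto_idx = np.unique(cdist(centers, Z_train).argmin(axis=1))
Z_proto = Z_train[proto_idx]

# 4) One-class evaluation with OCSVM and Isolation Forest (1-NN could be added analogously).
for name, clf in [("OCSVM", OneClassSVM(nu=0.1, gamma="scale")),
                  ("IForest", IsolationForest(random_state=0))]:
    pred = clf.fit(Z_proto).predict(Z_test)                   # +1 = inlier, -1 = outlier
    print(f"{name}: F1 on target class = {f1_score(y_test, pred, pos_label=1):.3f}")
```

Swapping the Euclidean dissimilarity for an elastic measure such as DTW, or the k-means prototypes for a different selection scheme, changes only steps 1 and 3; the span reduction and the one-class evaluation stay the same.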



Author information


Corresponding author

Correspondence to Sanjay Kumar Sonbhadra.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Sonbhadra, S.K., Agarwal, S. & Nagabhushan, P. Target-class guided sample length reduction and training set selection of univariate time-series. Appl Intell 53, 7056–7073 (2023). https://doi.org/10.1007/s10489-022-03761-4
