Interpretable time series classification using linear models and multi-resolution multi-domain symbolic representations

Published: 01 July 2019

Abstract

The time series classification literature has expanded rapidly over the last decade, with many new classification approaches published each year. Prior research has mostly focused on improving the accuracy and efficiency of classifiers, with interpretability being somewhat neglected. This aspect of classifiers has become critical for many application domains and the introduction of the EU GDPR legislation in 2018 is likely to further emphasize the importance of interpretable learning algorithms. Currently, state-of-the-art classification accuracy is achieved with very complex models based on large ensembles (COTE) or deep neural networks (FCN). These approaches are not efficient with regard to either time or space, are difficult to interpret and cannot be applied to variable-length time series, requiring pre-processing of the original series to a fixed length. In this paper we propose new time series classification algorithms to address these gaps. Our approach is based on symbolic representations of time series, efficient sequence mining algorithms and linear classification models. Our linear models are as accurate as deep learning models but are more efficient regarding running time and memory, can work with variable-length time series and can be interpreted by highlighting the discriminative symbolic features on the original time series.
We advance the state-of-the-art in time series classification by proposing new algorithms built using the following three key ideas: (1) Multiple resolutions of symbolic representations: we combine symbolic representations obtained using different parameters, rather than one fixed representation (e.g., multiple SAX representations); (2) Multiple domain representations: we combine symbolic representations in time (e.g., SAX) and frequency (e.g., SFA) domains, to be more robust across problem types; (3) Efficient navigation in a huge symbolic-words space: we extend a symbolic sequence classifier (SEQL) to work with multiple symbolic representations and use its greedy feature selection strategy to effectively filter the best features for each representation. We show that our multi-resolution multi-domain linear classifier (mtSS-SEQL+LR) achieves a similar accuracy to the state-of-the-art COTE ensemble, and to recent deep learning methods (FCN, ResNet), but uses a fraction of the time and memory required by either COTE or deep models. To further analyse the interpretability of our classifier, we present a case study on a human motion dataset collected by the authors. We discuss the accuracy, efficiency and interpretability of our proposed algorithms and release all the results, source code and data to encourage reproducibility.

Published In

Data Mining and Knowledge Discovery, Volume 33, Issue 4
Jul 2019
418 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Interpretable classifier
  2. Linear models
  3. Multi-resolution multi-domain symbolic representations
  4. SAX
  5. SEQL
  6. SFA
  7. Time series classification

Qualifiers

  • Article

Cited By

  • (2024) Sequential Data Classification under Dynamic Emission. Pattern Recognition and Image Analysis 34:1 (187-198). DOI: 10.1134/S1054661824010048. Online publication date: 1-Mar-2024
  • (2024) Time series classification with their image representation. Neurocomputing 573:C. DOI: 10.1016/j.neucom.2023.127214. Online publication date: 16-May-2024
  • (2024) Exploring the diverse world of SAX-based methodologies. Data Mining and Knowledge Discovery 39:1. DOI: 10.1007/s10618-024-01075-2. Online publication date: 28-Nov-2024
  • (2024) Robust explainer recommendation for time series classification. Data Mining and Knowledge Discovery 38:6 (3372-3413). DOI: 10.1007/s10618-024-01045-8. Online publication date: 1-Nov-2024
  • (2024) Bake off redux: a review and experimental evaluation of recent time series classification algorithms. Data Mining and Knowledge Discovery 38:4 (1958-2031). DOI: 10.1007/s10618-024-01022-1. Online publication date: 1-Jul-2024
  • (2024) Anomaly detection in sleep: detecting mouth breathing in children. Data Mining and Knowledge Discovery 38:3 (976-1005). DOI: 10.1007/s10618-023-00985-x. Online publication date: 1-May-2024
  • (2024) Fast, accurate and explainable time series classification through randomization. Data Mining and Knowledge Discovery 38:2 (748-811). DOI: 10.1007/s10618-023-00978-w. Online publication date: 1-Mar-2024
  • (2024) Z-Time: efficient and effective interpretable multivariate time series classification. Data Mining and Knowledge Discovery 38:1 (206-236). DOI: 10.1007/s10618-023-00969-x. Online publication date: 1-Jan-2024
  • (2024) Sustainable and Explainable Neural Network for Real-Time Time Series Classification. Pattern Recognition (391-405). DOI: 10.1007/978-3-031-78169-8_26. Online publication date: 1-Dec-2024
  • (2023) Understanding Any Time Series Classifier with a Subsequence-based Explainer. ACM Transactions on Knowledge Discovery from Data 18:2 (1-34). DOI: 10.1145/3624480. Online publication date: 13-Nov-2023
