Exploring the diverse world of SAX-based methodologies

Data Mining and Knowledge Discovery

Abstract

Symbolic Aggregate Approximation (SAX) is a widely used method for time series analysis, known for its ability to transform continuous data into discrete symbols. Although SAX has proved effective in various applications, it has weaknesses that have motivated researchers to propose modifications that address them. In this article, we present a comprehensive review of the published variations of SAX and categorize them into groups according to the weaknesses they aim to overcome. This taxonomy is a resource for researchers interested in SAX, as it consolidates a wide range of modifications in one work. The variations were gathered through an extensive literature review, and each is accompanied by a concise description of its key features and benefits. By presenting these variations systematically, this paper encourages exploration of the landscape of SAX methods, helps researchers identify the most suitable method for their specific needs, reveals research gaps, and promotes further investigation in this field.
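
For readers unfamiliar with the transformation the abstract refers to, the following is a minimal sketch of the classic SAX pipeline: z-normalize the series, average it over equal-length segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The function name `sax`, its parameter names, and the requirement that the series length be divisible by the number of segments are assumptions made for illustration; they are not taken from the article, and the variations it reviews change one or more of these steps.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet_size):
    """Convert a numeric series to a SAX word (illustrative sketch only)."""
    if len(series) % n_segments != 0:
        # Simplifying assumption: only equal-length segments are handled.
        raise ValueError("series length must be divisible by n_segments")

    # Step 1: z-normalize (a constant series is mapped to all zeros).
    mu, sigma = mean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

    # Step 2: piecewise aggregate approximation, one mean per segment.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # Step 3: breakpoints cutting N(0, 1) into equiprobable regions.
    std_normal = NormalDist()
    breakpoints = [std_normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]

    # Step 4: map each segment mean to the letter of its region.
    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in paa)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # abcd
```

The Gaussian breakpoints in step 3 encode the assumption that z-normalized values are approximately normally distributed, which is one of the design choices that data-driven variants replace.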

Acknowledgements

This research has been financed by the European Union: Next Generation EU through the Program Greece 2.0 National Recovery and Resilience Plan, under the call RESEARCH – CREATE – INNOVATE, project name “iCREW: Intelligent small craft simulator for advanced crew training using Virtual Reality techniques” (project code: TAEDK-06195).

Author information

Corresponding author

Correspondence to Lamprini Pappa.

Additional information

Handling Editor: Eamonn Keogh.

About this article

Cite this article

Pappa, L., Karvelis, P. & Stylios, C. Exploring the diverse world of SAX-based methodologies. Data Min Knowl Disc 39, 4 (2025). https://doi.org/10.1007/s10618-024-01075-2

  • DOI: https://doi.org/10.1007/s10618-024-01075-2
