Article

Hexadecimal Aggregate Approximation Representation and Classification of Time Series Data

1 School of Computer Science, China University of Geosciences Wuhan, 388 Lumo Road, Wuhan 430074, China
2 Department of Computer Science, University of Idaho, 875 Perimeter Drive MS 1010, Moscow, ID 83844-1010, USA
* Author to whom correspondence should be addressed.
Algorithms 2021, 14(12), 353; https://doi.org/10.3390/a14120353
Submission received: 29 September 2021 / Revised: 26 November 2021 / Accepted: 30 November 2021 / Published: 2 December 2021
Figure 1. The relationship between PAA and SAX [5].
Figure 2. Summary of time series 1 and 2. (a) Summary of time series 1; (b) summary of time series 2.
Figure 3. The transformation distance between AB and CD. (a) Translation transformation; (b) rotation transformation; (c) scale transformation; (d) AB = CD.
Figure 4. Changing the order of the transformation distance calculation between AB and CD. (a) Rotation transformation; (b) translation transformation; (c) scale transformation; (d) AB = CD.
Figure 5. Transformable interval objects AB and CD.
Figure 6. TIO points on the TIO plane.
Figure 7. A 4 × 4 HAX grid plane.
Figure 8. Accuracy comparison plot for Table 4.
Figure 9. Accuracy comparison between HAX and SAX (71 points in the upper triangle and 43 points in the lower triangle).
Figure 10. Accuracy comparison between PAX and SAX (111 points in the upper triangle and 3 points in the lower triangle).
Figure 11. Accuracy comparison among HAX, ED and SAX.
Figure 12. Accuracy comparison among PAX, ED and SAX.
Figure 13. Accuracy comparison between PAX and ED (48 points in the upper triangle and 63 points in the lower triangle).
Figure 14. Accuracy comparison between PAX and SAX-BD (50 points in the upper triangle and 54 points in the lower triangle).

Abstract

Time series data are widely found in finance, health, environmental, social, mobile and other fields. A large amount of time series data has been produced due to the general use of smartphones, various sensors, RFID and other internet devices. How a time series is represented is key to the efficient and effective storage and management of time series data, and it is also very important for time series classification. Two new time series representation methods, Hexadecimal Aggregate approXimation (HAX) and Point Aggregate approXimation (PAX), are proposed in this paper. The two methods represent each segment of a time series as a transformable interval object (TIO). Then, each TIO is mapped to a spatial point located on a two-dimensional plane. Finally, the HAX maps each point to a hexadecimal digit so that a time series is converted into a hex string. The experimental results show that HAX has higher classification accuracy than Symbolic Aggregate approXimation (SAX) but lower than some SAX variants (SAX-TD, SAX-BD). The HAX has the same space cost as SAX but a lower one than these variants. The PAX has higher classification accuracy than HAX and is extremely close to the Euclidean distance (ED) measurement; however, the space cost of PAX is generally much lower than that of ED. HAX and PAX are general representation methods that can also support geoscience time series clustering, indexing and query in addition to classification.
Keywords:
time series; SAX; PAA; HAX; PAX

1. Introduction

Time provides a basic cognitive variable for the continuity and sequential description of object movements and changes in the world [1,2,3,4,5,6]. Human society faces many challenges, such as environmental pollution, population growth, urban expansion, the transmission of infectious diseases and the monitoring and prevention of various natural disasters. These challenges are all closely related to the concept of time and produce massive data containing temporal information. Especially in recent years, a large amount of time series data has been produced due to the widespread use of smartphones, various sensors, RFID and other internet devices [2,3,4,5,6]. Time series data can help us understand the past, grasp the present and predict the future, as well as improve our ability to perceive and anticipate the evolution of various entities and states in the real world.
Many applications in the fields of scientific research, industry and business produce large amounts of time-series data that need effective analysis, requiring rational representation and efficient similarity computing and search. These applications cover the domains of images, audio, finance, environmental monitoring and other scientific disciplines [7,8,9]. Many creative representation methods for time series data have been proposed for similarity computing, clustering [10], classification [11], indexing and query [3,8,9,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]. The taxonomy of time series representations includes four types [10]: data-adaptive, non-data adaptive, model-based and data dictated. The main representation methods for time series are shown in Table 1.
The most commonly used approximation representations are the PAA [42] and SAX [19,22,64] methods. In recent decades, most time series data indexing methods [23,25,31,32,65,66] have been based on SAX [19,22]. The SAX method reduces an n-length time series to a w-length (w < n) string with an alphabet size parameter named α. Though highly simple and straightforward, this reduction is a major limitation because it may cause some important features to be lost [67]. To avoid this problem, the ESAX [68] and SAX-TD [67] methods improve the accuracy of SAX. The ESAX method adds two additional values, a maximum value and a minimum value, to each time series segment as new features. The mapping from the new features to SAX words is the same as in the SAX method; therefore, the length of the ESAX string is three times that of SAX. The SAX-TD method adds a trend distance for each segment; hence, its symbol string is two times the original string length. Therefore, both the ESAX and SAX-TD methods require additional information to extend the original SAX string. The SAX-BD [62] method integrates the SAX distance with a weighted boundary distance, so that it outperforms SAX-TD. Symbolic aggregate approximation methods have proven very effective in compacting the information content of time series. However, typical SAX-based techniques rely on a Gaussian assumption for the underlying data statistics, which often deteriorates their performance in practical scenarios [63].
To overcome this limitation, a novel method, the Hexadecimal Aggregate approXimation (HAX) representation of time series, is proposed in this paper. This method negates any assumption on the probability distribution of time series and represents each segment of a time series as a transformable interval object (TIO), using the transformation distance to measure the similarity between a pair of time series. Then, each TIO is mapped to a hexadecimal symbol by its location on a hexadecimal grid. Therefore, the HAX string has the same length as the SAX string with the same word size, the w parameter. We compare the SAX, SAX-TD and SAX-BD methods with the HAX method. Our reason for choosing SAX-TD and SAX-BD is that their outputs still include the original SAX string, despite some attachments; hence, they are comparable with the hex string of HAX. The experimental results show that HAX has higher accuracy than the SAX method. The remainder of the paper is organized as follows. Section 2 reviews related work, Section 3 details the principle and method of HAX, Section 4 presents the experimental evaluation and the last section concludes the paper.

2. Related Work

The most straightforward strategy for the representation of time series involves using a simple shape to reduce a segment, such as the piecewise linear representation (PLR) [17], the perceptually important point (PIP) [44] and the indexable piecewise linear approximation (IPLA) [51], among others. The simple shapes may be a point or a line. For example, the PLR and IPLA represent the original series as a set of straight lines fitting the important points of the series, and the PIP selects some important points of a segment to express the whole segment.
Another type involves choosing a simple value or symbol to express a segment of a time series. The PAA [42] method is the foundation of many time series representation methods, especially for SAX [19,22]. To reduce noise while preserving the trend of a time series, the PAA method takes the mean value over back-to-back points to decrease the number of points, as shown in Figure 1. At first, this method divides the original time series into w fixed-size segments and then computes the average values for each one. The data sequence assembled from the average values is the PAA approximation value of the original time series.
For instance, an n-length time series C is reduced to w symbols. First, the time series is divided into w segments by the PAA method. The average values of the segments form $\bar{C} = \bar{C}_1, \bar{C}_2, \ldots, \bar{C}_w$, in which the ith item of $\bar{C}$ is the average value of the ith segment and is computed by Equation (1):

$$\bar{C}_i = \frac{w}{n} \sum_{j=\frac{n}{w}(i-1)+1}^{\frac{n}{w}i} C_j \qquad (1)$$

where $C_j$ is the value of the time series C at time point j.
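Equation (1) can be sketched in Python as follows; this is a minimal illustration that assumes w divides n evenly (the general case needs fractional segment boundaries):

```python
def paa(series, w):
    """Piecewise Aggregate Approximation (Equation (1)): reduce an
    n-length series to w segment means. Assumes w divides n evenly."""
    n = len(series)
    size = n // w
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(w)]

# An 8-point series reduced to 4 segment means.
c = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0, 7.0, 5.0]
print(paa(c, 4))  # → [2.0, 3.0, 7.0, 6.0]
```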
The SAX method is one of the most typical symbol-based time series representation methods and has the same dividing strategy as the PAA method. The difference between the two is the mapping rule that SAX uses. The SAX method divides a time series into a certain number of fixed-length subsequences (called segments) and uses symbols to represent the mean of each subsequence. SAX assumes that the time series data conform to the Gaussian distribution, so that the average value of each segment falls into each region with equal probability. This is the basic principle of the breakpoint strategy used in SAX. This strategy makes the SAX method different from the PAA: it can map each segment into a range determined by the Gaussian distribution [69]. Indeed, there is a lookup table in the SAX method with breakpoints that divide a Gaussian distribution into an arbitrary number of equiprobable regions. SAX uses this table to divide the series and map it into a SAX string [19,22], while the parameter w determines how many dimensions the n-length time series is reduced to. The smaller the parameter w is, the larger n/w is, resulting in a higher compression ratio. Finally, the SAX method maps each segment's average value to an alphabetic symbol. The symbol string after these mappings can roughly represent the time series.
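A minimal sketch of the breakpoint mapping described above; the breakpoint values shown are the standard SAX lookup-table entries for an alphabet size of 4, and the inputs are assumed to be z-normalized PAA means:

```python
from bisect import bisect_right

# Breakpoints dividing N(0, 1) into 4 equiprobable regions
# (standard SAX lookup-table values for alphabet size 4).
BREAKPOINTS_4 = [-0.67, 0.0, 0.67]

def sax(paa_values, breakpoints=BREAKPOINTS_4, alphabet="abcd"):
    """Map each (z-normalized) PAA mean to its SAX symbol by locating
    the Gaussian region it falls into."""
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa_values)

print(sax([-1.2, -0.3, 0.1, 1.5]))  # → "abcd"
```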
The SAX method is well known and has been recognized by many researchers; however, its limits are also obvious. Therefore, many extended and updated versions of SAX have been proposed, typical ones being the ESAX [67] method and the SAX-TD [66] method, among others. Compared to SAX, the ESAX method can express more detailed features of a time series by adding a maximum value and a minimum value as new features. In addition, the SAX-TD method improves on ESAX via a trend distance strategy. The SAX-BD [4] method, proposed in our previous work, develops SAX-TD using a boundary distance.

3. Hexadecimal Aggregate Approximation Representation

The HAX method reduces an n-length time series to w two-dimensional points on a hexadecimal plane (w < n, typically w << n), where each point lies in a hexadecimal cell and can be represented by the cell's order (a hexadecimal digit). Therefore, the HAX method reduces an n-length time series to w hexadecimal digits. Although storage space is cheap, we maintain that space cost is important: we target big data and aim to use the SAX and HAX methods in an in-memory data index structure, so the space cost remains an important factor. Table 2 shows the major notations for the HAX method used in this paper.

3.1. Basic Principle of HAX

The main purpose of a time series representation method is to reduce the dimensionality of time series and then measure the similarity between two time series objects. The HAX method uses fitting segments to simplify a subseries. As shown in Figure 2, parts (a) and (b) respectively show two pieces of time series 1 and 2 in the time ranges [t(i − 1), t(i)] and [t(i), t(i + 1)]. We take the diagonal of the smallest bounding rectangle of each subseries as its summary. For example, in Figure 2a, the summary of subseries (i) is the line segment AB, or segment (i), and the summary of subseries (i + 1) is the line segment EF, or segment (i + 1). The rule for selecting a suitable diagonal is how well it fits the time series segment; however, the computational cost of this rule is too high. In the actual computing process, the maximum and minimum values of the subseries may be used to compute the diagonal direction quickly.
For the similarity of the subseries (i) of TimeSeries1 and TimeSeries2 in Figure 2, we may use the number of transformation steps of the segment AB and the segment CD to measure it. We call the number of transformation steps the transformation distance (TD), as shown in Figure 3. From CD to AB, this goes through vertical translation transformation (Figure 3a,b), rotation transformation (Figure 3b,c) and scale transformation (Figure 3c,d). The fewer transformation steps and the smaller the number of changes are, the higher the similarity is.
Since the AB angle is arbitrary, it is not suitable for fast computing. The AB and the CD are transformed at the same time to make them parallel to the V axis, and then other transformations are performed to make them coincide, as shown in Figure 4.
After this transformation, AB and CD can be rotated by angles α and β, respectively, so that the summary segment is always parallel to the axis V and the value of point B is always greater than the value of point A (shown in Figure 5). We call the transformed segments AB and CD transformable interval objects (TIO). The two TIOs can be represented by the following formulas:

$$TIO_{AB} = (V_A, V_B, \alpha) \qquad (2)$$

$$TIO_{CD} = (V_C, V_D, \beta) \qquad (3)$$
Given that the distance between the center point of TIOAB and the center point of TIOCD is D0, and S0 is the scaling variable, the similarity distance between TIOAB and TIOCD is noted as:
$$DIST(TIO_{AB}, TIO_{CD}) = a \times |D_0| + b \times |\alpha - \beta| + c \times |S_0 - 1| \qquad (4)$$
where D0 is calculated by the Formula (5)
$$D_0 = \frac{(V_A + V_B) - (V_C + V_D)}{2} \qquad (5)$$
and S0 can be calculated by the Formula (6).
$$S_0 = \frac{(V_B - V_A) \times \cos\beta}{(V_D - V_C) \times \cos\alpha} \qquad (6)$$
The larger the DIST is, the smaller the similarity is, where a is the translation transformation factor, b is the angle transformation factor and c is the scale transformation factor. Generally, the translation factor and the rotation angle factor have a greater effect, and the scaling factor has a smaller effect. Therefore, Formula (4) can discard the term $c \times |S_0 - 1|$ when calculating the approximate similarity distance, and $V_M$ can be taken as the average value of the subseries. In Formula (5), $D_0$ can be approximated by the difference between the average values of the two subseries. In this way, each TIO can be expressed as in Formula (7):
$$TIO = (V_M, A) \qquad (7)$$

where $V_M$ is the median of the $V$ values in Formulas (2) and (3), and $A$ is still the angle to the axis $V$, in the range (−90°, 90°).
Let V be the vertical coordinate axis and let angle A be the horizontal coordinate axis. A two-dimensional plane called the TIO plane is thus constructed, and each TIO is a point on the plane, as illustrated in Figure 6, which shows the TIO points corresponding to the four subseries in Figure 2. On the two-dimensional TIO plane, points with greater similarity are closer to each other. Generally, the TIO plane can be divided into many areas, such as 64, 128, 256 or 512, etc. To save storage space, we divide the plane into at least 16 and at most 256 areas, which allows each area to be represented by an 8-bit number. The more areas there are, the more accurate the distance measure between two sequences is; however, the computation also becomes more complex. To map an area into a single digit, the TIO plane is divided into sixteen areas in this paper. Each area is represented by a hexadecimal number from 0 to F, as shown in Figure 7. Each TIO point must fall into one of the areas, and the hexadecimal code of that area is used to represent the point. In this way, a subseries can be converted into a TIO point and finally into a hexadecimal digit; therefore, a time series can be represented as a hexadecimal string. For example, Figure 2a can be represented by the hexadecimal string “04”, and Figure 2b by “35”. This is also how we obtain the full name of HAX: the Hexadecimal Aggregate approXimation representation.
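The mapping from a TIO point to a hexadecimal digit can be sketched as follows. This is an illustration only: the value bounds ([-2, 2], reasonable for z-normalized data) and the row-major cell ordering are our assumptions, not the paper's; the actual cell layout is the one shown in Figure 7.

```python
def hax_digit(v_m, angle, v_min=-2.0, v_max=2.0):
    """Map a TIO point (V_M, A) to one of the 16 cells of a 4x4 grid
    over the TIO plane and return its hex digit.
    Assumptions: V_M is clipped to [v_min, v_max], the angle is in
    (-90, 90) degrees, and cells are numbered in row-major order."""
    col = min(3, int(4 * (max(v_min, min(v_max, v_m)) - v_min) / (v_max - v_min)))
    row = min(3, int(4 * (angle + 90.0) / 180.0))
    return format(row * 4 + col, "X")

print(hax_digit(0.5, 30.0))  # → "A" (row 2, column 2 under these assumptions)
```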

3.2. HAX Distance Measures

The HAX transformation principle has been described in detail in the above section. Based on the principle, time-series objects can be easily converted into HAX strings, and the similarity between two time series objects can be measured by the distance between two corresponding HAX strings. This process can be described using the following formulas. Given a time series $T$ which contains $n$ values,

$$T = \{v_1, v_2, \ldots, v_n\}$$

$T$ is split into $w$ segments $S$ by PAA (paaMapper),

$$S = paaMapper(T) = \{s_1, s_2, \ldots, s_w\}$$

where $s_i$ is the $i$th subseries. Via the TIO transformation, each subseries in $S$ can be converted into a TIO point in a set of TIO points $P$,

$$P = tioMapper(S) = \{p_1, p_2, \ldots, p_w\}$$

where $p_j$ is a coordinate $(s_j, a_j)$ and $a_j$ is the rotation angle of segment $j$ corresponding to subseries $j$. Each point corresponds to a HAX character, and $P$ can then be converted into a HAX string $H$ through the transformation haxMapper,

$$H = haxMapper(P) = \{h_1, h_2, \ldots, h_w\}$$
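The paaMapper/tioMapper steps above can be sketched together as follows. The angle estimate used here (arctangent of the slope between a segment's first and last values) is a simplification we assume for illustration; the paper derives the angle from the diagonal of the segment's smallest bounding rectangle.

```python
import math

def tio_points(series, w):
    """Split a series into w segments (PAA division) and return one TIO
    point (V_M, A) per segment: V_M is the segment mean and A a segment
    angle in degrees. The angle is a simplified first-to-last slope
    estimate, assumed here for illustration. Assumes w divides n."""
    n = len(series)
    size = n // w
    points = []
    for j in range(w):
        seg = series[j * size:(j + 1) * size]
        v_m = sum(seg) / len(seg)
        a = math.degrees(math.atan2(seg[-1] - seg[0], len(seg)))
        points.append((v_m, a))
    return points

print(tio_points([0.0, 1.0, 2.0, 3.0], 2))  # two points with equal angles
```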
Therefore, the approximate distance between two time series can be estimated by the distance between two corresponding HAX strings. Next, we discuss how to calculate the distance between two HAX characters.
Given two time series objects Tq and Tc, the similarity between these two can be estimated by many kinds of distances. The most commonly used is the Euclidean distance (ED), as follows:
$$ED(T_q, T_c) = \sqrt{\sum_{i=1}^{n} (v_i^q - v_i^c)^2} \qquad (8)$$
However, computing the ED in real time is very inefficient for long time series. Hence, the PAA splits a long time series into w short segments to reduce its dimension and adopts the segment distance (SD) to estimate the similarity as follows:
$$SD(S_q, S_c) = \sqrt{\sum_{j=1}^{w} (s_j^q - s_j^c)^2} \qquad (9)$$
where
$$s_j = \frac{w}{n} \sum_{i=\frac{n}{w}(j-1)+1}^{\frac{n}{w}j} v_i \qquad (10)$$
Although the SD decreases the computation of ED, it also discards the trend of a time series. Therefore, the PAX distance (PD) in this paper is expressed by TIO points and is designed to take into account the impact of the trend. The Formula is as follows:
$$PD(P_q, P_c) = \sqrt{\sum_{j=1}^{w} (p_j^q - p_j^c)^2} = \sqrt{\sum_{j=1}^{w} \left( (s_j^q - s_j^c)^2 + f (a_j^q - a_j^c)^2 \right)} \qquad (11)$$
where f is a real number in (0, 1], used to adjust the weight between the V-axis distance and the A-axis (angle) distance so that the difference between PD and ED approaches 0 as much as possible. If there is no adjustment weight, f = 1. We call this method the Point Aggregate approXimation (PAX) representation.
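Equation (11) translates directly into code; a minimal sketch, assuming the TIO points are given as (V_M, A) pairs:

```python
import math

def pax_distance(p_q, p_c, f=1.0):
    """PAX distance PD (Equation (11)) between two equal-length lists of
    TIO points (V_M, A); f in (0, 1] weights the angle axis against the
    value axis (f = 1 means no adjustment)."""
    return math.sqrt(sum((sq - sc) ** 2 + f * (aq - ac) ** 2
                         for (sq, aq), (sc, ac) in zip(p_q, p_c)))

p_q = [(0.0, 10.0), (1.0, -20.0)]
p_c = [(0.5, 10.0), (1.0, -20.0)]
print(pax_distance(p_q, p_c))  # → 0.5 (only the first V_M differs)
```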
The PAX enhances the accuracy of the similarity measurement of time series, but it keeps two factors per segment, which is inconvenient to store as character variables. We therefore use the haxMapper to map each point to a hexadecimal character. The distance between the HAX strings Hq and Hc, denoted HD, is used to measure the similarity distance between the two time series:
$$HD(H_q, H_c) = \sqrt{\sum_{j=1}^{w} \left( haxMapper(p_j^q) - haxMapper(p_j^c) \right)^2} = \sqrt{\sum_{j=1}^{w} (h_j^q - h_j^c)^2} \qquad (12)$$
where the haxMapper may use different mappings: either sequential grid coding or other methods such as Hilbert curve or Z-order curve filling. This paper focuses on the basic sequential grid coding method, shown in Figure 7.
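With sequential grid coding, Equation (12) reduces to squared differences of hexadecimal digit values; a minimal sketch over two HAX strings:

```python
def hax_distance(h_q, h_c):
    """HAX distance HD (Equation (12)) between two equal-length hex
    strings under sequential grid coding: each character is read as its
    hexadecimal value and compared with squared differences."""
    return sum((int(a, 16) - int(b, 16)) ** 2 for a, b in zip(h_q, h_c)) ** 0.5

# The example strings from Figure 2a and 2b: sqrt((0-3)^2 + (4-5)^2) = sqrt(10).
print(hax_distance("04", "35"))
```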

4. Experimental Evaluation

To verify the representation method proposed in this paper, we conducted an experimental evaluation of the HAX method and compared it with the SAX. The SAX method was selected because the SAX and the HAX methods are both symbol-based representation methods built on the PAA division and produce strings of the same length for a time series object. In addition, the PAX method is an intermediate result of the HAX, and its space cost is 16 times that of the HAX representation. Therefore, our experimental evaluation also included the PAX and ED methods.
Since the analysis of time series data is based on the calculation of similarity distance, regardless of whether the task is classification, clustering or query, we selected the simplest and most representative method, the one-nearest-neighbor (1-NN) classifier [68], for the comparison experiments [55,68,69]. All algorithms were implemented in Java, and the source code can be found at https://github.com/zhenwenhe/series.git, accessed on 29 September 2021. Next, we introduce the experimental data set and the method parameter settings and then analyze the experimental results.
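The 1-NN classifier used in the experiments can be sketched generically: it takes any of the distance functions above and returns the label of the closest training series. The authors' implementation is in Java; this Python sketch with toy data and a plain Euclidean distance is illustrative only.

```python
def one_nn(train, query, dist):
    """1-NN classification: return the label of the training series
    closest to the query under the given distance function.
    train is a list of (series, label) pairs."""
    return min(train, key=lambda item: dist(item[0], query))[1]

# Toy example using the Euclidean distance on raw series.
euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
train = [([0.0, 0.0, 0.0], "flat"), ([0.0, 1.0, 2.0], "rising")]
print(one_nn(train, [0.1, 0.9, 2.1], euclid))  # → "rising"
```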

4.1. Experimental Data

This experiment used the latest time-series data set, UCRArchive2018 [7]. The data set has been widely used in experiments on time series data analysis and mining algorithms since 2002. After expansions in 2015 and 2018, UCRArchive2018 contains a total of 128 data sets, of which 14 are variable-length. Variable-length means that the sequences within a data set have different lengths, which is not very common for time series. Since our implementation does not consider similarity measurement between variable-length time series data, the experiments in this paper excluded the 14 variable-length data sets. The data list used is shown in Table 3. The column ID is the order number of each data set in UCRArchive2018, which ranges from 1 to 144. The column Type shows the time series data type of each data set. The column Name is the name of each data set. The column Train is the number of series in the training set, and the column Test is the number of series in the test set. The column Class presents the number of classes in each data set. The column Length presents the number of points in the corresponding time series. Each data set has two parts, Train and Test, one for training the parameters and the other for testing. The data sets contain between 2 and 60 classes, and the lengths of the time series vary from 15 to 2844. The database has been used in many recent papers [9,11]. We intend to cover time series in finance and economics in future work.

4.2. Experimental Parameter Setting

Since both the HAX and the SAX methods are based on the PAA division, the parameter w represents the dimension size in the PAA. In addition, the alpha parameter represents the alphabet size in the SAX. In our experiment, the two methods used the same w parameter. For each data set, the range of w was [5, 20]. For the SAX method and each w, the range of the corresponding parameter alpha was [3, 16]. According to the existing references on SAX, the nearest neighbor classification accuracy results for SAX on UCRArchive2018 are always best when w is in [5, 20] and alpha is in [3, 16]. A score is computed for each w for the HAX and for each combination of w and alpha for the SAX method, and we used the average score to measure the accuracy. The 1-NN algorithm was used to classify under each parameter setting of each data set, and the classification accuracy scores were then calculated. For example, for each data set in the experimental database, the HAX method calculates 16 scores and then takes their average, while the SAX method calculates 16 × 14 scores and then computes the average score. We used this average classification accuracy as the measured variable.

4.3. Experimental Results and Analysis

Four methods, the SAX, SAX-TD, SAX-BD and ED, were used as baselines, and the classification accuracy of each representation method was calculated based on the 1-NN. Table 4 shows the experimental results. The column ID is the identifier of the data set in Table 3. The columns ED, PAX, HAX, SAX, SAX-TD and SAX-BD correspond to the representation methods, and the values in each column are the classification accuracy values. Figure 8 shows the results in a plot. Our previous work [62] presented comparison results among SAX, ESAX, SAX-TD and SAX-BD. Here we focus on the comparison of HAX, SAX, PAX, SAX-BD and ED.
The results in Table 4 show that the classification accuracy of the PAX is significantly higher than those of the HAX and SAX methods, and the HAX has some advantages over the SAX classification accuracy. Figure 9 makes the comparison between the HAX and SAX methods more obvious. The X-axis value is the accuracy of the SAX, the Y-axis value is the accuracy of the HAX and the scattered points are mostly in the upper triangle (71 points in the upper triangle and 43 points in the lower triangle). This shows that the accuracy of the HAX is larger than the SAX. Figure 10 makes the comparison between the PAX and SAX methods more obvious. Almost all the scattered points in Figure 10 are in the upper triangle, which shows that the accuracy of the PAX is significantly larger than the SAX.
The ED is still widely used in equal length time series measurements. In our work, we selected the ED and SAX methods as baselines. Figure 11 shows the accuracy comparison among the HAX, the ED and the SAX. The results show that ED still has higher accuracy when compared with the SAX and HAX methods. Figure 12 shows the accuracy comparison among the PAX, the ED and the SAX. It shows that the accuracy rates of the PAX and ED are very close. Figure 13 makes the comparison between the PAX and ED methods more obvious. About half of the scattered points in Figure 13 are in the upper triangle. Figure 14 makes the comparison between the PAX and SAX-BD methods more obvious. These figures show that the accuracy of PAX is lower than the ED and SAX-BD methods but very close to them.
In terms of space cost, the HAX realizes the dimensionality reduction of high-dimensional time series by representing a time series as a hex string, reducing the amount of information required for time series storage and making it more convenient to use in various fields. For a time series with the same parameter w, the length of the hex string is equal to that of the SAX string, while the space cost of SAX-TD is five times that of the SAX and the space cost of SAX-BD is nine times that of the SAX. Although the PAX has higher accuracy than the HAX, it only reduces the time series to a set of two-dimensional data points, and the space cost of PAX is sixteen times greater than that of the HAX. Therefore, they have the following relationship,
$$SC(HAX) = SC(SAX) = \frac{SC(SAX\text{-}TD)}{5} = \frac{SC(SAX\text{-}BD)}{9} = \frac{SC(PAX)}{16} = \frac{w}{n} SC(ED) \qquad (13)$$
in which the SC is a space cost function, n is the length of a time series and w is the piece parameter.

5. Conclusions

In this paper, two new time series representation methods, the Hexadecimal Aggregate approXimation (HAX) and the Point Aggregate approXimation (PAX), are proposed. These two methods negate any assumption on the probability distribution of time series and initially represent each segment of a time series as a transformable interval object (TIO). Then, each TIO is mapped to a spatial point located on a two-dimensional plane. The PAX represents each segment of a time series as a spatial point on the plane. Next, the HAX maps each point of the PAX to a hexadecimal digit via a hexadecimal grid. Finally, a hex string that represents the time series is generated by the HAX. The experimental results show that the HAX has higher classification accuracy than the SAX, but lower than most SAX variants, such as SAX-TD and SAX-BD. This is because these variants include additional information that can improve the distance measure of the SAX string. The HAX has the same space cost as the SAX and a lower space cost than the above-mentioned SAX variants. The PAX has higher classification accuracy than the HAX and is very close to the ED, but its space cost is 16 times that of the HAX. However, the space cost of the PAX is generally much less than the space cost of the ED. The HAX is a general time series representation method that can be extended in a similar way to some SAX variants. Our future work will focus on the extension of HAX.

Author Contributions

Methodology, Z.H.; Project administration, Z.H.; Software, G.L.; Writing—original draft preparation, Z.H. and C.Z.; Writing—review & editing, X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (41972306, U1711267, 41572314).

Data Availability Statement

The source code and data have been made available at https://gitee.com/ZhenwenHe/series.git, accessed on 29 September 2021.

Acknowledgments

The work described in this paper was supported by the National Natural Science Foundation of China (41972306, U1711267, and 41572314). We thank the anonymous reviewers for their careful work and thoughtful suggestions that have helped improve this paper substantially.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. He, Z.; Kraak, M.-J.; Huisman, O.; Ma, X.; Xiao, J. Parallel indexing technique for spatio-temporal data. ISPRS J. Photogramm. Remote Sens. 2013, 78, 116–128.
  2. He, Z.; Wu, C.; Liu, G.; Zheng, Z.; Tian, Y. Decomposition tree: A spatio-temporal indexing method for movement big data. Clust. Comput. 2015, 18, 1481–1492.
  3. He, Z.; Ma, X. A distributed indexing method for timeline similarity query. Algorithms 2018, 11, 41.
  4. Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 2020.
  5. Chatfield, C.; Xing, H. The Analysis of Time Series: An Introduction with R; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019.
  6. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2018.
  7. Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.-C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305.
  8. Kondylakis, H.; Dayan, N.; Zoumpatianos, K.; Palpanas, T. Coconut: A scalable bottom-up approach for building data series indexes. arXiv 2020, arXiv:2006.13713.
  9. Oregi, I.; Pérez, A.; Del Ser, J.; Lozano, J.A. On-line elastic similarity measures for time series. Pattern Recognit. 2019, 88, 506–517.
  10. Aghabozorgi, S.; Seyed Shirkhorshidi, A.; Ying Wah, T. Time-series clustering–A decade review. Inf. Syst. 2015, 53, 16–38.
  11. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.-A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963.
  12. Chan, K.-P.; Fu, A.W.-C. Efficient time series matching by wavelets. In Proceedings of the 15th International Conference on Data Engineering (Cat. No. 99CB36337), Sydney, Australia, 23–26 March 1999; pp. 126–133.
  13. Faloutsos, C.; Ranganathan, M.; Manolopoulos, Y. Fast subsequence matching in time-series databases. ACM SIGMOD Rec. 1994, 23, 419–429.
  14. Huang, Y.-W.; Yu, P.S. Adaptive query processing for time-series data. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 15–18 August 1999; pp. 282–286.
  15. Geurts, P. Pattern extraction for time series classification. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, Freiburg, Germany, 3–5 September 2001; pp. 115–127.
  16. Chakrabarti, K.; Keogh, E.; Mehrotra, S.; Pazzani, M. Locally adaptive dimensionality reduction for indexing large time series databases. ACM Trans. Database Syst. 2002, 27, 188–228.
  17. Keogh, E.J.; Pazzani, M.J. An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. In Proceedings of the Knowledge Discovery in Databases, New York, NY, USA, 27–31 August 1998; pp. 239–243.
  18. Keogh, E.; Chakrabarti, K.; Pazzani, M.; Mehrotra, S. Dimensionality reduction for fast similarity search in large time series databases. Inf. Syst. 2001, 3, 263–286.
  19. Lin, J.; Keogh, E.; Lonardi, S.; Chiu, B. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, CA, USA, 13 June 2003; pp. 2–11.
  20. Chen, L.; Ng, R. On the marriage of Lp-norms and edit distance. In Proceedings of the Thirtieth International Conference on Very Large Data Bases-Volume 30, Toronto, ON, Canada, 31 August–3 September 2004; pp. 792–803.
  21. Chairunnanda, P.; Gopalkrishnan, V.; Chen, L. Enhancing edit distance on real sequences filters using histogram distance on fixed reference ordering. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China, 20–24 August 2006; pp. 582–585.
  22. Lin, J.; Keogh, E.; Wei, L.; Lonardi, S. Experiencing SAX: A novel symbolic representation of time series. Data Min. Knowl. Discov. 2007, 15, 107–144.
  23. Shieh, J.; Keogh, E. iSAX: Disk-aware mining and indexing of massive time series datasets. Data Min. Knowl. Discov. 2009, 19, 24–57.
  24. Ye, L.; Keogh, E. Time series shapelets: A new primitive for data mining. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 947–956.
  25. Camerra, A.; Palpanas, T.; Shieh, J.; Keogh, E. iSAX 2.0: Indexing and mining one billion time series. In Proceedings of the 2010 IEEE International Conference on Data Mining, Washington, DC, USA, 13–17 December 2010; pp. 58–67.
  26. He, X.; Shao, C.; Xiong, Y. A new similarity measure based on shape information for invariant with multiple distortions. Neurocomputing 2014, 129, 556–569.
  27. Bai, S.; Qi, H.-D.; Xiu, N. Constrained best Euclidean distance embedding on a sphere: A matrix optimization approach. SIAM J. Optim. 2015, 25, 439–467.
  28. Hsu, C.-J.; Huang, K.-S.; Yang, C.-B.; Guo, Y.-P. Flexible dynamic time warping for time series classification. Procedia Comput. Sci. 2015, 51, 2838–2842.
  29. Roggen, D.; Cuspinera, L.P.; Pombo, G.; Ali, F.; Nguyen-Dinh, L.-V. Limited-memory warping LCSS for real-time low-power pattern recognition in wireless nodes. In Proceedings of the European Conference on Wireless Sensor Networks, Porto, Portugal, 9–11 February 2015; pp. 151–167.
  30. Ares, J.; Lara, J.A.; Lizcano, D.; Suárez, S. A soft computing framework for classifying time series based on fuzzy sets of events. Inf. Sci. 2016, 330, 125–144.
  31. Zoumpatianos, K.; Idreos, S.; Palpanas, T. ADS: The adaptive data series index. Very Large Data Bases J. 2016, 25, 843–866.
  32. Yagoubi, D.E.; Akbarinia, R.; Masseglia, F.; Palpanas, T. DPiSAX: Massively distributed partitioned iSAX. In Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA, 18–21 November 2017; pp. 1135–1140.
  33. Yagoubi, D.E.; Akbarinia, R.; Masseglia, F.; Shasha, D. RadiusSketch: Massively distributed indexing of time series. In Proceedings of the 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Tokyo, Japan, 19–21 October 2017; pp. 262–271.
  34. Baldán, F.J.; Benítez, J.M. Distributed FastShapelet Transform: A Big Data time series classification algorithm. Inf. Sci. 2019, 496, 451–463.
  35. Yagoubi, D.; Akbarinia, R.; Masseglia, F.; Palpanas, T. Massively distributed time series indexing and querying. IEEE Trans. Knowl. Data Eng. 2020, 32, 108–120.
  36. Serra, J.; Kantz, H.; Serra, X.; Andrzejak, R.G. Predictability of music descriptor time series and its application to cover song detection. IEEE Trans. Audio Speech Lang. Process. 2011, 20, 514–525.
  37. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015.
  38. Agrawal, R.; Faloutsos, C.; Swami, A. Efficient similarity search in sequence databases. In Proceedings of the Foundations of Data Organization and Algorithms, Berlin/Heidelberg, Germany, 21–23 June 1993; pp. 69–84.
  39. Kawagoe, K.; Ueda, T. A similarity search method of time series data with combination of Fourier and wavelet transforms. In Proceedings of the Ninth International Symposium on Temporal Representation and Reasoning, Manchester, UK, 7–9 July 2002; pp. 86–92.
  40. Korn, F.; Jagadish, H.V.; Faloutsos, C. Efficiently supporting ad hoc queries in large datasets of time sequences. ACM SIGMOD Rec. 1997, 26, 289–300.
  41. Azzouzi, M.; Nabney, I.T. Analysing time series structure with hidden Markov models. In Proceedings of the 1998 IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing VIII (Cat. No. 98TH8378), Cambridge, UK, 2 September 1998; pp. 402–408.
  42. Yi, B.-K.; Faloutsos, C. Fast time sequence indexing for arbitrary Lp norms. In Proceedings of the Very Large Data Bases, Cairo, Egypt, 10–14 September 2000.
  43. Keogh, E.J.; Pazzani, M.J. A simple dimensionality reduction technique for fast similarity search in large time series databases. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan, 18–20 April 2000; pp. 122–133.
  44. Chung, F.L.K.; Fu, T.-C.; Luk, W.P.R.; Ng, V.T.Y. Flexible time series pattern matching based on perceptually important points. In Proceedings of the Workshop on Learning from Temporal and Spatial Data, International Joint Conference on Artificial Intelligence, Washington, DC, USA, 4–10 August 2001.
  45. Cai, Y.; Ng, R. Indexing spatio-temporal trajectories with Chebyshev polynomials. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004; pp. 599–610.
  46. Keogh, E.; Lin, J.; Fu, A. HOT SAX: Efficiently finding the most unusual time series subsequence. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM'05), Washington, DC, USA, 27–30 November 2005; p. 8.
  47. Ratanamahatana, C.; Keogh, E.; Bagnall, A.J.; Lonardi, S. A novel bit level time series representation with implication of similarity search and clustering. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Hanoi, Vietnam, 18–20 May 2005; pp. 771–777.
  48. Lin, J.; Keogh, E. Group SAX: Extending the notion of contrast sets to time series and multimedia data. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, Berlin, Germany, 18–22 September 2006; pp. 284–296.
  49. Lkhagva, B.; Suzuki, Y.; Kawagoe, K. Extended SAX: Extension of symbolic aggregate approximation for financial time series data representation. DEWS2006 4A-i8 2006, 7. Available online: https://www.ieice.org/~de/DEWS/DEWS2006/doc/4A-i8.pdf (accessed on 29 September 2021).
  50. Hung, N.Q.V.; Anh, D.T. Combining SAX and piecewise linear approximation to improve similarity search on financial time series. In Proceedings of the 2007 International Symposium on Information Technology Convergence (ISITC'2007), Washington, DC, USA, 23–24 November 2007; pp. 58–62.
  51. Chen, Q.; Chen, L.; Lian, X.; Liu, Y.; Yu, J.X. Indexable PLA for efficient similarity search. In Proceedings of the 33rd International Conference on Very Large Data Bases, Vienna, Austria, 23–27 September 2007; pp. 435–446.
  52. Malinowski, S.; Guyet, T.; Quiniou, R.; Tavenard, R. 1d-SAX: A novel symbolic representation for time series. In Proceedings of the International Symposium on Intelligent Data Analysis, London, UK, 17–19 October 2013; pp. 273–284.
  53. Stefan, A.; Athitsos, V.; Das, G. The move-split-merge metric for time series. IEEE Trans. Knowl. Data Eng. 2013, 25, 1425–1438.
  54. Senin, P.; Malinchik, S. SAX-VSM: Interpretable time series classification using SAX and vector space model. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7–10 December 2013; pp. 1175–1180.
  55. Kamath, U.; Lin, J.; Jong, K.D. SAX-EFG: An evolutionary feature generation framework for time series classification. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, Vancouver, BC, Canada, 12–16 July 2014; pp. 533–540.
  56. Baydogan, M.G.; Runger, G. Learning a symbolic representation for multivariate time series classification. Data Min. Knowl. Discov. 2014, 29, 400–422.
  57. Zhang, Z.; Tang, P.; Duan, R. Dynamic time warping under pointwise shape context. Inf. Sci. 2015, 315, 88–101.
  58. Baydogan, M.G.; Runger, G. Time series representation and similarity based on local autopatterns. Data Min. Knowl. Discov. 2015, 30, 476–509.
  59. Ye, Y.; Jiang, J.; Ge, B.; Dou, Y.; Yang, K. Similarity measures for time series data classification using grid representation and matrix distance. Knowl. Inf. Syst. 2018, 60, 1105–1134.
  60. Ruta, N.; Sawada, N.; McKeough, K.; Behrisch, M.; Beyer, J. SAX navigator: Time series exploration through hierarchical clustering. In Proceedings of the 2019 IEEE Visualization Conference (VIS), Vancouver, BC, Canada, 20–25 October 2019; pp. 236–240.
  61. Park, H.; Jung, J.-Y. SAX-ARM: Deviant event pattern discovery from multivariate time series using symbolic aggregate approximation and association rule mining. Expert Syst. Appl. 2020, 141, 112950.
  62. He, Z.; Long, S.; Ma, X.; Zhao, H. A boundary distance-based symbolic aggregate approximation method for time series data. Algorithms 2020, 13, 284.
  63. Bountrogiannis, K.; Tzagkarakis, G.; Tsakalides, P. Data-driven kernel-based probabilistic SAX for time series dimensionality reduction. In Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 24–28 August 2021; pp. 2343–2347.
  64. Sant'Anna, A.; Wickstrom, N. Symbolization of time-series: An evaluation of SAX, Persist, and ACA. In Proceedings of the 4th International Congress on Image and Signal Processing (CISP 2011), Shanghai, China, 15–17 October 2011; pp. 2223–2228.
  65. Camerra, A.; Shieh, J.; Palpanas, T.; Rakthanmanon, T.; Keogh, E. Beyond one billion time series: Indexing and mining very large time series collections with iSAX2+. Knowl. Inf. Syst. 2013, 39, 123–151.
  66. Sun, Y.; Li, J.; Liu, J.; Sun, B.; Chow, C. An improvement of symbolic aggregate approximation distance measure for time series. Neurocomputing 2014, 138, 189–198.
  67. Lkhagva, B.; Yu, S.; Kawagoe, K. New time series data representation ESAX for financial applications. In Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDEW'06), Atlanta, GA, USA, 3–7 April 2006; p. x115.
  68. Altman, N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 1992, 46, 175–185.
  69. Abanda, A.; Mori, U.; Lozano, J.A. A review on distance based time series classification. Data Min. Knowl. Discov. 2019, 33, 378–412.
Figure 1. The relationship between PAA and SAX ([5]).
Figure 2. Summary of time series 1 and 2. (a) Summary of time series 1 (b) Summary of time series 2.
Figure 3. The transformations distance between AB and CD. (a) translation transformation; (b) rotation transformation; (c) scale transformation; (d) AB = CD.
Figure 4. Change the order of the transformation distance calculation between AB and CD. (a) rotation transformation; (b) translation transformation; (c) scale transformation; (d) AB = CD.
Figure 5. Transformable interval objects AB and CD.
Figure 6. TIO points on the TIO plane.
Figure 7. A 4 × 4 HAX grid plane.
Figure 8. Accuracy comparison plot for Table 4.
Figure 9. Accuracy comparison between HAX and SAX (71 points in the upper triangle and 43 points in the lower triangle).
Figure 10. Accuracy comparison between PAX and SAX (111 points in the upper triangle and 3 points in the lower triangle).
Figure 11. Accuracy comparison among HAX, ED and SAX.
Figure 12. Accuracy comparison among PAX, ED and SAX.
Figure 13. Accuracy comparison between PAX and ED (48 points in the upper triangle and 63 points in the lower triangle).
Figure 14. Accuracy comparison between PAX and SAX-BD (50 points in the upper triangle and 54 points in the lower triangle).
Table 1. Main representation methods for time series data (—: not indicated by the authors; n is the length of the time series; T1: non-data-adaptive; T2: data-adaptive; T3: data-dictated; T4: model-based).
Representation Method | Year | Type | Complexity | References
Auto-regressive (AR) model | 1971 | T4 | — | [36,37]
Discrete Fourier Transform (DFT) | 1993 | T1 | O(n log n) | [13,38]
Discrete Wavelet Transform (DWT) | 1999 | T1 | O(n) | [12,39]
Singular Value Decomposition (SVD) | 1997 | T2 | — | [40]
Discrete Cosine Transformation (DCT) | 1997 | T1 | — | [40]
Piecewise Linear Approximation (PLA) | 1998 | T2 | O(n log n) | [17]
Hidden Markov Models (HMMs) | 1998 | T4 | — | [41]
Piecewise Aggregate Approximation (PAA), or Segmented Means | 2000 | T1 | O(n) | [42]
Piecewise Constant Approximation (PCA) | 2000 | T2 | — | [43]
Adaptive Piecewise Constant Approximation (APCA) | 2002 | T2 | O(n) | [16]
Perceptually Important Points (PIP) | 2001 | T1 | — | [44]
Chebyshev Polynomials (CHEB) | 2004 | T1 | — | [45]
Symbolic Aggregate Approximation (SAX) | 2003 | T2 | O(n) | [19,22]
HOT SAX | 2005 | T2 | — | [46]
Clipped Data | 2005 | T3 | — | [47]
Group SAX | 2006 | T2 | — | [48]
Extended SAX | 2006 | T2 | — | [49]
Combining SAX and Piecewise Linear Approximation | 2007 | T2 | — | [50]
Indexable Piecewise Linear Approximation (IPLA) | 2007 | T1 | — | [51]
1d-SAX | 2013 | T2 | — | [52]
Move-Split-Merge (MSM) | 2013 | — | — | [53]
SAX-VSM | 2013 | T2 | — | [54]
SAX-EFG | 2014 | T2 | — | [55]
Tree-based Representations | 2015 | — | — | [56]
SC-DTW | 2015 | T1 | — | [57]
Representation based on Local Autopatterns | 2016 | — | — | [58]
Grid Representation | 2019 | — | — | [59]
SAX Navigator | 2019 | T2 | — | [60]
SAX-ARM | 2020 | T2 | — | [61]
SAX-BD | 2020 | T2 | — | [62]
Data-driven Kernel-based Probabilistic SAX | 2021 | T2 | — | [63]
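Several of the O(n) methods in Table 1, including PAA (on which SAX and HAX both build), reduce a length-n series to w segment summaries in a single pass. A minimal PAA sketch, using the common mean-per-segment definition (segment boundaries here follow NumPy's nearly-equal split rather than any specific paper's convention):

```python
import numpy as np

def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w
    (nearly) equal-length segments of the input series. O(n) time."""
    x = np.asarray(series, dtype=float)
    return np.array([seg.mean() for seg in np.array_split(x, w)])
```

For example, `paa([1, 1, 2, 2, 3, 3], 3)` collapses each pair of values into its mean, giving the three segment means 1.0, 2.0, and 3.0. SAX then discretizes these means into symbols, while HAX instead discretizes a 2D summary of each segment.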
Table 2. A summary of the notations.
T | A time series, T = v1, v2, …, vn
S | A piecewise aggregate approximation of a time series, S = s1, s2, …, sw
P | A point set aggregate approximation of a time series, P = p1, p2, …, pw
H | A hexadecimal digit representation of a time series, H = h1, h2, …, hw
w | The number of PAA segments representing time series T
n | The (arbitrary) length of time series T
t(i) | The ith time point
Window(i) | The time window between the (i − 1)th and ith time points
Subseries(i) | The subseries within Window(i)
Segment(i) | The fitting segment for Subseries(i)
TIO(i) or TIO_AB | The transformable interval object for Segment(i); point A is the starting point and B is the endpoint of Segment(i)
Table 3. A list of test data sets.
ID | Type | Name | Train | Test | Class | Length
1 | Device | ACSF1 | 100 | 100 | 10 | 1460
2 | Image | Adiac | 390 | 391 | 37 | 176
3 | Image | ArrowHead | 36 | 175 | 3 | 251
4 | Spectro | Beef | 30 | 30 | 5 | 470
5 | Image | BeetleFly | 20 | 20 | 2 | 512
6 | Image | BirdChicken | 20 | 20 | 2 | 512
7 | Simulated | BME | 30 | 150 | 3 | 128
8 | Sensor | Car | 60 | 60 | 4 | 577
9 | Simulated | CBF | 30 | 900 | 3 | 128
10 | Traffic | Chinatown | 20 | 343 | 2 | 24
11 | Sensor | ChlorineConcentration | 467 | 3840 | 3 | 166
12 | Sensor | CinCECGTorso | 40 | 1380 | 4 | 1639
13 | Spectro | Coffee | 28 | 28 | 2 | 286
14 | Device | Computers | 250 | 250 | 2 | 720
15 | Motion | CricketX | 390 | 390 | 12 | 300
16 | Motion | CricketY | 390 | 390 | 12 | 300
17 | Motion | CricketZ | 390 | 390 | 12 | 300
18 | Image | Crop | 7200 | 16,800 | 24 | 46
19 | Image | DiatomSizeReduction | 16 | 306 | 4 | 345
20 | Image | DistalPhalanxOutlineAgeGroup | 400 | 139 | 3 | 80
21 | Image | DistalPhalanxOutlineCorrect | 600 | 276 | 2 | 80
22 | Image | DistalPhalanxTW | 400 | 139 | 6 | 80
23 | Sensor | Earthquakes | 322 | 139 | 2 | 512
24 | ECG | ECG200 | 100 | 100 | 2 | 96
25 | ECG | ECG5000 | 500 | 4500 | 5 | 140
26 | ECG | ECGFiveDays | 23 | 861 | 2 | 136
27 | Device | ElectricDevices | 8926 | 7711 | 7 | 96
28 | EOG | EOGHorizontalSignal | 362 | 362 | 12 | 1250
29 | EOG | EOGVerticalSignal | 362 | 362 | 12 | 1250
30 | Spectro | EthanolLevel | 504 | 500 | 4 | 1751
31 | Image | FaceAll | 560 | 1690 | 14 | 131
32 | Image | FaceFour | 24 | 88 | 4 | 350
33 | Image | FacesUCR | 200 | 2050 | 14 | 131
34 | Image | FiftyWords | 450 | 455 | 50 | 270
35 | Image | Fish | 175 | 175 | 7 | 463
36 | Sensor | FordA | 3601 | 1320 | 2 | 500
37 | Sensor | FordB | 3636 | 810 | 2 | 500
38 | Sensor | FreezerRegularTrain | 150 | 2850 | 2 | 301
39 | Sensor | FreezerSmallTrain | 28 | 2850 | 2 | 301
40 | HRM | Fungi | 18 | 186 | 18 | 201
41 | Motion | GunPoint | 50 | 150 | 2 | 150
42 | Motion | GunPointAgeSpan | 135 | 316 | 2 | 150
43 | Motion | GunPointMaleVersusFemale | 135 | 316 | 2 | 150
44 | Motion | GunPointOldVersusYoung | 136 | 315 | 2 | 150
45 | Spectro | Ham | 109 | 105 | 2 | 431
46 | Image | HandOutlines | 1000 | 370 | 2 | 2709
47 | Motion | Haptics | 155 | 308 | 5 | 1092
48 | Image | Herring | 64 | 64 | 2 | 512
49 | Device | HouseTwenty | 40 | 119 | 2 | 2000
50 | Motion | InlineSkate | 100 | 550 | 7 | 1882
51 | EPG | InsectEPGRegularTrain | 62 | 249 | 3 | 601
52 | EPG | InsectEPGSmallTrain | 17 | 249 | 3 | 601
53 | Sensor | InsectWingbeatSound | 220 | 1980 | 11 | 256
54 | Sensor | ItalyPowerDemand | 67 | 1029 | 2 | 24
55 | Device | LargeKitchenAppliances | 375 | 375 | 3 | 720
56 | Sensor | Lightning2 | 60 | 61 | 2 | 637
57 | Sensor | Lightning7 | 70 | 73 | 7 | 319
58 | Simulated | Mallat | 55 | 2345 | 8 | 1024
59 | Spectro | Meat | 60 | 60 | 3 | 448
60 | Image | MedicalImages | 381 | 760 | 10 | 99
61 | Traffic | MelbournePedestrian | 1194 | 2439 | 10 | 24
62 | Image | MiddlePhalanxOutlineAgeGroup | 400 | 154 | 3 | 80
63 | Image | MiddlePhalanxOutlineCorrect | 600 | 291 | 2 | 80
64 | Image | MiddlePhalanxTW | 399 | 154 | 6 | 80
65 | Image | MixedShapesRegularTrain | 500 | 2425 | 5 | 1024
66 | Image | MixedShapesSmallTrain | 100 | 2425 | 5 | 1024
67 | Sensor | MoteStrain | 20 | 1252 | 2 | 84
68 | ECG | NonInvasiveFetalECGThorax1 | 1800 | 1965 | 42 | 750
69 | ECG | NonInvasiveFetalECGThorax2 | 1800 | 1965 | 42 | 750
70 | Spectro | OliveOil | 30 | 30 | 4 | 570
71 | Image | OSULeaf | 200 | 242 | 6 | 427
72 | Image | PhalangesOutlinesCorrect | 1800 | 858 | 2 | 80
73 | Sensor | Phoneme | 214 | 1896 | 39 | 1024
74 | Hemodynamics | PigAirwayPressure | 104 | 208 | 52 | 2000
75 | Hemodynamics | PigArtPressure | 104 | 208 | 52 | 2000
76 | Hemodynamics | PigCVP | 104 | 208 | 52 | 2000
77 | Sensor | Plane | 105 | 105 | 7 | 144
78 | Power | PowerCons | 180 | 180 | 2 | 144
79 | Image | ProximalPhalanxOutlineAgeGroup | 400 | 205 | 3 | 80
80 | Image | ProximalPhalanxOutlineCorrect | 600 | 291 | 2 | 80
81 | Image | ProximalPhalanxTW | 400 | 205 | 6 | 80
82 | Device | RefrigerationDevices | 375 | 375 | 3 | 720
83 | Spectrum | Rock | 20 | 50 | 4 | 2844
84 | Device | ScreenType | 375 | 375 | 3 | 720
85 | Spectrum | SemgHandGenderCh2 | 300 | 600 | 2 | 1500
86 | Spectrum | SemgHandMovementCh2 | 450 | 450 | 6 | 1500
87 | Spectrum | SemgHandSubjectCh2 | 450 | 450 | 5 | 1500
88 | Simulated | ShapeletSim | 20 | 180 | 2 | 500
89 | Image | ShapesAll | 600 | 600 | 60 | 512
90 | Device | SmallKitchenAppliances | 375 | 375 | 3 | 720
91 | Simulated | SmoothSubspace | 150 | 150 | 3 | 15
92 | Sensor | SonyAIBORobotSurface1 | 20 | 601 | 2 | 70
93 | Sensor | SonyAIBORobotSurface2 | 27 | 953 | 2 | 65
94 | Sensor | StarLightCurves | 1000 | 8236 | 3 | 1024
95 | Spectro | Strawberry | 613 | 370 | 2 | 235
96 | Image | SwedishLeaf | 500 | 625 | 15 | 128
97 | Image | Symbols | 25 | 995 | 6 | 398
98 | Simulated | SyntheticControl | 300 | 300 | 6 | 60
99 | Motion | ToeSegmentation1 | 40 | 228 | 2 | 277
100 | Motion | ToeSegmentation2 | 36 | 130 | 2 | 343
101 | Sensor | Trace | 100 | 100 | 4 | 275
102 | ECG | TwoLeadECG | 23 | 1139 | 2 | 82
103 | Simulated | TwoPatterns | 1000 | 4000 | 4 | 128
104 | Simulated | UMD | 36 | 144 | 3 | 150
105 | Motion | UWaveGestureLibraryAll | 896 | 3582 | 8 | 945
106 | Motion | UWaveGestureLibraryX | 896 | 3582 | 8 | 315
107 | Motion | UWaveGestureLibraryY | 896 | 3582 | 8 | 315
108 | Motion | UWaveGestureLibraryZ | 896 | 3582 | 8 | 315
109 | Sensor | Wafer | 1000 | 6164 | 2 | 152
110 | Spectro | Wine | 57 | 54 | 2 | 234
111 | Image | WordSynonyms | 267 | 638 | 25 | 270
112 | Motion | Worms | 181 | 77 | 5 | 900
113 | Motion | WormsTwoClass | 181 | 77 | 2 | 900
114 | Image | Yoga | 300 | 3000 | 2 | 426
Table 4. Classification accuracy on UCRArchive2018.
ID | ED | SAX | SAX-TD | SAX-BD | PAX | HAX
1 | 0.54 | 0.13 | 0.63 | 0.60 | 0.38 | 0.23
2 | 0.61 | 0.08 | 0.59 | 0.74 | 0.47 | 0.15
3 | 0.80 | 0.52 | 0.75 | 0.84 | 0.73 | 0.56
4 | 0.83 | 0.70 | 0.81 | 0.80 | 0.84 | 0.88
5 | 0.67 | 0.40 | 0.58 | 0.90 | 0.60 | 0.51
6 | 0.75 | 0.72 | 0.75 | 0.80 | 0.74 | 0.75
7 | 0.55 | 0.57 | 0.59 | 0.94 | 0.63 | 0.58
8 | 0.85 | 0.84 | 0.88 | 0.88 | 0.96 | 0.71
9 | 0.73 | 0.49 | 0.70 | 0.97 | 0.67 | 0.53
10 | 0.95 | 0.76 | 0.93 | 0.96 | 0.81 | 0.70
11 | 0.65 | 0.42 | 0.54 | 0.94 | 0.58 | 0.46
12 | 0.90 | 0.66 | 0.75 | 1.00 | 0.79 | 0.71
13 | 1.00 | 0.51 | 0.95 | 0.62 | 0.90 | 0.62
14 | 0.58 | 0.51 | 0.53 | 0.67 | 0.52 | 0.57
15 | 0.58 | 0.43 | 0.55 | 0.63 | 0.59 | 0.27
16 | 0.57 | 0.43 | 0.52 | 0.68 | 0.61 | 0.26
17 | 0.59 | 0.44 | 0.56 | 0.97 | 0.62 | 0.33
18 | 0.71 | 0.28 | 0.68 | 0.73 | 0.70 | 0.34
19 | 0.93 | 0.24 | 0.95 | 0.75 | 0.91 | 0.67
20 | 0.63 | 0.53 | 0.66 | 0.63 | 0.68 | 0.62
21 | 0.72 | 0.57 | 0.71 | 0.71 | 0.71 | 0.63
22 | 0.63 | 0.42 | 0.58 | 0.91 | 0.60 | 0.54
23 | 0.88 | 0.80 | 0.88 | 0.88 | 0.87 | 0.80
24 | 0.92 | 0.87 | 0.92 | 0.40 | 0.92 | 0.89
25 | 0.80 | 0.68 | 0.82 | 0.40 | 0.80 | 0.68
26 | 0.42 | 0.29 | 0.36 | 0.30 | 0.41 | 0.21
27 | 0.44 | 0.30 | 0.40 | 0.79 | 0.34 | 0.23
28 | 0.71 | 0.66 | 0.68 | 0.88 | 0.65 | 0.66
29 | 0.55 | 0.43 | 0.57 | 0.83 | 0.58 | 0.48
30 | 0.27 | 0.25 | 0.28 | 0.68 | 0.28 | 0.27
31 | 0.71 | 0.35 | 0.72 | 0.83 | 0.69 | 0.35
32 | 0.78 | 0.53 | 0.72 | 0.69 | 0.80 | 0.69
33 | 0.77 | 0.40 | 0.65 | 0.61 | 0.74 | 0.39
34 | 0.63 | 0.54 | 0.63 | 0.86 | 0.66 | 0.49
35 | 0.78 | 0.25 | 0.70 | 0.96 | 0.68 | 0.24
36 | 0.67 | 0.51 | 0.57 | 0.94 | 0.57 | 0.53
37 | 0.61 | 0.51 | 0.52 | 0.99 | 0.52 | 0.51
38 | 0.80 | 0.66 | 0.88 | 1.00 | 0.91 | 0.63
39 | 0.68 | 0.67 | 0.69 | 0.68 | 0.70 | 0.67
40 | 0.82 | 0.54 | 0.80 | 0.88 | 0.88 | 0.46
41 | 0.91 | 0.72 | 0.87 | 0.43 | 0.92 | 0.75
42 | 0.90 | 0.65 | 0.91 | 0.63 | 0.98 | 0.83
43 | 0.97 | 0.65 | 0.99 | 0.80 | 0.99 | 0.87
44 | 0.95 | 0.64 | 1.00 | 0.35 | 1.00 | 1.00
45 | 0.60 | 0.54 | 0.59 | 0.78 | 0.58 | 0.58
46 | 0.86 | 0.62 | 0.85 | 0.68 | 0.82 | 0.75
47 | 0.37 | 0.29 | 0.35 | 0.58 | 0.35 | 0.31
48 | 0.52 | 0.52 | 0.53 | 0.95 | 0.54 | 0.53
49 | 0.66 | 0.67 | 0.69 | 0.58 | 0.64 | 0.61
50 | 0.34 | 0.25 | 0.29 | 0.85 | 0.33 | 0.25
51 | 0.68 | 0.41 | 0.67 | 0.73 | 1.00 | 1.00
52 | 0.66 | 0.20 | 0.59 | 0.93 | 1.00 | 1.00
53 | 0.56 | 0.43 | 0.53 | 0.68 | 0.54 | 0.45
54 | 0.96 | 0.82 | 0.95 | 0.91 | 0.95 | 0.89
55 | 0.49 | 0.42 | 0.49 | 0.53 | 0.53 | 0.38
56 | 0.75 | 0.69 | 0.74 | 0.74 | 0.78 | 0.60
57 | 0.58 | 0.50 | 0.56 | 0.52 | 0.66 | 0.37
58 | 0.91 | 0.39 | 0.83 | 0.88 | 0.90 | 0.54
59 | 0.93 | 0.33 | 0.91 | 0.82 | 0.91 | 0.46
60 | 0.68 | 0.51 | 0.67 | 0.88 | 0.69 | 0.48
61 | 0.85 | 0.43 | 0.92 | 0.90 | 0.82 | 0.41
62 | 0.52 | 0.36 | 0.49 | 0.56 | 0.50 | 0.42
63 | 0.77 | 0.53 | 0.72 | 0.77 | 0.73 | 0.61
64 | 0.51 | 0.29 | 0.51 | 0.12 | 0.53 | 0.41
65 | 0.90 | 0.79 | 0.86 | 0.18 | 0.87 | 0.76
66 | 0.84 | 0.74 | 0.80 | 0.35 | 0.81 | 0.71
67 | 0.88 | 0.75 | 0.82 | 0.14 | 0.84 | 0.75
68 | 0.83 | 0.13 | 0.72 | 1.00 | 0.73 | 0.17
69 | 0.88 | 0.15 | 0.80 | 0.97 | 0.77 | 0.20
70 | 0.52 | 0.45 | 0.50 | 0.82 | 0.50 | 0.40
71 | 0.87 | 0.30 | 0.85 | 0.87 | 0.81 | 0.31
72 | 0.76 | 0.56 | 0.72 | 0.76 | 0.72 | 0.62
73 | 0.11 | 0.06 | 0.07 | 0.48 | 0.09 | 0.06
74 | 0.06 | 0.05 | 0.08 | 0.86 | 0.12 | 0.06
75 | 0.13 | 0.02 | 0.11 | 0.45 | 0.22 | 0.11
76 | 0.08 | 0.04 | 0.05 | 0.95 | 0.14 | 0.06
77 | 0.96 | 0.73 | 0.96 | 0.79 | 0.96 | 0.87
78 | 0.93 | 0.81 | 0.91 | 0.88 | 0.97 | 0.87
79 | 0.79 | 0.48 | 0.78 | 0.64 | 0.77 | 0.64
80 | 0.81 | 0.57 | 0.76 | 0.77 | 0.74 | 0.64
81 | 0.71 | 0.36 | 0.70 | 0.64 | 0.70 | 0.58
82 | 0.39 | 0.36 | 0.38 | 0.94 | 0.39 | 0.35
83 | 0.84 | 0.46 | 0.72 | 0.76 | 0.54 | 0.68
84 | 0.36 | 0.38 | 0.37 | 0.86 | 0.39 | 0.37
85 | 0.76 | 0.55 | 0.63 | 0.96 | 0.80 | 0.56
86 | 0.37 | 0.25 | 0.33 | 0.88 | 0.60 | 0.22
87 | 0.40 | 0.33 | 0.37 | 0.91 | 0.70 | 0.31
88 | 0.54 | 0.50 | 0.50 | 0.95 | 0.49 | 0.50
89 | 0.75 | 0.53 | 0.71 | 0.75 | 0.72 | 0.53
90 | 0.34 | 0.44 | 0.58 | 0.88 | 0.58 | 0.53
91 | 0.91 | 0.52 | 0.84 | 1.00 | 0.97 | 0.85
92 | 0.70 | 0.64 | 0.66 | 0.94 | 0.74 | 0.64
93 | 0.86 | 0.78 | 0.84 | 0.95 | 0.84 | 0.79
94 | 0.85 | 0.80 | 0.87 | 0.99 | 0.88 | 0.84
95 | 0.95 | 0.57 | 0.93 | 1.00 | 0.92 | 0.76
96 | 0.79 | 0.38 | 0.74 | 0.57 | 0.76 | 0.37
97 | 0.90 | 0.76 | 0.88 | 0.62 | 0.89 | 0.81
98 | 0.88 | 0.87 | 0.89 | 0.56 | 0.98 | 0.66
99 | 0.68 | 0.63 | 0.64 | 0.69 | 0.68 | 0.60
100 | 0.81 | 0.81 | 0.83 | 0.83 | 0.85 | 0.74
101 | 0.76 | 0.49 | 0.66 | 0.86 | 0.76 | 0.59
102 | 0.75 | 0.59 | 0.77 | 0.73 | 0.70 | 0.65
103 | 0.91 | 0.78 | 0.88 | 0.83 | 0.91 | 0.51
104 | 0.76 | 0.64 | 0.77 | 0.79 | 0.78 | 0.68
105 | 0.95 | 0.81 | 0.92 | 0.88 | 0.92 | 0.72
106 | 0.74 | 0.66 | 0.72 | 0.71 | 0.73 | 0.61
107 | 0.66 | 0.58 | 0.65 | 0.65 | 0.67 | 0.51
108 | 0.65 | 0.59 | 0.64 | 0.65 | 0.65 | 0.55
109 | 1.00 | 0.98 | 0.99 | 0.99 | 0.99 | 0.98
110 | 0.61 | 0.50 | 0.55 | 0.55 | 0.61 | 0.51
111 | 0.62 | 0.51 | 0.59 | 0.61 | 0.63 | 0.47
112 | 0.45 | 0.47 | 0.50 | 0.50 | 0.52 | 0.40
113 | 0.61 | 0.59 | 0.60 | 0.61 | 0.62 | 0.54
114 | 0.83 | 0.67 | 0.80 | 0.78 | 0.81 | 0.69
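The accuracies in Table 4 are obtained with nearest-neighbor classification ([68]): each test series is assigned the label of its closest training series under the distance measure being compared. A minimal sketch of this evaluation loop, shown here for the Euclidean distance (the same loop applies to any of the compared measures by swapping the distance function):

```python
import numpy as np

def one_nn_accuracy(train_X, train_y, test_X, test_y):
    """1-NN classification accuracy under the Euclidean distance (ED).

    train_X/test_X: 2D arrays, one equal-length series per row.
    train_y/test_y: 1D arrays of class labels.
    """
    correct = 0
    for x, y in zip(test_X, test_y):
        dists = np.linalg.norm(train_X - x, axis=1)   # ED to every training series
        if train_y[int(np.argmin(dists))] == y:       # label of the nearest neighbor
            correct += 1
    return correct / len(test_y)
```

Using a representation such as HAX or PAX only changes how each series is summarized before the distance is computed; the 1-NN protocol itself is unchanged, which is what makes the per-column accuracies directly comparable.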
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

He, Z.; Zhang, C.; Ma, X.; Liu, G. Hexadecimal Aggregate Approximation Representation and Classification of Time Series Data. Algorithms 2021, 14, 353. https://doi.org/10.3390/a14120353