[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Accelerating k-Shape Time Series Clustering Algorithm Using GPU

Published: 01 October 2023 Publication History

Abstract

In the data space, time-series analysis has emerged in many fields, including biology, healthcare, and numerous large-scale scientific facilities like astronomy, climate science, particle physics, and genomics. Clustering is one of the most critical methods in time-series analysis. So far, the state-of-art time series clustering algorithm k-Shape has been widely used not only because of its high accuracy, but also because of its relatively low computation cost. However, due to the high heterogeneity of time series data, it can not be simply regarded as a high-dimensional vector. Two time series often need some alignment method in similarity comparison. The alignment between sequences is often a time-consuming process. For example, when using dynamic time warping as a sequence alignment algorithm and if the length of time series is greater than 1,000, a single iteration in the clustering process may take hundreds to tens of thousands of seconds, while the entire clustering cycle often requires dozens of iterations. In this article, we propose a set of novel parallel strategies suitable for GPU's computation model, called Times-C, which is an abbreviation for Time Series Clustering. We define three stages in the analysis process: aggregation, centroid, and class assignment. Times-C includes efficient parallel algorithms and corresponding implementations for these three stages. Overall, the experimental results show that the Times-C algorithm exhibits a performance improvement of one to two orders of magnitude compared to the multi-core CPU version of k-Shape. Furthermore, compared to the GPU version of the k-Shape algorithm, the Times-C algorithm achieves a maximum acceleration of up to 345 times.

References

[1]
R. H. Shumway, D. S. Stoffer, and D. S. Stoffer, Time Series Analysis and its Applications, New York, NY, USA: Springer, 2000.
[2]
Z. Bar-Joseph et al., “A new approach to analyzing gene expression time series data,” in Proc. 6th Annu. Int. Conf. Comput. Biol., 2002, pp. 39–48.
[3]
E. J. Ruiz et al., “Correlating financial time series with micro-blogging activity,” in Proc. 5th ACM Int. Conf. Web Search Data Mining, 2012, pp. 513–522.
[4]
K. Uehara and M. Shimada, “Extraction of primitive motion and discovery of association rules from human motion data,” in Progress in Discovery Science, Berlin, Germany: Springer, 2002, pp. 338–348.
[5]
T. W. Liao, “Clustering of time series data–A survey,” Pattern Recognit., vol. 38, no. 11, pp. 1857–1874, 2005.
[6]
J. Yang et al., “k-shape clustering algorithm for building energy usage patterns analysis and forecasting model accuracy improvement,” Energy Buildings, vol. 146, pp. 27–37, 2017.
[7]
J. Thalheim et al., “Sieve: Actionable insights from monitored metrics in distributed systems,” in Proc. 18th ACM/IFIP/USENIX Middleware Conf., 2017, pp. 14–27.
[8]
T. Jarábek, P. Laurinec, and M. Lucká, “Energy load forecast using S2S deep neural networks with k-shape clustering,” in Proc. IEEE 14th Int. Sci. Conf. Informat., 2017, pp. 140–145.
[9]
G. Bello-Orgaz et al., “Marketing analysis of wineries using social collective behavior from users’ temporal activity on Twitter’ temporal activity on Twitter,” Inform. Process. Manag., vol. 57, no. 5, 2020, Art. no.
[10]
Z. Karevan and J. A. K. Suykens, “Transductive LSTM for time-series prediction: An application to weather forecasting,” Neural Netw., vol. 125, pp. 1–9, 2020.
[11]
J. Paparrizos and L. Gravano, “k-shape: Efficient and accurate clustering of time series,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2015, pp. 1855–1870.
[12]
C. W. Tan, G. I. Webb, and F. Petitjean, “Indexing and classifying gigabytes of time series under time warping,” in Proc. SIAM Int. Conf. Data Mining, 2017, pp. 282–290.
[13]
P. Huijse, P. A. Estevez, P. Protopapas, J. C. Principe, and P. Zegers, “Computational intelligence challenges and applications on large-scale astronomical time series databases,” IEEE Comput. Intell. Mag., vol. 9, no. 3, pp. 27–39, Aug. 2014.
[14]
J. MacQueen, “Classification and analysis of multivariate observations,” in Proc. 5th Berkeley Symp. Math. Statist. Probability, 1967, pp. 281–297.
[15]
S. Srikanthan, A. Kumar, and R. Gupta, “Implementing the dynamic time warping algorithm in multithreaded environments for real time and unsupervised pattern discovery,” in Proc. Int. Conf. Comput. Commun. Technol., 2011, pp. 394–398.
[16]
N. Takahashi, T. Yoshihisa, and Y. Sakurai, “A parallelized data stream processing system using dynamic time warping distance,” in Proc. Adv. Intell. Syst. Comput., 2009, pp. 1100–1105.
[17]
H. Zhu et al., “Developing a pattern discovery method in time series data and its GPU acceleration,” Big Data Mining Anal., vol. 1, no. 4, pp. 266–283, 2018.
[18]
Y. Zhang, K. Adl, and J. Glass, “Fast spoken query detection using lower-bound dynamic time warping on graphical processing units,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2012, pp. 5173–5176.
[19]
M. S. Nobile et al., “Graphics processing units in bioinformatics, computational biology and systems biology,” Brief. Bioinf., vol. 18, no. 5, pp. 870–885, 2017.
[20]
S. Peng and S. X. D. Tan, “GLU3.0: Fast GPU-based parallel sparse LU factorization for circuit simulation,” IEEE Des. Test, vol. 37, no. 3, pp. 78–90, Jun. 2020.
[21]
G. Pratx and L. Xing, “GPU computing in medical physics: A review,” Med. Phys., vol. 38, no. 5, pp. 2685–2697, 2011.
[22]
C. Ma, L. Wang, and X. Q. Xie, “GPU accelerated chemical similarity calculation for compound library comparison,” J. Chem. Inf. Model., vol. 51, no. 7, pp. 1521–1527, 2011.
[23]
Y. Katznelson, An Introduction to Harmonic Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[24]
S. Chetlur et al., “cuDNN: Efficient primitives for deep learning,” 2014,.
[25]
G. Lu, W. Zhang, and Z. Wang, “Optimizing depthwise separable convolution operations on GPUs,” IEEE Trans. Parallel Distrib. Syst., vol. 31, no. 1, pp. 70–87, Jan. 2022.
[26]
S. Popov, J. Günther, and H. P. Seidel, “Stackless KD-tree traversal for high performance GPU ray tracing,” Comput. Graph. Forum, vol. 26, no. 3, pp. 415–424, 2007.
[27]
D. J. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series,” in Proc. 3rd Int. Conf. Knowl. Discov. Data Mining, 1994, pp. 359–370.
[28]
R. N. Bracewell and R. N. Bracewell, The Fourier Transform and its Applications, New York, NY, USA: McGraw-Hill, 1986.
[29]
J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. Comput., vol. 19, no. 90, pp. 297–301, 1965.
[30]
G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD, USA: JHU Press, 2013.
[31]
G. E. Blelloch, “Scans as primitive parallel operations,” IEEE Trans. Comput., vol. 38, no. 11, pp. 1526–1538, Nov. 1989.
[32]
NVIDIA, thrust, 2023. [Online]. Available: https://docs.nvidia.com/cuda/thrust/index.html
[33]
R. L. Burden, J. D. Faires, and A. M. Burden, Numerical Analysis. Boston, MA, USA: Cengage Learning, 2015.
[34]
NVIDIA, cuSOLVER, 2023. [Online]. Available: https://docs.nvidia.com/cuda/cuSOLVER/index.html
[35]
NVIDIA, cuBLAS, 2023. [Online]. Available: https://docs.nvidia.com/cuda/cublas/index.html
[36]
NVIDIA, cuFFT, 2023. [Online]. Available: https://docs.nvidia.com/cuda/cufft/index.html
[37]
A. P. Ruiz et al., “The great multivariate time series classification bake off: A review and experimental evaluation of recent algorithmic advances,” Data Min. Knowl. Discov., vol. 35, no. 2, pp. 401–449, 2021.
[38]
H. A. Dau et al., “The UCR time series archive,” IEEE/CAA J. Automatica Sinica, vol. 6, no. 6, pp. 1293–1305, Nov. 2019.
[39]
J. Salamon, C. Jacoby, and J. P. Bello, “A dataset and taxonomy for urban sound research,” in Proc. 22nd ACM Int. Conf. Multimedia, 2014, pp. 1041–1044.
[40]
R. Tavenard et al., “Tslearn, A machine learning toolkit for time series data,” J. Mach. Learn. Res., vol. 21, no. 118, pp. 1–6, 2020.
[41]
W. M. Rand, “Objective criteria for the evaluation of clustering methods,” J. Amer. Statist. Assoc., vol. 66, no. 336, pp. 846–850, 1971.
[42]
B. McFee et al., “librosa: Audio and music signal analysis in Python,” in Proc. 14th. Python Sci. Conf., 2015, pp. 18–25.
[43]
L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, NJ, USA: Wiley, 2009.
[44]
F. Petitjean, A. Ketterlin, and P. Gançarski, “A global averaging method for dynamic time warping, with applications to clustering,” Pattern Recognit., vol. 44, no. 3, pp. 678–693, 2011.
[45]
J. Paparrizos and L. Gravano, “k-Shape source code,” 2023. [Online]. Available: https://github.com/TheDatumOrg/kshape-python
[46]
S. Aghabozorgi, A. S. Shirkhorshidi, and T. Y. Wah, “Time-series clustering–A decade review,” Inf. Syst., vol. 53, pp. 16–38, 2015.
[47]
A. S. Shirkhorshidi, S. Aghabozorgi, and T. Y. Wah, “A comparison study on similarity and dissimilarity measures in clustering continuous data,” PLoS One, vol. 10, no. 12, 2015, Art. no.
[48]
P. Esling and C. Agon, “Time-series data mining,” ACM Comput. Surv., vol. 45, no. 1, pp. 1–34, 2012.

Cited By

View all
  • (2024)Exploring Efficient Partial Differential Equation Solution Using Speed Galerkin TransformerProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00084(1-14)Online publication date: 17-Nov-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems  Volume 34, Issue 10
Oct. 2023
228 pages

Publisher

IEEE Press

Publication History

Published: 01 October 2023

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 05 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Exploring Efficient Partial Differential Equation Solution Using Speed Galerkin TransformerProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00084(1-14)Online publication date: 17-Nov-2024

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media