Overcoming Weak Scaling Challenges in Tree-Based Nearest Neighbor Time Series Mining

Amir Raoofy^11,14,
Roman Karlstetter^12,14,
Martin Schreiber^13,14,
Carsten Trinitis¹⁴ &
…
Martin Schulz¹⁴

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13948))

Included in the following conference series:

International Conference on High Performance Computing

1490 Accesses

Abstract

The mining of time series data plays an important role in modern information retrieval and monitoring infrastructures. In particular, the identification of similarities within and across large time series is of great importance in analytics and knowledge discovery. For this task, the matrix profile similarity indexing approach, which encodes the correlations among snapshots of a time series, is well-established. However, it is computationally expensive, especially for long time series, as existing exact approaches mostly rely on exhaustive, exact query (search) operations and are inefficient. Similarly, existing approximate approaches are limited with respect to parallelism, scalability, or their extent of practicality. We, therefore, focus on an approximate parallel tree-based nearest-neighbors approach and address the weak scaling challenges raised when applied to large time series in HPC settings.

We build on the existing concept of parallel iterative tree-based nearest neighbor solvers and introduce a novel approach for the approximate calculation of the matrix profile. To improve the performance and overcome weak scalability challenges, we exploit a mix of creating a forest of parallel trees on exclusive ensembles of resources combined with pipelining of iterations. We provide an implementation targeting large-scale CPU-based HPC systems and illustrate the performance of this new approach with experimental data. Finally, we demonstrate the mining of time series at billion-records-scale datasets on the SuperMUC-NG system.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 47.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 59.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

BestNeighbor: efficient evaluation of kNN queries on large time series databases

Article 16 November 2020

Time Series Mining at Petascale Performance

PARROT: pattern-based correlation exploitation in big partitioned data series

Article 13 October 2022

Notes

1.
The implementation can be provided upon request.
2.
Leibniz Supercomputing Centre.
3.
Other metrics are also used in the literature, and some might be subjective to particular practical use cases. An investigation of such alternative metrics is beyond the scope of the present work.

References

Arya, S., et al.: An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45(6), 891–923 (1998)
Article MathSciNet MATH Google Scholar
Barnes, J., Hut, P.: A hierarchical O(N log N) force-calculation algorithm. Nature 324(6096), 446–449 (1986)
Article Google Scholar
Cools, S., et al.: Improving strong scaling of the conjugate gradient method for solving large linear systems using global reduction pipelining. ArXiv abs/1905.06850 (2019)
Google Scholar
Curtin, R.R.: Faster dual-tree traversal for nearest neighbor search. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds.) SISAP 2015. LNCS, vol. 9371, pp. 77–89. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25087-8_7
Chapter Google Scholar
Dau, H.A., Keogh, E.: Matrix profile V: a generic technique to incorporate domain knowledge into motif discovery. In: 23rd ACM SIGKDD, pp. 125–134 (2017)
Google Scholar
Eamonn Keogh: Electrocardiography Dataset. https://www.cs.ucr.edu/~eamonn/ECG_one_day.zip. Accessed 15 Aug 2022
Gharghabi, S., et al.: Domain agnostic online semantic segmentation for multi-dimensional time series. In: Data Mining and Knowledge Discovery (2018)
Google Scholar
Heldens, S., et al.: Rocket: efficient and scalable all-pairs computations on heterogeneous platforms. In: Proceedings of SC 2020. IEEE Press (2020)
Google Scholar
Jirkovský, V., et al.: Big data analysis for sensor time-series in automation. In: IEEE Emerging Technology and Factory Automation (ETFA), pp. 1–8 (2014)
Google Scholar
Jones, P.W., et al.: Randomized approximate nearest neighbors algorithm. Proc. Natl. Acad. Sci. 108(38), 15679–15686 (2011)
Article Google Scholar
Ju, Y., et al.: Exploiting reduced precision for GPU-based Time series mining. In: IEEE IPDPS, pp. 124–134 (2022)
Google Scholar
Karlstetter, R., et al.: Turning dynamic sensor measurements from gas turbines into insights: a big data approach. In: Turbo Expo, vol. 6 (2019)
Google Scholar
Karlstetter, R., et al.: Living on the edge: efficient handling of large scale sensor data. In: 2021 IEEE/ACM CCGrid 2021, pp. 1–10 (2021)
Google Scholar
Linardi, M., et al.: Matrix profile X: VALMOD - scalable discovery of variable-length motifs in data series. In: ACM SIGMOD, p. 1053–1066 (2018)
Google Scholar
Lu, Y., et al.: Matrix profile XXIV: scaling time series anomaly detection to trillions of datapoints and ultra-fast arriving data streams. In: ACM SIGKDD (2022)
Google Scholar
Mercer, R., et al.: Matrix profile XXIII: contrast profile: a novel time series primitive that allows real world classification. In: IEEE ICDM (2021)
Google Scholar
Muja, M., Lowe, D.G.: Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2227–2240 (2014)
Article Google Scholar
Netti, A.: HPC-ODA dataset collection (2020). https://doi.org/10.5281/zenodo.3701440
Patwary, M.M.A., et al.: PANDA: extreme scale parallel k-nearest neighbor on distributed architectures. CoRR abs/1607.08220 (2016)
Google Scholar
Pfeilschifter, G.: time series analysis with matrix profile on HPC systems. Master thesis, Technische Universität München (2019)
Google Scholar
Raksha, S., et al.: Weather forecasting framework for time series data using intelligent learning models. In: 5th ICEECCOT 2021, pp. 783–787 (2021)
Google Scholar
Rakthanmanon, T., et al.: Searching and mining trillions of time series subsequences under dynamic time warping. In: ACM SIGKDD, pp. 262–270 (2012)
Google Scholar
Ram, P., Sinha, K.: Revisiting KD-tree for nearest neighbor search. In: KDD 2019, pp. 1378–1388. Association for Computing Machinery, New York (2019)
Google Scholar
Raoofy, A., Karlstetter, R., Yang, D., Trinitis, C., Schulz, M.: Time series mining at petascale performance. In: Sadayappan, P., Chamberlain, B.L., Juckeland, G., Ltaief, H. (eds.) ISC High Performance 2020. LNCS, vol. 12151, pp. 104–123. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50743-5_6
Chapter Google Scholar
Rokhlin, V.: Rapid solution of integral equations of classical potential theory. J. Comput. Phys. 60(2), 187–207 (1985)
Article MathSciNet MATH Google Scholar
Schall-Zimmerman, Z., et al.: Matrix profile XVIII: time series mining in the face of fast moving streams using a learned approximate matrix profile. In: IEEE ICDM, pp. 936–945 (2019)
Google Scholar
Schmidl, S., et al.: Anomaly detection in time series: a comprehensive evaluation. Proc. VLDB Endow. 15(9), 1779–1797 (2022)
Article Google Scholar
Shakibay Senobari, et al.: Using the similarity matrix profile to investigate foreshock behavior of the 2004 parkfield earthquake. In: AGU Fall Meeting Abstracts, vol. 2018, pp. S51B–03 (2018)
Google Scholar
Steinbusch, B., et al.: A massively parallel barnes-hut tree code with dual tree traversal. In: PARCO (2015)
Google Scholar
Thill, M., et al.: MarkusThill/MGAB: The Mackey-glass anomaly benchmark (2020). https://doi.org/10.5281/zenodo.3760086
Van Der Maaten, L.: Accelerating T-SNE using tree-based algorithms. J. Mach. Learn. Res. 15(1), 3221–3245 (2014)
MathSciNet MATH Google Scholar
Xiao, B., Biros, G.: Parallel algorithms for nearest neighbor search problems in high dimensions. SIAM J. Sci. Comput. 38(5), S667–S699 (2016)
Article MathSciNet MATH Google Scholar
Yeh, C.M., et al.: Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: IEEE ICDM, pp. 1317–1322 (2016)
Google Scholar
Yeh, C.M., et al.: Matrix profile III: the matrix profile allows visualization of salient subsequences in massive time series. In: IEEE ICDM, pp. 579–588 (2016)
Google Scholar
Yu, C.D., et al.: Performance optimization for the K-nearest neighbors kernel on X86 architectures. In: ACM SC (2015)
Google Scholar
Zheng, X., et al.: PSML: a multi-scale time-series dataset for machine learning in decarbonized energy grids (dataset) (2021). https://doi.org/10.5281/zenodo.5130612
Zhu, Y., et al.: Matrix profile XI: SCRIMP++: time series motif discovery at interactive speeds. In: IEEE ICDM, pp. 837–846 (2018)
Google Scholar
Zhu, Y., et al.: Matrix profile VII: time series chains: a new primitive for time series data mining. In: 2017 IEEE ICDM 2017, pp. 695–704 (2017)
Google Scholar
Zhu, Y., et al.: Matrix profile II: exploiting a novel algorithm and GPUs to break the one hundred million barrier for time series motifs and joins. Knowl. Inf. Syst. 54(1) (2018)
Google Scholar
Zhu, Y., et al.: The swiss army knife of time series data mining: ten useful things you can do with the matrix profile and ten lines of code. In: KDD 2020, vol. 34, pp. 949–979 (2020)
Google Scholar
Zimmerman, Z., et al.: Matrix profile XIV: scaling time series motif discovery with GPUs to break a quintillion pairwise comparisons a day and beyond. In: ACM SoCC, pp. 74–86 (2019)
Google Scholar

Download references

Acknowledgements

This work is partially funded by Bayerische Forschungsstiftung under the research grants Optimierung von Gasturbinen mit Hilfe von Big Data (AZ-1214-16), and Von der Edge zur Cloud und zurück: Skalierbare und Adaptive Sensordatenverarbeitung (AZ-1468-20). The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this work by providing computing time on GCS Supercomputer SuperMUC-NG at at Leibniz Supercomputing Centre (www.lrz.de).

Author information

Authors and Affiliations

Leibniz Supercomputing Centre, Munich, Germany
Amir Raoofy
IfTA GmbH, Puchheim, Germany
Roman Karlstetter
Université Grenoble Alpes (UGA), Grenoble, France
Martin Schreiber
Technical University of Munich, Munich, Germany
Amir Raoofy, Roman Karlstetter, Martin Schreiber, Carsten Trinitis & Martin Schulz

Authors

Amir Raoofy
View author publications
You can also search for this author in PubMed Google Scholar
Roman Karlstetter
View author publications
You can also search for this author in PubMed Google Scholar
Martin Schreiber
View author publications
You can also search for this author in PubMed Google Scholar
Carsten Trinitis
View author publications
You can also search for this author in PubMed Google Scholar
Martin Schulz
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Amir Raoofy .

Editor information

Editors and Affiliations

University of Maryland, College Park, MD, USA
Abhinav Bhatele
NVIDIA, Helsinki, Finland
Jeff Hammond
Université Paris-Saclay, Gif-sur-Yvette, France
Marc Baboulin
CERFACS, Toulouse, France
Carola Kruse

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Raoofy, A., Karlstetter, R., Schreiber, M., Trinitis, C., Schulz, M. (2023). Overcoming Weak Scaling Challenges in Tree-Based Nearest Neighbor Time Series Mining. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_17

Download citation

DOI: https://doi.org/10.1007/978-3-031-32041-5_17
Published: 10 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32040-8
Online ISBN: 978-3-031-32041-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Overcoming Weak Scaling Challenges in Tree-Based Nearest Neighbor Time Series Mining

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

BestNeighbor: efficient evaluation of kNN queries on large time series databases

Time Series Mining at Petascale Performance

PARROT: pattern-based correlation exploitation in big partitioned data series

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Overcoming Weak Scaling Challenges in Tree-Based Nearest Neighbor Time Series Mining

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

BestNeighbor: efficient evaluation of kNN queries on large time series databases

Time Series Mining at Petascale Performance

PARROT: pattern-based correlation exploitation in big partitioned data series

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation