
A Survey on Time-Series Pre-Trained Models

Published: 01 December 2024

Abstract

Time-Series Mining (TSM) is an important research area with great potential in practical applications. Deep learning models that rely on massive labeled data have been applied to TSM with success. However, constructing a large-scale, well-labeled dataset is difficult because of the cost of data annotation. Recently, pre-trained models have attracted growing attention in the time series domain, motivated by their remarkable performance in computer vision and natural language processing. In this survey, we provide a comprehensive review of Time-Series Pre-Trained Models (TS-PTMs), aiming to guide the understanding, application, and study of TS-PTMs. Specifically, we first briefly introduce the typical deep learning models employed in TSM. Then, we give an overview of TS-PTMs organized by pre-training technique, with the main categories being supervised, unsupervised, and self-supervised TS-PTMs. Further, extensive experiments involving 27 methods, 434 datasets, and 679 transfer learning scenarios are conducted to analyze the advantages and disadvantages of transfer learning strategies, Transformer-based models, and representative TS-PTMs. Finally, we point out several promising directions for future work on TS-PTMs.
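To make the "pre-train, then fine-tune" paradigm described above concrete, the sketch below illustrates one of the self-supervised strategies this survey covers: contrastive pre-training of a time-series encoder on unlabeled series, followed by fine-tuning on a small labeled set. This is a minimal illustration, not the authors' implementation; the PyTorch framework, the 1-D CNN encoder, the jitter/scaling augmentations, the NT-Xent-style loss, and the synthetic data are all assumptions chosen for brevity.

```python
# Minimal sketch (assumptions: PyTorch, small 1-D CNN encoder, jitter/scaling
# augmentations, NT-Xent-style contrastive loss, synthetic data) of the
# self-supervised "pre-train, then fine-tune" workflow surveyed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TSEncoder(nn.Module):
    """1-D CNN encoder mapping (batch, channels, length) to an embedding."""

    def __init__(self, in_channels=1, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(64, emb_dim)

    def forward(self, x):
        h = self.net(x).squeeze(-1)      # (batch, 64)
        return self.proj(h)              # (batch, emb_dim)


def augment(x):
    """Two cheap views of each series: additive jitter and random scaling."""
    jittered = x + 0.05 * torch.randn_like(x)
    scaled = x * (1.0 + 0.1 * torch.randn(x.size(0), 1, 1))
    return jittered, scaled


def nt_xent(z1, z2, temperature=0.5):
    """Contrastive loss: the two views of a series are positives, all others negatives."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)           # (2B, d)
    sim = z @ z.t() / temperature                                # (2B, 2B) similarities
    sim = sim.masked_fill(torch.eye(sim.size(0), dtype=torch.bool), float("-inf"))
    b = z1.size(0)
    targets = torch.cat([torch.arange(b) + b, torch.arange(b)])  # index of each positive
    return F.cross_entropy(sim, targets)


# Stage 1: self-supervised pre-training on unlabeled series.
encoder = TSEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
unlabeled = torch.randn(256, 1, 128)          # 256 synthetic series of length 128
for _ in range(5):                            # a few epochs, for illustration only
    v1, v2 = augment(unlabeled)
    loss = nt_xent(encoder(v1), encoder(v2))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: fine-tuning with a small labeled set (the label-scarce setting that motivates TS-PTMs).
head = nn.Linear(64, 2)
labeled, labels = torch.randn(32, 1, 128), torch.randint(0, 2, (32,))
ft_opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
for _ in range(5):
    ft_loss = F.cross_entropy(head(encoder(labeled)), labels)
    ft_opt.zero_grad()
    ft_loss.backward()
    ft_opt.step()
```

The skeleton only fixes the workflow; the methods compared in the survey differ in the encoder architecture, the pretext task (supervised, unsupervised, or self-supervised), and the downstream TSM task, not in this overall two-stage structure.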

Cited By

  • (2024) TimelyGPT: Extrapolatable Transformer Pre-training for Long-term Time-Series Forecasting in Healthcare. Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pp. 1–10. DOI: 10.1145/3698587.3701364. Online publication date: 22-Nov-2024.

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 36, Issue 12
Dec. 2024
2224 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 December 2024

Qualifiers

  • Research-article
