Computer Science > Machine Learning

arXiv:2409.16040 (cs)

[Submitted on 24 Sep 2024 (v1), last revised 2 Oct 2024 (this version, v2)]

Title:Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Authors:Xiaoming Shi, Shiyu Wang, Yuqi Nie, Dianqi Li, Zhou Ye, Qingsong Wen, Ming Jin

Abstract:Deep learning for time series forecasting has seen significant advancements over the past decades. However, despite the success of large-scale pre-training in language and vision domains, pre-trained time series models remain limited in scale and operate at a high cost, hindering the development of larger capable forecasting models in real-world applications. In response, we introduce Time-MoE, a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models while reducing inference costs. By leveraging a sparse mixture-of-experts (MoE) design, Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction, reducing computational load while maintaining high model capacity. This allows Time-MoE to scale effectively without a corresponding increase in inference costs. Time-MoE comprises a family of decoder-only transformer models that operate in an auto-regressive manner and support flexible forecasting horizons with varying input context lengths. We pre-trained these models on our newly introduced large-scale data Time-300B, which spans over 9 domains and encompassing over 300 billion time points. For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision. Our results validate the applicability of scaling laws for training tokens and model size in the context of time series forecasting. Compared to dense models with the same number of activated parameters or equivalent computation budgets, our models consistently outperform them by large margin. These advancements position Time-MoE as a state-of-the-art solution for tackling real-world time series forecasting challenges with superior capability, efficiency, and flexibility.

Comments:	30 pages, 10 figures, 13 tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2409.16040 [cs.LG]
	(or arXiv:2409.16040v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.16040

Submission history

From: Ming Jin [view email]
[v1] Tue, 24 Sep 2024 12:42:18 UTC (3,983 KB)
[v2] Wed, 2 Oct 2024 09:08:21 UTC (4,024 KB)

Computer Science > Machine Learning

Title:Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Submission history

Access Paper:

References & Citations

1 blog link

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Submission history

Access Paper:

References & Citations

1 blog link

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators