Time-R1 introduces the study of slow-thinking reasoning for time series forecasting. We propose a two-stage reinforcement fine-tuning framework that combines a supervised warm-up stage with policy optimization via GRIP, a group-based sampling strategy for multi-step reasoning. Our model significantly improves forecasting accuracy across diverse datasets, demonstrating the effectiveness of training LLMs for structured temporal reasoning.
This repository contains the official code for our paper:
> **Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs**
> Yucong Luo, Yitong Zhou, Mingyue Cheng, Jiahao Wang, Daoyu Wang, Jintao Zhang, Tingyue Pan
🚩 News (Jun. 2025): The final version of the paper has been submitted to arXiv.
🚩 News (May 2025): The Time-R1 repository has been released and open-sourced on GitHub.
Large Language Models (LLMs) demonstrate impressive capabilities but often lack the time series reasoning needed for forecasting tasks. Time-R1 addresses this by introducing a novel two-stage reinforcement fine-tuning (RFT) curriculum, guided by a custom-designed multi-objective reward framework that explicitly shapes temporal reasoning. Our approach progressively develops:
- (Stage 1: SFT for Warm-up Adaptation) Foundational skills through supervised fine-tuning, where LLMs learn temporal analysis from synthetic CoT data, ensuring proper structure and formatting.
- (Stage 2: RL for Exploring Effective Reasoning Patterns) Advanced forecasting via RL, with rewards based on ground-truth alignment, multi-horizon accuracy, and domain principles. GRIP (Group-based Relative Importance for Policy Optimization) strengthens reasoning-path exploration through non-uniform sampling and adaptive weighting (a toy sketch of this idea follows this list).
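The exact GRIP objective is defined in the paper; the snippet below is only a rough sketch of the underlying idea of forming a group-relative baseline by weighting a group of sampled reasoning paths non-uniformly by their rewards. The `grip_advantages` helper and its softmax-style weighting are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def grip_advantages(rewards, temperature=1.0):
    """Illustrative only: non-uniform weighting of a group of sampled rollouts."""
    rewards = np.asarray(rewards, dtype=float)
    # Adaptive (non-uniform) weights: higher-reward rollouts get more influence.
    weights = np.exp(rewards / temperature)
    weights /= weights.sum()
    # Group baseline: weighted mean reward of the sampled rollouts.
    baseline = float(np.dot(weights, rewards))
    # Advantage of each rollout relative to the weighted group baseline.
    return rewards - baseline

# Example: advantages for a group of 4 sampled reasoning paths for one prompt.
print(grip_advantages([0.2, 0.9, 0.4, 0.7]))
```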
Experiments show that Time-R1 significantly improves forecasting accuracy and generalization across multiple real-world datasets.
- Training Dataset:
  - Scripts for preparing the training and evaluation datasets.
- Time-R1 Model Checkpoints:
  - The final model after the two-stage RFT.
- Source Code:
  - Code for training and evaluating Time-R1.
- Slow-Thinking Time Series Reasoning: Trains LLMs to perform deliberate, step-by-step temporal analysis for forecasting tasks.
- Two-Stage RFT Framework: Combines warm-up supervised fine-tuning (SFT) with reinforcement learning (RL) for progressive capability building.
- GRIP (Group-based Relative Importance for Policy Optimization): Introduces non-uniform sampling and adaptive weighting to enhance reasoning-path exploration and model robustness.
- Fine-Grained Multi-Objective Rewards: Designed to improve temporal coherence, multi-horizon accuracy, and alignment with domain-specific forecasting principles (see the sketch after this list).
- Strong Forecasting Performance: Extensive experiments on real-world datasets demonstrate significant improvements over baseline methods through the slow-thinking paradigm.
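The actual reward terms and coefficients are defined in the paper and training code; the sketch below only illustrates the general shape of such a multi-objective reward, combining a horizon-weighted accuracy term with a format bonus. The `combined_reward` helper, its weights, and the structured-output check are hypothetical.

```python
import numpy as np

def combined_reward(pred, target, has_valid_format, horizon_weights=None,
                    w_acc=0.8, w_fmt=0.2):
    """Toy multi-objective reward: horizon-weighted accuracy plus a format bonus."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    if horizon_weights is None:
        # Emphasize near-term forecast steps slightly more than distant ones.
        horizon_weights = np.linspace(1.0, 0.5, num=len(target))
    horizon_weights = horizon_weights / horizon_weights.sum()
    # Accuracy term: horizon-weighted MAE mapped into (0, 1].
    weighted_mae = float(np.dot(horizon_weights, np.abs(pred - target)))
    acc_reward = 1.0 / (1.0 + weighted_mae)
    # Format term: 1 if the model produced a parseable structured response.
    fmt_reward = 1.0 if has_valid_format else 0.0
    return w_acc * acc_reward + w_fmt * fmt_reward

# Example: reward for a 4-step forecast with a well-formed response.
print(combined_reward([10.2, 11.0, 11.5, 12.1], [10.0, 11.2, 11.8, 12.0], True))
```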
We recommend Python 3.10+ and a clean conda environment. Before training or inference, make sure the following system prerequisites are met:
- CUDA: Version ≥ 12.4
- cuDNN: Version ≥ 9.8.0
```bash
conda create -n time-r1 python=3.10
conda activate time-r1

git clone https://github.com/lqzxt/Time-R1.git

# Install verl framework
cd Time-R1
pip install --no-deps -e .
pip install -r requirements.txt
```
```bash
# Run training
bash scripts/time-r1.sh
```
```bash
cd Time-R1/eval
python main.py
```
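The evaluation entry point is `eval/main.py`; as an independent illustration of the standard forecasting metrics (MSE and MAE) typically used to report accuracy, here is a minimal sketch. The `mse_mae` helper is not part of the repository.

```python
import numpy as np

def mse_mae(preds, targets):
    """Compute MSE and MAE over a batch of forecasts, shape [n_series, horizon]."""
    preds, targets = np.asarray(preds, dtype=float), np.asarray(targets, dtype=float)
    err = preds - targets
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))

# Example: two series forecast over a 2-step horizon.
preds = [[10.1, 10.8], [5.2, 5.0]]
targets = [[10.0, 11.0], [5.0, 5.1]]
print(mse_mae(preds, targets))  # -> (MSE, MAE)
```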
- 🧠 Verl: A flexible and efficient RLHF framework used for reinforcement learning with GRIP.
- 🦙 LLaMA Factory: Streamlined interface for supervised fine-tuning and RLHF on open LLMs.
- 🔢 Qwen2.5 Models: Open-source LLMs that serve as our forecasting backbone.
- 🔧 vLLM: High-throughput LLM inference engine used during RL rollout.