
Toto - Time Series Optimized Transformer for Observability

Paper | Toto Model Card | BOOM Dataset Card | Blogpost

Toto is a foundation model for multivariate time series forecasting with a focus on observability metrics. This model leverages innovative architectural designs to efficiently handle the high-dimensional, complex time series that are characteristic of observability data.

This repository also hosts the code for evaluating time series models on BOOM (Benchmark of Observability Metrics), a large-scale forecasting dataset composed of real-world observability data.

Toto model

Features

  • Zero-Shot Forecasting: Perform forecasting without fine-tuning on your specific time series
  • State-of-the-Art Performance: Achieves top scores in benchmarks covering diverse time series forecasting tasks. This includes the established multi-domain benchmark GIFT-Eval, as well as our own observability-focused benchmark BOOM.
  • Multi-Variate Support: Efficiently process multiple variables using Proportional Factorized Space-Time Attention
  • Probabilistic Predictions: Generate both point forecasts and uncertainty estimates using a Student-T mixture model
  • High-Dimensional Support: Handle time series with a large number of variables efficiently
  • Decoder-Only Architecture: Support for variable prediction horizons and context lengths
  • Pre-trained on Massive Data: Trained on over 2 trillion time series data points, the largest pretraining dataset for any open-weights time series foundation model to date.

Model Weights

Toto-Open, the open-weights release of Toto, is available on Hugging Face. Currently available checkpoints:

Checkpoint          Parameters  Notes
Toto-Open-Base-1.0  151M        The initial open release of Toto. Achieves state-of-the-art performance on both general-purpose and observability-focused benchmarking tasks, as described in our paper.

Installation

# Clone the repository
git clone https://github.com/DataDog/toto.git
cd toto

# Optional: create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

For optimal inference speed, it's recommended to install xformers and flash-attention as well.
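
For example, on a CUDA machine these optional dependencies can typically be installed as shown below (a sketch; the package names come from the upstream projects, and flash-attention needs a CUDA toolchain to build):

# Optional: faster attention kernels (assumes a CUDA-capable environment)
pip install xformers
pip install flash-attn --no-build-isolation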

Quick Start

Here's a simple example to get you started with forecasting:

⚠️ In our study, we take the median across 256 samples to produce a point forecast. This tutorial previously used the mean but has now been updated.

import torch
from data.util.dataset import MaskedTimeseries
from inference.forecaster import TotoForecaster
from model.toto import Toto

# Load the pre-trained model
toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0')
toto.to('cuda')  # Move to GPU

# Optionally compile the model for faster inference
toto.compile()  # Uses Torch's JIT compilation for better performance

forecaster = TotoForecaster(toto.model)

# Prepare your input time series (channels, time_steps)
input_series = torch.randn(7, 4096).to('cuda')  # Example with 7 variables and 4096 timesteps

# Prepare timestamp information (optional, but expected by API; not used by the current model release)
timestamp_seconds = torch.zeros(7, 4096).to('cuda')
time_interval_seconds = torch.full((7,), 60*15).to('cuda')  # 15-minute intervals

# Create a MaskedTimeseries object
inputs = MaskedTimeseries(
    series=input_series,
    padding_mask=torch.full_like(input_series, True, dtype=torch.bool),
    id_mask=torch.zeros_like(input_series),
    timestamp_seconds=timestamp_seconds,
    time_interval_seconds=time_interval_seconds,
)

# Generate forecasts for the next 336 timesteps
forecast = forecaster.forecast(
    inputs,
    prediction_length=336,
    num_samples=256,  # Number of samples for probabilistic forecasting
    samples_per_batch=256,  # Control memory usage during inference
)

# Access results
median_prediction = forecast.median  # Point forecasts
prediction_samples = forecast.samples  # Probabilistic samples
lower_quantile = forecast.quantile(0.1)  # 10th percentile for lower confidence bound
upper_quantile = forecast.quantile(0.9)  # 90th percentile for upper confidence bound
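
The quantile outputs above can be turned into a plotted prediction interval with a few lines of matplotlib (a sketch, not part of this repository; matplotlib is an extra dependency, and the snippet assumes the forecast tensors mirror the (variables, time_steps) layout of the input):

import matplotlib.pyplot as plt

# Move the forecasts back to CPU / NumPy for plotting
median_np = median_prediction.detach().cpu().numpy()
lower_np = lower_quantile.detach().cpu().numpy()
upper_np = upper_quantile.detach().cpu().numpy()

variable = 0  # plot the first of the 7 example variables
steps = range(median_np.shape[-1])
plt.plot(steps, median_np[variable], label="median forecast")
plt.fill_between(steps, lower_np[variable], upper_np[variable], alpha=0.3, label="10th-90th percentile")
plt.xlabel("forecast step")
plt.legend()
plt.show()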

Tutorials

For a comprehensive guide on using Toto for time series forecasting, check out our tutorial notebooks.

Pre-Training Data

Toto was trained on a massive and diverse mixture of time series datasets:

Observability Data

The largest portion of pretraining data comes from a dataset of approximately 1 trillion time series points collected from Datadog metrics. These metrics are generated from Datadog's monitoring of internal systems, and do not include any customer data. They cover a diverse array of software stacks and types of services, and span a wide variety of domains within observability, including application performance, infrastructure, networking, security, databases, and more.

Public Datasets

To improve the performance of Toto on general-purpose time series forecasting across many domains, we include publicly available datasets in the pretraining mix.

Synthetic Data

To improve robustness, approximately 1/3 of the pretraining data mix consists of synthetically-generated time series.

Evaluation

Toto has been rigorously evaluated on multiple benchmarks, including both general-purpose datasets and observability-focused datasets like BOOM. Below, we provide instructions for reproducing our evaluation results.

LSF Evaluation

To reproduce our results on the LSF datasets, follow these steps:

Downloading the Datasets

The LSF evaluation requires three datasets: ETT, Electricity, and Weather. You can download them from the Time-Series-Library repository. Follow the instructions in that repository to obtain the pre-processed versions of these datasets.

After downloading, ensure the datasets are placed in the data/lsf_datasets/ directory within the repository, with the following structure:

data/
└── lsf_datasets/
  ├── ETT-small/
  ├── electricity/
  └── weather/

Running the Evaluation Script

Once the datasets are set up, you can run the LSF evaluation script as follows to reproduce our results:

export CUBLAS_WORKSPACE_CONFIG=:4096:8  # For reproducible GPU results
export PYTHONPATH="$(pwd):$(pwd)/toto:$PYTHONPATH"  # Add current and "toto" dirs to Python module search path
python toto/evaluation/run_lsf_eval.py \
    --datasets ETTh1 \
    --context-length 2048 \
    --eval-stride 1 \
    --checkpoint-path [CHECKPOINT-NAME-OR-DIR]

To see all available options for the evaluation script, you can use the --help flag:

python toto/evaluation/run_lsf_eval.py --help

Expected Results

The script evaluates Toto's performance using Mean Absolute Error (MAE) and Mean Squared Error (MSE) across the specified datasets, context lengths, and prediction lengths. It displays a detailed table of results for each prediction length, along with a summary table that averages the results across prediction lengths for each dataset.

To reproduce the results presented in the paper, use the default arguments while setting --eval-stride 1 and specifying all datasets with --datasets ETTh1 ETTh2 ETTm1 ETTm2 weather electricity.
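
Concretely, that corresponds to a command along these lines (a sketch assembled from the options above; substitute your checkpoint name or directory):

export CUBLAS_WORKSPACE_CONFIG=:4096:8  # For reproducible GPU results
export PYTHONPATH="$(pwd):$(pwd)/toto:$PYTHONPATH"
python toto/evaluation/run_lsf_eval.py \
    --datasets ETTh1 ETTh2 ETTm1 ETTm2 weather electricity \
    --eval-stride 1 \
    --checkpoint-path [CHECKPOINT-NAME-OR-DIR]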

GIFT-Eval Evaluation

To reproduce our results on the GIFT-Eval benchmark, we provide a dedicated notebook.

BOOM Evaluation

For evaluating Toto on the BOOM (Benchmark of Observability Metrics) dataset, refer to the resources in the boom folder of this repository.

These resources provide all necessary steps to run and reproduce BOOM evaluation results with Toto.

Requirements

  • Python 3.10+
  • PyTorch 2.5+
  • CUDA-capable device (Ampere generation or newer recommended for optimal performance)
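
A quick way to sanity-check an environment against these requirements is a small script like the one below (a sketch; the compute-capability threshold of 8.0 corresponds to Ampere-generation GPUs):

import sys
import torch

# Check interpreter and framework versions against the requirements above
assert sys.version_info >= (3, 10), "Python 3.10+ is required"
torch_major, torch_minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert (torch_major, torch_minor) >= (2, 5), "PyTorch 2.5+ is required"

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    note = "Ampere or newer" if (major, minor) >= (8, 0) else "older than Ampere; expect slower inference"
    print(f"CUDA device found (compute capability {major}.{minor}, {note})")
else:
    print("No CUDA device found; Toto inference will be slow on CPU")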

BOOM (Benchmark of Observability Metrics)

BOOM (Benchmark of Observability Metrics) is a large-scale, real-world time series dataset designed for evaluating models on forecasting tasks in complex observability environments. Composed of real-world metrics data collected from Datadog, a leading observability platform, the benchmark captures the irregularity, structural complexity, and heavy-tailed statistics typical of production observability data. Unlike synthetic or curated benchmarks, BOOM reflects the full diversity and unpredictability of operational signals observed in distributed systems, covering infrastructure, networking, databases, security, and application-level metrics.

Note: the metrics comprising BOOM were generated from internal monitoring of pre-production environments, and do not include any customer data.

For more information on the dataset, including details on its preparation and statistical properties, see the dataset card on Hugging Face.

For example evaluations of different time series models on the BOOM dataset, see the boom folder in this repository.

Citation

If you use Toto in your research, please cite our work:

@misc{cohen2025timedifferentobservabilityperspective,
      title={This Time is Different: An Observability Perspective on Time Series Foundation Models}, 
      author={Ben Cohen and Emaad Khwaja and Youssef Doubli and Salahidine Lemaachi and Chris Lettieri and Charles Masson and Hugo Miccinilli and Elise Ramé and Qiqi Ren and Afshin Rostamizadeh and Jean Ogier du Terrail and Anna-Monica Toon and Kan Wang and Stephan Xie and Zongzhe Xu and Viktoriya Zhukova and David Asker and Ameet Talwalkar and Othmane Abou-Amal},
      year={2025},
      eprint={2505.14766},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.14766}, 
}

License

Unless explicitly stated otherwise, all files in this repository are licensed under the Apache-2.0 License - see the LICENSE file for details.

This product includes software developed at Datadog (https://www.datadoghq.com/) Copyright 2025 Datadog, Inc.

Contributing

We welcome contributions! Please check out our contributing guidelines to get started.
