[ENH] de-novo implementation of `LTSFTransformer` based on `cure-lab` research code base by geetu040 · Pull Request #6202 · sktime/sktime · GitHub

[ENH] de-novo implementation of LTSFTransformer based on cure-lab research code base #6202

Merged
42 commits merged into sktime:main on Aug 6, 2024

Conversation

geetu040
Contributor
@geetu040 geetu040 commented Mar 24, 2024

Reference Issues/PRs

Implements LTSFTransformer from #4939

What does this implement/fix? Explain your changes.

New forecaster LTSFTransformer

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

  • Class names and API layout

Did you add any tests for the change?

Not yet

PR checklist

For all contributions
  • Optionally, for added estimators: I've added myself and possibly others to the maintainers tag - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.

@fkiraly
Collaborator
fkiraly commented Mar 24, 2024

Nice! Quick question, is this lifting the code from somewhere (in which case we need to give credit in docstrings etc.), or is it a de-novo implementation?

@fkiraly fkiraly added the module:forecasting and enhancement labels Mar 24, 2024
@geetu040
Contributor Author

Nice! Quick question, is this lifting the code from somewhere (in which case we need to give credit in docstrings etc.), or is it a de-novo implementation?

Originally copied, then edited for sktime compatibility.
I'll work on the docstrings and credit the original author as soon as the blocks are connected and the interface is ready.

@geetu040
Contributor Author

I need to implement the forward pass so that it aligns with the sktime interface and the transformer architecture. @fkiraly @benHeid your input here would be really appreciated.

        def forward(self, x):
            """Forward pass for LSTF-Transformer Network.

            Parameters
            ----------
            x : torch.Tensor
                torch.Tensor of shape [Batch, Input Sequence Length, Channel]

            Returns
            -------
            x : torch.Tensor
                output of Linear Model. x.shape = [Batch, Output Length, Channel]
            """
            from torch import ones

            batch_size = x.size(0)
            seq_len = self.seq_len
            pred_len = self.pred_len
            num_features = x.size(2)
            num_X_features = 5

            x_enc = x
            x_mark_enc = ones(batch_size, seq_len, num_X_features)
            x_dec = ones(batch_size, pred_len, num_features)
            x_mark_dec = ones(batch_size, pred_len, num_X_features)

            return self._forward(x_enc, x_mark_enc, x_dec, x_mark_dec)

        def _forward(
            self,
            x_enc,
            x_mark_enc,
            x_dec,
            x_mark_dec,
            enc_self_mask=None,
            dec_self_mask=None,
            dec_enc_mask=None,
        ):
            enc_out = self.enc_embedding(x_enc, x_mark_enc)
            enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)

            dec_out = self.dec_embedding(x_dec, x_mark_dec)
            dec_out = self.decoder(
                dec_out, enc_out, x_mask=dec_self_mask, cross_mask=dec_enc_mask
            )

            if self.output_attention:
                return dec_out[:, -self.pred_len :, :], attns
            else:
                return dec_out[:, -self.pred_len :, :]  # [B, L, D]

The code above shows the `_forward` method used by the transformer; in `forward` I simplified the inputs to just get the pipeline running and fix other architecture issues first.
These are the important parameters required in the transformer's forward pass:

  1. x_enc: Input data for the encoder. Can be the historical y in the sktime context.
  2. x_mark_enc: Time embeddings (or positional embeddings) for the encoder input data. Can be the historical X in the sktime context. What happens when there is no X?
  3. x_dec: Input data for the decoder. Can be y_pred in the sktime context, where it would be fh when predicting and y_pred while training - this would require changing the pytorch adapter.
  4. x_mark_dec: Time embeddings (or positional embeddings) for the decoder input data. Can be X_pred, but again, what happens when there is no exogenous data?

@geetu040
Contributor Author

Summarizing above ...

Problem Statement

The sktime interface provides X and y for training and prediction. The transformer architecture, consisting of an encoder and a decoder, takes 4 inputs:

  • x_enc: input sequence for the encoder
  • x_mark_enc: time embeddings (or positional embeddings) of input sequence for encoder
  • x_dec: target sequence for the decoder
  • x_mark_dec: time embeddings (or positional embeddings) of target sequence for decoder

Now we have to make changes in the pytorch adapter or in the forward pass to translate X and y into these 4 inputs understood by the transformer.

Proposed Design

  1. We split the X and y seen in training into segments X_enc, X_dec and y_enc, y_dec respectively, by a specific ratio
  2. then use
    • y_enc as x_enc - input sequence for the encoder
    • y_dec as x_dec - target sequence for the decoder
    • X_enc as x_mark_enc - time embeddings (or positional embeddings) of the input sequence for the encoder
    • X_dec as x_mark_dec - time embeddings (or positional embeddings) of the target sequence for the decoder

These changes would need to be made in BaseDeepNetworkPyTorch; a rough sketch of the mapping is below.
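
A minimal sketch of the proposed mapping, assuming numpy arrays of shape (n_timepoints, n_channels); the function name and the split_ratio default are illustrative, not part of the adapter:

    import numpy as np

    def split_enc_dec(y: np.ndarray, X: np.ndarray, split_ratio: float = 0.7):
        """Split y and X along time and map the segments onto the 4 transformer inputs."""
        cut = int(len(y) * split_ratio)
        y_enc, y_dec = y[:cut], y[cut:]  # target: encoder / decoder segments
        X_enc, X_dec = X[:cut], X[cut:]  # exogenous: encoder / decoder segments
        return {
            "x_enc": y_enc,       # input sequence for the encoder
            "x_dec": y_dec,       # target sequence for the decoder
            "x_mark_enc": X_enc,  # marks (time embeddings) for the encoder input
            "x_mark_dec": X_dec,  # marks (time embeddings) for the decoder target
        }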

Requested Feedback

@benHeid
Q1: Please review the above design. I would need your acknowledgment before implementing it in code.
Q2: How would this strategy be used for prediction? There we have X_pred and y_pred, which can be fed as encoder inputs, but how do we choose the decoder inputs for the transformer?

@benHeid
Contributor
benHeid commented Jun 12, 2024

I would propose taking a look at their experiment file exp/exp_main.py and at their dataset implementation.
I would try to follow their approach to creating the dataset as closely as possible. The difference would be that we wouldn't read from a file; instead we would provide the data directly.

The separation into y and x is done in the dataset classes. Afterwards, the masking for prediction etc. is done in the exp_main file.

So regarding Q1: I would propose using exactly their approach, but I assume yours is quite similar. Regarding Q2: y would be a concatenation of the historical values and zeros, if I understood their implementation correctly.
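
If I read that correctly, the decoder input would be built roughly like this (a sketch with illustrative names, not their exact code; label_len and pred_len follow the cure-lab naming):

    import torch

    def decoder_input(y_hist: torch.Tensor, label_len: int, pred_len: int) -> torch.Tensor:
        """Concatenate the last `label_len` observed values with `pred_len` zeros.

        y_hist has shape (seq_len, n_channels); the result has shape
        (label_len + pred_len, n_channels).
        """
        zeros = torch.zeros(pred_len, y_hist.size(-1), dtype=y_hist.dtype)
        return torch.cat([y_hist[-label_len:], zeros], dim=0)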

Please ping me if you would like clarification at some point.

@geetu040
Contributor Author

@benHeid I think we cannot adopt the PredDataset class from cure-lab, as it uses the available test data for validation. In our case, we have real-time data instead of already available prediction data.
Please let me know if I am right about this. The proposed method would be to create a Dataset class of our own that uses, for the encoder, X and y from some of the historical values seen during fit, and, for the decoder, the X provided in predict and y as zeros.
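
Roughly along these lines (a sketch with illustrative names, assuming 2-D torch tensors of shape (n_timepoints, n_channels) and a single prediction window; not the final class):

    import torch
    from torch.utils.data import Dataset

    class _PredDatasetSketch(Dataset):
        """Sketch: the encoder reads the tail of the history, the decoder gets
        `label_len` known values followed by zeros for the horizon."""

        def __init__(self, y_hist, marks_hist, marks_future, seq_len, label_len, pred_len):
            # y_hist: historical target, marks_hist/marks_future: time features
            self.y_hist = torch.as_tensor(y_hist, dtype=torch.float32)
            self.marks_hist = torch.as_tensor(marks_hist, dtype=torch.float32)
            self.marks_future = torch.as_tensor(marks_future, dtype=torch.float32)
            self.seq_len, self.label_len, self.pred_len = seq_len, label_len, pred_len

        def __len__(self):
            return 1  # a single window: the latest `seq_len` observations

        def __getitem__(self, idx):
            x_enc = self.y_hist[-self.seq_len:]
            x_mark_enc = self.marks_hist[-self.seq_len:]
            zeros = torch.zeros(self.pred_len, x_enc.size(-1))
            x_dec = torch.cat([x_enc[-self.label_len:], zeros], dim=0)
            x_mark_dec = torch.cat([x_mark_enc[-self.label_len:], self.marks_future], dim=0)
            return {
                "x_enc": x_enc,
                "x_mark_enc": x_mark_enc,
                "x_dec": x_dec,
                "x_mark_dec": x_mark_dec,
            }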

@geetu040
Contributor Author

@fkiraly this is how the docstring looks at the moment
see in file


    Parameters
    ----------
    seq_len : int
        Length of the input sequence.
    pred_len : int
        Length of the prediction sequence.
    label_len : int, optional (default=2)
        Length of the label sequence.
    num_epochs : int, optional (default=16)
        Number of epochs for training.
    batch_size : int, optional (default=8)
        Size of the batch.
    in_channels : int, optional (default=1)
        Number of input channels.
    individual : bool, optional (default=False)
        Whether to use individual models for each series.
    criterion : str or callable, optional
        Loss function to use.
    criterion_kwargs : dict, optional
        Additional keyword arguments for the loss function.
    optimizer : str or callable, optional
        Optimizer to use.
    optimizer_kwargs : dict, optional
        Additional keyword arguments for the optimizer.
    lr : float, optional (default=0.001)
        Learning rate.
    custom_dataset_train : torch.utils.data.Dataset, optional
        Custom dataset for training.
    custom_dataset_pred : torch.utils.data.Dataset, optional
        Custom dataset for prediction.
    output_attention : bool, optional (default=False)
        Whether to output attention weights.
    embed_type : int, optional (default=0)
        Type of embedding to use.
    embed : str, optional (default="fixed")
        Type of embedding.
    enc_in : int, optional (default=7)
        Number of encoder input features.
    dec_in : int, optional (default=7)
        Number of decoder input features.
    d_model : int, optional (default=512)
        Dimension of the model.
    n_heads : int, optional (default=8)
        Number of attention heads.
    d_ff : int, optional (default=2048)
        Dimension of the feed-forward network.
    e_layers : int, optional (default=3)
        Number of encoder layers.
    d_layers : int, optional (default=2)
        Number of decoder layers.
    factor : int, optional (default=5)
        Factor for attention.
    dropout : float, optional (default=0.1)
        Dropout rate.
    activation : str, optional (default="relu")
        Activation function.
    c_out : int, optional (default=7)
        Number of output features.
    freq : str, optional (default="h")
        Frequency of the data.

    Examples
    --------
    >>> from sktime.forecasting.ltsf import LTSFTransformer
    >>> from sktime.datasets import load_longley
    >>>
    >>> batch_size = 5
    >>> seq_len = 5
    >>> label_len = 2
    >>> pred_len = 3
    >>> num_features = 1
    >>>
    >>> y, X = load_longley()
    >>> split_point = len(y) - pred_len
    >>> X_train, X_test = X[:split_point], X[split_point:]
    >>> y_train, y_test = y[:split_point], y[split_point:]
    >>>
    >>> model = LTSFTransformer(
    ...     seq_len=seq_len,
    ...     pred_len=pred_len,
    ...     label_len=label_len,
    ...     output_attention=False,
    ...     embed_type=0,
    ...     embed="fixed",
    ...     enc_in=num_features,
    ...     dec_in=num_features,
    ...     d_model=512,
    ...     n_heads=8,
    ...     d_ff=2048,
    ...     e_layers=1,
    ...     d_layers=1,
    ...     factor=5,
    ...     dropout=0.1,
    ...     activation="relu",
    ...     c_out=pred_len,
    ...     freq="h",
    ...     num_epochs=1,
    ...     batch_size=batch_size,
    ... )
    >>>
    >>> model.fit(y_train, X_train, fh=[1, 2, 3])
    >>> pred = model.predict(X=X_test)

@geetu040
Contributor Author

@benHeid I have tried to train and predict using the current implementation (no changes to the architecture) and the predictions don't seem to move at all. Should I inquire more into this by trying other loss and optimizer methods, or keep working on the code and check this later?

@geetu040
Contributor Author

I believe the dataset class is complete and ready for review. I have made some changes to the original dataset class provided by cure-lab. These are only syntax changes, with no change to the core logic, although I have removed some code that was not useful for this interface.


Changes made

  1. seq_len, label_len, pred_len are taken as separate parameters rather than a list of sizes, as this makes a better interface
  2. I keep the complete data instead of splitting it into train-test-val, to make it compatible with the existing pytorch adapter
  3. removed the scaling option, as it should be done at the adapter level
  4. removed the code that concatenates X features with y, as it goes against the existing pytorch adapter implementation; if this step is needed, it should be adopted at the adapter level

Todos

  1. scale data at adapter level
  2. the current model will not work if the provided data index is not of a valid format, i.e. DatetimeIndex; this needs to be looked into
  3. there is a lot of code that seems hard-coded and highly reliant on pandas frequency; that needs to be changed before it breaks
  4. predictions are still not improving with the epochs; that also needs a deeper study of the implementation, or of the hyperparameter configuration

By the way, this is how the dataloader output looks:

 ========== Data ========== 
[[102.]
 [108.]
 [122.]
 [119.]
 [111.]
 [125.]
 [138.]
 [138.]
 [126.]
 [109.]
 [ 94.]
 [108.]
 [105.]
 [116.]
 [131.]
 [125.]
 [115.]
 [139.]
 [160.]
 [160.]]
 ========== Dataloader: x ========== 
{'x_enc': tensor([[102.],
        [108.],
        [122.],
        [119.],
        [111.]]), 'x_mark_enc': tensor([[ 1., 31.,  0.,  0.],
        [ 2., 28.,  0.,  0.],
        [ 3., 31.,  3.,  0.],
        [ 4., 30.,  5.,  0.],
        [ 5., 31.,  1.,  0.]]), 'x_dec': tensor([[119.],
        [111.],
        [  0.],
        [  0.],
        [  0.]]), 'x_mark_dec': tensor([[ 4., 30.,  5.,  0.],
        [ 5., 31.,  1.,  0.],
        [ 6., 30.,  3.,  0.],
        [ 7., 31.,  6.,  0.],
        [ 8., 31.,  2.,  0.]])}
 ========== Dataloader: y ========== 
tensor([[125.],
        [138.],
        [138.]])
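
For reference, the four mark columns above appear to be month, day-of-month, weekday and hour; here is a sketch of how such marks could be derived from a pandas DatetimeIndex (illustrative, not necessarily the exact encoding used in the dataset class):

    import numpy as np
    import pandas as pd

    def calendar_marks(index: pd.DatetimeIndex) -> np.ndarray:
        """Stack month, day, weekday and hour into a (len(index), 4) float array."""
        return np.stack(
            [index.month, index.day, index.weekday, index.hour], axis=1
        ).astype("float32")

    # month-end timestamps give rows like [1., 31., <weekday>, 0.], as in the dump above
    marks = calendar_marks(pd.date_range("1949-01-31", periods=5, freq="M"))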

@geetu040
Contributor Author

The following changes were made to reformat the init params:

  1. label_len renamed to context_len, as this name gives more information about the parameter
  2. output_attention removed from the parameters, as the user does not need to see the attention generated during the process and it would break the pytorch adapter
  3. enc_in, dec_in, c_out are removed from the parameters and a new param num_features is added temporarily, which will later be removed as well; num_features should be inferred from the data, and enc_in, dec_in, c_out should be equal to it
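
As a tiny illustration of the last point (an illustrative helper, not the adapter's final code):

    import pandas as pd

    def infer_channel_params(y) -> dict:
        """Infer enc_in, dec_in and c_out from the width of the training target y."""
        num_features = y.shape[-1] if getattr(y, "ndim", 1) > 1 else 1
        return {"enc_in": num_features, "dec_in": num_features, "c_out": num_features}

    # e.g. a univariate pd.Series gives 1, a 3-column pd.DataFrame gives 3
    infer_channel_params(pd.DataFrame({"a": [1], "b": [2], "c": [3]}))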

@benHeid
Contributor
benHeid commented Jul 1, 2024

Personally, I would just force-ignore temporal_encoding if the index is not temporal.

I agree

@geetu040 geetu040 marked this pull request as ready for review July 1, 2024 21:11
@fkiraly fkiraly changed the title from [ENH] Implements LTSFTransformer to [ENH] de-novo implementation of LTSFTransformer based on cure-lab research code base Jul 4, 2024
Collaborator
@fkiraly fkiraly left a comment


Can you please resolve merge conflicts and ensure tests run, @geetu040?

Collaborator
@fkiraly fkiraly left a comment


Looks good!

One question, I see the test test_predict_time_index_in_sample_full is skipped - why is that skipped?

@geetu040
Contributor Author

One question, I see the test test_predict_time_index_in_sample_full is skipped - why is that skipped?

Because LTSFTransformer forecasts the next values and does not work on fh with negative values, and this test case checks for negative values in fh, if I am right.

@fkiraly
Collaborator
fkiraly commented Jul 17, 2024

What you describe is that the forecaster cannot make insample forecasts - this should be addressed by setting the insample tag correctly, and then the test should work:
https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.registry._tags.capability__insample.html#sktime.registry._tags.capability__insample
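
For context, this tag is declared in the estimator's _tags dictionary; a minimal sketch (the class name is illustrative):

    from sktime.forecasting.base import BaseForecaster

    class _NoInsampleForecaster(BaseForecaster):
        """Sketch: a forecaster declaring that it cannot make in-sample forecasts."""

        # the test framework can then check in-sample behavior against this tag
        _tags = {"capability:insample": False}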

@geetu040
Contributor Author

"capability:insample": False has been set in the parent class BaseDeepNetworkPyTorch of LTSFTransformer
I skipped test_predict_time_index_in_sample_full for LTSFTransformer as it was skipped for other LTSF algorithms like LTSFNLinearForecaster, and I thought it was for in-sample predictions. If this test case is not checking for in-sample predictions, then what does it check?

@fkiraly
Collaborator
fkiraly commented Jul 18, 2024

it will check that the tag is correctly set, I believe

@geetu040
Contributor Author

it will check that the tag is correctly set, I believe

yes, it checks via the tag - updated the tests config

@geetu040 geetu040 requested a review from fkiraly July 24, 2024 11:27
fkiraly previously approved these changes Jul 24, 2024
Collaborator
@fkiraly fkiraly left a comment


Well, looks like we did not have to skip those tests after all!

Would appreciate a review from @benHeid before merging, due to familiarity with the algorithm - this is a complex PR.

@fkiraly fkiraly merged commit 6397e87 into sktime:main Aug 6, 2024
68 checks passed
Labels
enhancement, module:forecasting
Projects
Status: Done
Development


3 participants