I test the model in an NLP task. · Issue #5 · rish-16/aft-pytorch · GitHub
I test the model in an NLP task. #5
Open
@zshy1205

Description

I am using the AFTFull model with 6 layers.
In __init__ I use this code:

self.encoder_transformer = nn.ModuleList()
for _ in range(6):
    self.encoder_transformer.append(AFTFull(max_seqlen=500, dim=512, hidden_dim=256))

In the forward function I use this code:

for layer in self.encoder_transformer:
    x = layer(x) + x

Originally I used a traditional Transformer; after replacing it with this, the training loss becomes NaN. Is something wrong? And how do you use the model with many layers? Please help me, thank you.
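
For context, here is a minimal self-contained sketch of the setup described above, assuming the AFTFull layer from aft_pytorch with the constructor arguments shown in the snippets. The Encoder wrapper class, the batch size, and the example input tensor are illustrative additions, not part of the original post:

import torch
import torch.nn as nn
from aft_pytorch import AFTFull

class Encoder(nn.Module):
    # Illustrative wrapper: the original post only shows the ModuleList and the loop.
    def __init__(self, num_layers=6, max_seqlen=500, dim=512, hidden_dim=256):
        super().__init__()
        # Stack of AFT-Full blocks, as described in the issue
        self.encoder_transformer = nn.ModuleList()
        for _ in range(num_layers):
            self.encoder_transformer.append(
                AFTFull(max_seqlen=max_seqlen, dim=dim, hidden_dim=hidden_dim)
            )

    def forward(self, x):
        # Plain residual connection around each AFT-Full block, as in the post
        for layer in self.encoder_transformer:
            x = layer(x) + x
        return x

# Hypothetical usage: x has shape (batch, seq_len, dim), seq_len up to max_seqlen
x = torch.randn(2, 500, 512)
out = Encoder()(x)  # same shape as x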
