Pay20Y/SAR_TF

the role of mask in attention operation #3
Open

Description

@Marcovaldong

I am reading the Torch implementation, your implementation, and the PyTorch implementation. I found that there is a mask in your implementation and in the Torch implementation, but there is no mask in the PyTorch implementation. Is the role of the mask to attend only to the valid (non-padded) positions? If there is no mask, what will the performance and the results be like?

I am training the PyTorch implementation on a handwriting dataset, and I found that there is a lot of repetition in the decoded results, as shown below. Is this because I didn't use a mask in the attention operation?

groundtruth:  the^fragile^nature
prediction:  the^fragile^fragile^fragile^fragile^fragile^fragile^fragile^fragile^fragile^fragi
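
My understanding is that the mask restricts the attention softmax to the valid region of the feature map: images in a batch are padded to a common width, and without the mask the softmax also assigns weight to the padded columns. A minimal PyTorch sketch of that idea, where the function and argument names are hypothetical and not taken from either implementation:

```python
import torch

def masked_attention_weights(scores: torch.Tensor, valid_widths: torch.Tensor) -> torch.Tensor:
    # scores: (B, H, W) raw attention scores over the 2D feature map.
    # valid_widths: (B,) number of valid (non-padded) columns per image.
    B, H, W = scores.shape
    cols = torch.arange(W, device=scores.device)               # (W,)
    valid = cols[None, None, :] < valid_widths[:, None, None]  # (B, 1, W), broadcasts over H
    scores = scores.masked_fill(~valid, float("-inf"))         # padded columns get zero weight
    return torch.softmax(scores.view(B, -1), dim=-1).view(B, H, W)

# Example: with valid_widths = [25, 40], the first image gets no attention
# on columns 25..39, and each weight map still sums to 1.
scores = torch.randn(2, 6, 40)
weights = masked_attention_weights(scores, torch.tensor([25, 40]))
assert torch.allclose(weights.sum(dim=(1, 2)), torch.ones(2))
assert weights[0, :, 25:].sum() == 0
```

If the padded region is left unmasked, the glimpse vector fed to the decoder is polluted by padding features at every step, which seems like a plausible cause of the repeated tokens shown above.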
