Model architecture consultation · Issue #20 · aehrc/cxrmate · GitHub

Model architecture consultation #20

Open

Hi,
I would like to ask why CvT was chosen as the visual encoder and why the decoder in the paper is a 6-layer Transformer. What is the basis for these choices? Did you draw on other work or run any comparative experiments?

