Dear authors,
I am very impressed with your work and have carefully read your CVPR 2024 paper.
I apologize if I have misunderstood the paper, but I am confused about the order of the cross-attention operations. In Figure 3, the highest-level features (x1) are fed into cross-attention first; in Figure 4, however, the highest-level features (x1) are the last to be fed into cross-attention.