Open
Description
While reading other issues and walking through the code, I realized that example.py doesn't load any checkpoint. So, as far as I understand, example.py and the code as it stands only use the pretrained ViT, transformers, and so on, and those components were not retrained in the MMM setting proposed by the paper, right?
I'm asking because I only have a few days to do some testing, and as far as I can tell, even if I implement the image and text bi-encoder, I would get the same performance as the model currently has, right?