Need help for reproducing linear eval results with published weights. · Issue #149 · facebookresearch/dino

Need help for reproducing linear eval results with published weights. #149

Closed
lucasb-eyer opened this issue Oct 26, 2021 · 5 comments

Comments

@lucasb-eyer

Hi Mathilde,

I'm porting your models to JAX for further research. I have replicated the model so that I get exactly the same outputs (logits, and all intermediates too) when passing the same random input through PyTorch and JAX on the same machine, loading the weights you published for ViT-B/16 + linear. (I had to get some details right, like GELU being implemented differently in the two frameworks.) However, when running the full evaluation on ImageNet val with the JAX model, I get 77.73% (resize=bilinear) or 77.548% (resize=bicubic), whereas I believe I should get 78.162% according to these eval logs.
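(For reference, a minimal sketch of that kind of output-parity check; `torch_model` and `jax_model` are hypothetical handles for the two ports, and the GELU note refers to `torch.nn.GELU` using the exact erf form by default while `jax.nn.gelu` defaults to the tanh approximation.)

```python
import numpy as np
import torch
import jax.numpy as jnp

# Note: torch.nn.GELU() computes the exact erf-based GELU by default,
# while jax.nn.gelu defaults to approximate=True (tanh approximation);
# the JAX side needs approximate=False to match PyTorch exactly.

# Same random input fed to both ports (NCHW here; transpose as needed).
x = np.random.RandomState(0).randn(1, 3, 224, 224).astype(np.float32)

with torch.no_grad():
    out_torch = torch_model(torch.from_numpy(x)).numpy()  # hypothetical PyTorch model
out_jax = np.asarray(jax_model(jnp.asarray(x)))            # hypothetical JAX port

print("max abs diff:", np.abs(out_torch - out_jax).max())
```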

Thus, AFAICT, the only possible remaining causes of the difference are (1) before the model, i.e. preprocessing of the images, (2) after the model, i.e. computation of the score, or (3) something not matching on your side.

For (1), I read the code and believe I'm doing the same thing, including setting the resize mode to bicubic and antialias=False. However, mine is TF's resize, so that may make a difference.
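(A rough sketch of the kind of TF-side eval preprocessing being described, resizing to a fixed size with bicubic and antialias=False; the crop size and ImageNet normalization constants here are illustrative assumptions, not the actual code.)

```python
import tensorflow as tf

IMAGENET_MEAN = tf.constant([0.485, 0.456, 0.406]) * 255.0
IMAGENET_STD = tf.constant([0.229, 0.224, 0.225]) * 255.0

def preprocess_eval(image, resize=256, crop=224):
    """Eval preprocessing as described: bicubic resize, antialias=False."""
    image = tf.cast(image, tf.float32)
    # Note: this resizes to a fixed (resize, resize) shape.
    image = tf.image.resize(image, (resize, resize),
                            method="bicubic", antialias=False)
    # Central crop to the eval resolution.
    top = (resize - crop) // 2
    image = image[top:top + crop, top:top + crop, :]
    # Standard ImageNet normalization (assumed).
    return (image - IMAGENET_MEAN) / IMAGENET_STD
```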

(2) is trivial; I'm sure I got it right.

For (3), I don't have an ImageNet setup for PyTorch at hand, and it would take me a while to get one.

Could you please double-check the number with your published weights and code for me? And do you have any more ideas, or would you ascribe it to the frameworks' different resize functions at this point?

@woctezuma
woctezuma commented Oct 26, 2021

Not sure if you have seen that blog post about TF resize. Let me look for it. Edit: Linked below.

@lucasb-eyer
lucasb-eyer commented Oct 26, 2021

Yes, I know which one you mean. However, that post is about old TF1 being a complete mess; TF2 supposedly fixed this, and that's what I'm using.

@lucasb-eyer

Oh, shoot! Two differences in resize:

  • in Torchvision, Resize(256) means "small side 256, keep aspect ratio". In my code, Resize(256) is a shortcut for Resize((256, 256)), and I just assumed Torchvision behaved the same way. Ugh.
  • in Torchvision, depending on where in the pipeline you put ToTensor, it invokes either PIL's or Torch's resize. In the order used here, it will use PIL, which actually always uses antialias=True, no matter what option was set.

Will check with those two changes (rough sketch of the corrected pipeline below).
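(A minimal sketch of what the corrected TF-side pipeline looks like under those two observations, i.e. small-side resize with the aspect ratio kept plus antialiasing; the helper name and sizes are illustrative.)

```python
import tensorflow as tf

def preprocess_eval_fixed(image, small_side=256, crop=224):
    image = tf.cast(image, tf.float32)
    # Match torchvision's Resize(256): scale the *smaller* side to 256
    # and keep the aspect ratio.
    h = tf.cast(tf.shape(image)[0], tf.float32)
    w = tf.cast(tf.shape(image)[1], tf.float32)
    scale = small_side / tf.minimum(h, w)
    new_h = tf.cast(tf.round(h * scale), tf.int32)
    new_w = tf.cast(tf.round(w * scale), tf.int32)
    # Match PIL's behaviour: PIL's resize always antialiases.
    image = tf.image.resize(image, tf.stack([new_h, new_w]),
                            method="bicubic", antialias=True)
    # CenterCrop(224).
    top = (new_h - crop) // 2
    left = (new_w - crop) // 2
    return image[top:top + crop, left:left + crop, :]
```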

@lucasb-eyer

Confirming that with these fixes, both frameworks produce almost the same images, and I do get 78.02% with that model. It's still about 0.1% too low, but that's a difference I can believe comes from slightly different resize implementations; the initial gap of 0.5% was way too large for that.
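(One way to quantify "almost the same images" is a direct pixel-wise comparison of the two pipelines' outputs on a sample image; `preprocess_torch` and `preprocess_tf` below are hypothetical placeholders for the respective preprocessing functions.)

```python
import numpy as np
from PIL import Image

img = Image.open("some_val_image.JPEG").convert("RGB")  # any sample image

a = np.asarray(preprocess_torch(img), dtype=np.float32)  # torchvision/PIL pipeline (placeholder)
b = np.asarray(preprocess_tf(img), dtype=np.float32)     # TF/JAX pipeline (placeholder)

print("max abs diff:", np.abs(a - b).max())
print("mean abs diff:", np.abs(a - b).mean())
```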

This is now close enough that I don't feel I'll misrepresent the DINO model, and I can continue with a clear conscience.

Thanks for being my 🦆


@woctezuma

You are welcome! Good job! 🦆
