SmolVLM for Object Detection

UPDATES

  • My understanding so far: eval2.ipynb
    • I trained the model once again, this time with more epochs and all available bboxes in the image
    • the model is biased towards larger bboxes and doesn't bother to generate small ones
    • class imbalance is severe: it can't recognize the COCO classes that presumably have fewer samples
    • the model still doesn't understand the concept of an object not being present in the image; if we ask it to detect a car in a photo of people, it'll generate a random bbox, so it probably needs such "not present" samples (a rough sketch follows this list)
    • unfortunately, the VLM has lost its original capabilities -- maybe fixable if we had trained with LoRA? I am not sure (there's an untested LoRA sketch after this list too)
    • it is terrible at detecting multiple objects in the image, even for the new model which I trained with all bboxes

  • Training
    • model: SmolVLM-256M-Instruct -- it's just a really good model for the size and for the GPU-poor, what can I say
    • I trained it on COCO for 7 epochs; it took about 10 hours with < 10 GB of VRAM
    • as you know, there are a LOT of bboxes in a single COCO image, so for each class I chose the bbox with the maximum area (sketched after this list) to make it easy on the model, which is why it's decent at detecting large objects
    • there's an option to enable synonyms as well, which I generated for each label to make it "open vocabulary" -- I didn't include them in this run, so please update trainer.py to get rid of that if you want to try it :)

  • Special Tokens:
    • adding special tokens didn't work, unfortunately, even though that's how the project had started; the model collapsed and never learnt to predict the special tokens
    • switching from 1024 tokens down to 256 tokens still gave the same issue
    • inspired by Aritra's Gemma 3 Finetuning, I then switched to just normal text instead of special tokens (a sketch of such a plain-text target follows this list)
    • I even considered averaging out the embeddings of the special tokens with their nearby tokens every few steps to improve "generalization", but that was trash
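
The per-class, max-area filtering mentioned in the Training notes could look roughly like the sketch below (pycocotools-based; the annotation path is a placeholder and the actual preprocessing lives in trainer.py, so treat it as an illustration rather than the real thing):

```python
# Sketch of "keep only the largest bbox per class" as described in the Training notes.
# The annotation file path is a placeholder; the real preprocessing lives in trainer.py.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2017.json")

def largest_bbox_per_class(image_id):
    """Return {class_name: [x, y, w, h]} keeping only the largest box per class."""
    anns = coco.loadAnns(coco.getAnnIds(imgIds=image_id))
    best = {}  # category_id -> (area, bbox)
    for ann in anns:
        cat_id = ann["category_id"]
        if cat_id not in best or ann["area"] > best[cat_id][0]:
            best[cat_id] = (ann["area"], ann["bbox"])
    return {
        coco.loadCats(cat_id)[0]["name"]: bbox
        for cat_id, (_, bbox) in best.items()
    }

# e.g. one (largest) box per class for the first training image
print(largest_bbox_per_class(coco.getImgIds()[0]))
```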
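
Here's roughly what a plain-text detection target (no special <locXYZ> tokens) can look like as a chat-style sample. The prompt wording and the 0-999 integer coordinates are illustrative assumptions, not necessarily the exact format trainer.py uses:

```python
# Sketch of a plain-text bbox target; the prompt wording and the 0-999 integer
# coordinate scheme are assumptions for illustration, not the repo's exact format.
def make_messages(label, bbox, img_w, img_h):
    """Build a chat-style sample where the answer is the bbox written as plain text."""
    x, y, w, h = bbox  # COCO format: top-left x, y, width, height
    # normalize to 0-999 integers so each coordinate is a short run of ordinary tokens
    x1 = round(x / img_w * 999)
    y1 = round(y / img_h * 999)
    x2 = round((x + w) / img_w * 999)
    y2 = round((y + h) / img_h * 999)
    return [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": f"Detect: {label}"},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": f"{label}: {x1} {y1} {x2} {y2}"},
        ]},
    ]

# these messages can then be rendered with the SmolVLM processor's apply_chat_template
```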
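
For the "always hallucinates a bbox" problem, one possible direction is mixing in negative queries for classes that aren't in the image, paired with an explicit "not present" answer. Nothing like this exists in the repo yet; the sampling ratio and the answer string below are assumptions:

```python
# Hypothetical "not present" negatives so the model can learn to refuse.
# Not implemented in this repo; the 0.3 ratio and the answer string are assumptions.
import random

COCO_CLASSES = ["person", "car", "dog", "bicycle"]  # truncated for brevity

def sample_query(present_labels, negative_prob=0.3):
    """Return (label, answer) where answer is None when a normal bbox target applies."""
    absent = [c for c in COCO_CLASSES if c not in present_labels]
    if absent and random.random() < negative_prob:
        label = random.choice(absent)
        return label, f"{label}: not present"
    label = random.choice(list(present_labels))
    return label, None  # None -> build the usual plain-text bbox answer for this label
```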
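
And on the lost-original-capabilities point: a LoRA run keeps the base weights frozen and only trains small adapter matrices, which might preserve more of the original behaviour. An untested peft sketch follows; the target module names and ranks are guesses for SmolVLM, not something tried in this repo:

```python
# Untested sketch: wrap SmolVLM with LoRA adapters instead of full finetuning.
# Target modules and hyperparameters are guesses, not values used in this repo.
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model = AutoModelForVision2Seq.from_pretrained("HuggingFaceTB/SmolVLM-256M-Instruct")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```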

Here are the basic results:

[results image: outputs]

Still a long way to go, but hey at least it works :)

A man's heart plans his way, But the LORD directs his steps. Proverbs 16:9
      