Stars
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Official code for the CVPR 2025 paper "Navigation World Models".
Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"
Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
GenEval: An object-focused framework for evaluating text-to-image alignment
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
Load tensorboard event logs as pandas DataFrames for scientific plotting; Supports both PyTorch and TensorFlow
FFmpeg libav tutorial - learn how media works from basic to transmuxing, transcoding and more. Translations: 🇺🇸 🇨🇳 🇰🇷 🇪🇸 🇻🇳 🇧🇷
This is an official implementation of TubeR: Tubelet Transformer for Video Action Detection
Unofficial implementation of: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics
An end-to-end PyTorch framework for image and video classification
Inflate DenseNet and ResNet as per I3D with ImageNet weight transfer
Out of time: automated lip sync in the wild
Code to reproduce the results in the FAIR research papers "Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples" https://arxiv.org/abs/…
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Best Practices, code samples, and documentation for Computer Vision.
A script to check for vaccine availability at a Safeway near you
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
A library for efficient similarity search and clustering of dense vectors.
Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
TransNet V2: Shot Boundary Detection Neural Network