Stars
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Easily move your WSL distro's VHDX file to a new location.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Code release for "Convolutional Two-Stream Network Fusion for Video Action Recognition", CVPR 2016.
Uses a two-stream architecture to implement a classic action recognition method on the UCF101 dataset.
PyTorch implementation of OpenPose, including hand and body pose estimation.