Stars
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
A framework for few-shot evaluation of language models.
NVIDIA Linux open GPU with P2P support
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Official repository for "AM-RADIO: Reduce All Domains Into One"
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
Refine high-quality datasets and visual AI models
[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Tracking and collecting papers/projects/others related to Segment Anything.
ICLR'24 Official Implementation of Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
[ECCV 2022] This repo is the official PyTorch implementation of "3D Clothed Human Reconstruction in the Wild".
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
A performance library for machine learning applications.
[ICCV 2023] Official implementation of the paper "Less is More: Focus Attention for Efficient DETR"
This repository contains the official implementation of the research paper "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" (ICCV 2023)
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
Development repository for the Triton language and compiler
Hiera: A fast, powerful, and simple hierarchical vision transformer.
Hackable and optimized Transformers building blocks, supporting a composable construction.
Fast and memory-efficient exact attention
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.