Stars
Tool for producing high-quality forecasts for time series data that has multiple seasonalities with linear or non-linear growth.
MambaGlue: Fast and Robust Local Feature Matching With Mamba @ ICRA'25
Implementation for Describe Anything: Detailed Localized Image and Video Captioning
This repository contains information for the paper "A Survey on RGB-D Datasets" and is a collaborative initiative to update the datasets list faster.
Witness the aha moment of VLM with less than $3.
[NeurIPS 2024] DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization
Produce redistributable builds of Python
C++ implementation of "LightGlue: Local Feature Matching at Light Speed"
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
A framework to easily use 32 (and growing) different image matching methods
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Making the community's best AI chat models available to everyone.
LLM2CLIP makes SOTA pretrained CLIP models even more SOTA.
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
This is the official release for the paper "EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models" (https://arxiv.org/abs/2406.10224).
This repository contains the code for a virtual try-on application built using Flask, Twilio's WhatsApp API, and Gradio's virtual try-on model. Users can send images via WhatsApp to try on garments…
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-e…
The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices
A from-scratch, single-file, Python-only pose-graph optimization implementation for educational purposes
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.