Ferenas (Ferenas) / Starred · GitHub
  • Shanghai Jiao Tong University


Ongoing research training transformer models at scale

Python 12,401 2,781 Updated May 22, 2025
Python 1,153 36 Updated May 21, 2025

[ICML'25] "ConText: Driving In-context Learning for Text Removal and Segmentation"

1 Updated May 19, 2025

DeepFashion2 Dataset https://arxiv.org/pdf/1901.07973.pdf

Jupyter Notebook 2,425 372 Updated Jan 28, 2025

Awesome work on hand pose estimation/tracking

Python 3,210 533 Updated Mar 11, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,016 28 Updated May 21, 2025

[NeurIPS'23] Emergent Correspondence from Image Diffusion

Python 688 38 Updated May 14, 2024

Code for the Molmo Vision-Language Model

Python 426 37 Updated Dec 12, 2024

CoTracker is a model for tracking any point (pixel) on a video.

Jupyter Notebook 4,314 295 Updated Jan 21, 2025
Jupyter Notebook 209 15 Updated Apr 23, 2025

Tracking Any Point (TAP)

Jupyter Notebook 1,528 142 Updated May 19, 2025

ECCV2020 paper "Whole-Body Human Pose Estimation in the Wild"

Python 805 71 Updated Apr 22, 2025

A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body

Jupyter Notebook 7,053 1,297 Updated Jan 18, 2023

SAM-PT: Extending SAM to zero-shot video segmentation with point-based tracking.

Python 998 62 Updated Jan 27, 2024

[CVPR2024, Highlight] Official code for DragDiffusion

Python 1,215 93 Updated Jan 29, 2024

The project is an official implementation of our ECCV2018 paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/abs/1804.06208)

Python 2,979 603 Updated Nov 28, 2022

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

Python 470 21 Updated Apr 8, 2024

🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning

Python 1,056 55 Updated Apr 17, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 8,152 619 Updated Apr 27, 2025
Python 29 1 Updated Jan 9, 2025

Code for the EMNLP 2024 paper "How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning"

Jupyter Notebook 11 Updated Nov 17, 2024

[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'

Python 195 9 Updated Apr 20, 2025

This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025]

Python 65 5 Updated Apr 17, 2025

[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Python 112 4 Updated May 13, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 2,991 228 Updated May 19, 2025

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

585 16 Updated May 20, 2025

Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"

Jupyter Notebook 243 9 Updated Apr 30, 2025

Official Repo for Paper "OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision" [ICLR2025]

111 3 Updated Jan 27, 2025
Python 3,843 360 Updated May 6, 2025

[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Python 350 6 Updated May 5, 2025