MaxyLee (Maxy) / Starred · GitHub
🎯
Focusing
  • University of Macau
  • Macau

This repository provides data for the VAW dataset as described in the CVPR 2021 paper titled "Learning to Predict Visual Attributes in the Wild" and the ECCV 2022 paper titled "Improving Closed and…

Python 66 6 Updated Jul 22, 2022

A benchmark dataset for GRES and GREC [CVPR2023 Highlight]

Python 233 4 Updated Sep 4, 2023

The first attempt to replicate o3-like visual clue-tracking reasoning capabilities.

Python 40 3 Updated Apr 20, 2025

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python 1,245 51 Updated May 11, 2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Python 55 Updated Mar 27, 2025

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Python 144,582 29,011 Updated May 20, 2025

[ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models

Python 54 3 Updated May 20, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 8,234 984 Updated May 20, 2025

Witness the aha moment of VLM with less than $3.

Python 3,672 284 Updated May 19, 2025

JavaScript 3,218 1,260 Updated Jun 21, 2024

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python 7,647 648 Updated May 20, 2025

ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities

Python 40 2 Updated Sep 3, 2024

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Python 4,037 355 Updated Aug 7, 2024

A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently and pull request…

279 21 Updated Apr 9, 2025

GPT4V-level open-source multi-modal model based on Llama3-8B

Python 2,349 154 Updated Mar 3, 2025

Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

Python 2,383 365 Updated May 20, 2025

Label Studio is a multi-type data labeling and annotation tool with standardized output format

JavaScript 22,141 2,745 Updated May 20, 2025

A state-of-the-art-level open visual language model | Multimodal pre-trained model

Python 6,552 428 Updated May 29, 2024

⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ main stream scenarios and 150+ SOTA models with end-to-end…

C++ 3,184 477 Updated Feb 24, 2025

VGGFace implementation with Keras Framework

Python 948 420 Updated Jul 9, 2024

Pretrained Pytorch face detection (MTCNN) and facial recognition (InceptionResnet) models

Python 4,868 987 Updated Aug 2, 2024

A clean version (wash list) of MS-Celeb-1M face dataset, containing 6,464,018 face images of 94,682 celebrities

346 93 Updated Oct 9, 2020

GIPHY's Open-Source Celebrity Detection Deep Learning Model

Python 691 67 Updated Sep 28, 2023

Dataset with 5 million images depicting human-made and natural landmarks spanning 200 thousand classes.

Shell 793 131 Updated Sep 18, 2023

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python 2,209 153 Updated Feb 16, 2025

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 878 48 Updated Nov 23, 2024

Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models

Python 3,103 274 Updated Jan 10, 2025

Official code and data of "3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset"

12 1 Updated Dec 8, 2024

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

Python 8,264 783 Updated Oct 7, 2024

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)

Python 3,923 324 Updated Jun 12, 2024