MaxyLee

🎯

Focusing

Maxy MaxyLee

To be or not to be

15 followers · 12 following

University of Macau
Macau

Stars

adobe-research / vaw_dataset

This repository provides data for the VAW dataset as described in the CVPR 2021 paper titled "Learning to Predict Visual Attributes in the Wild" and the ECCV 2022 paper titled "Improving Closed and…

Python · 66 stars · 6 forks · Updated Jul 22, 2022

henghuiding / gRefCOCO

A benchmark dataset for GRES and GREC [CVPR2023 Highlight]

Python · 233 stars · 4 forks · Updated Sep 4, 2023

TheEighthDay / SeekWorld

The first attempt to replicate o3-like visual clue-tracking reasoning capabilities.

Python · 40 stars · 3 forks · Updated Apr 20, 2025

BytedTsinghua-SIA / DAPO

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python · 1,245 stars · 51 forks · Updated May 11, 2025

thunlp / DeepPerception

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Python · 55 stars · Updated Mar 27, 2025

huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python · 144,582 stars · 29,011 forks · Updated May 20, 2025

thunlp / Migician

[ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models

Python · 54 stars · 3 forks · Updated May 20, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 8,234 stars · 984 forks · Updated May 20, 2025

Deep-Agent / R1-V

Witness the aha moment of VLM with less than $3.

Python · 3,672 stars · 284 forks · Updated May 19, 2025

nerfies / nerfies.github.io

JavaScript · 3,218 stars · 1,260 forks · Updated Jun 21, 2024

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python · 7,647 stars · 648 forks · Updated May 20, 2025

edchengg / oven_eval

ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities

Python · 40 stars · 2 forks · Updated Sep 3, 2024

rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Python · 4,037 stars · 355 forks · Updated Aug 7, 2024

Charles-Xie / awesome-described-object-detection

A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently and pull request…

279 stars · 21 forks · Updated Apr 9, 2025

THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Python · 2,349 stars · 154 forks · Updated Mar 3, 2025

open-compass / VLMEvalKit

Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

Python · 2,383 stars · 365 forks · Updated May 20, 2025

HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

JavaScript · 22,141 stars · 2,745 forks · Updated May 20, 2025

THUDM / CogVLM

A state-of-the-art-level open visual language model | multimodal pretrained model

Python · 6,552 stars · 428 forks · Updated May 29, 2024

PaddlePaddle / FastDeploy

⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ main stream scenarios and 150+ SOTA models with end-to-end…

C++ · 3,184 stars · 477 forks · Updated Feb 24, 2025

rcmalli / keras-vggface

VGGFace implementation with Keras Framework

Python · 948 stars · 420 forks · Updated Jul 9, 2024

timesler / facenet-pytorch

Pretrained Pytorch face detection (MTCNN) and facial recognition (InceptionResnet) models

Python · 4,868 stars · 987 forks · Updated Aug 2, 2024

ruochunjin / C-MS-Celeb

A clean version (wash list) of MS-Celeb-1M face dataset, containing 6,464,018 face images of 94,682 celebrities

346 stars · 93 forks · Updated Oct 9, 2020

Giphy / celeb-detection-oss

GIPHY's Open-Source Celebrity Detection Deep Learning Model

Python · 691 stars · 67 forks · Updated Sep 28, 2023

cvdfoundation / google-landmark

Dataset with 5 million images depicting human-made and natural landmarks spanning 200 thousand classes.

Shell · 793 stars · 131 forks · Updated Sep 18, 2023

dvlab-research / LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python · 2,209 stars · 153 forks · Updated Feb 16, 2025

mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python · 878 stars · 48 forks · Updated Nov 23, 2024

ali-vilab / VGen

Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models

Python · 3,103 stars · 274 forks · Updated Jan 10, 2025

MaxyLee / 3AM

Official code and data of "3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset"

12 stars · 1 fork · Updated Dec 8, 2024

lucidrains / imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

Python · 8,264 stars · 783 forks · Updated Oct 7, 2024

amazon-science / mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)

Python · 3,923 stars · 324 forks · Updated Jun 12, 2024
