linzhenyuyuchen (LinZhenYu) / Starred · GitHub

Starred repositories

Recipes to scale inference-time compute of open models

Python 1,069 113 Updated May 8, 2025

O1 Replication Journey

1,988 65 Updated Jan 14, 2025

Efficient Triton Kernels for LLM Training

Python 5,001 318 Updated May 10, 2025

Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Python 258 36 Updated Jun 24, 2024

Image augmentation for machine learning experiments.

Python 14,573 2,463 Updated Jul 30, 2024

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Python 281 8 Updated Nov 13, 2024

[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Python 146 5 Updated Apr 30, 2024

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

678 25 Updated Apr 9, 2025

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.

Python 2,469 277 Updated May 7, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 8,048 607 Updated Apr 27, 2025

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,273 281 Updated May 4, 2024

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Python 805 54 Updated Mar 25, 2024
Python 19 1 Updated Oct 10, 2023

LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer

Python 376 18 Updated Apr 20, 2025

Recent LLM-based CV and related works. Welcome to comment/contribute!

862 38 Updated Mar 8, 2025

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Python 1,040 70 Updated Oct 6, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

2,271 102 Updated May 4, 2025

Mora: More like Sora for Generalist Video Generation

Python 1,555 105 Updated Oct 10, 2024

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Python 9,253 1,531 Updated Aug 9, 2024

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python 2,169 127 Updated Dec 24, 2024

OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, RTMDet-Rotated, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.

Python 3,194 575 Updated Jul 14, 2024

YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/

Python 9,821 2,306 Updated Nov 20, 2024

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 5,392 517 Updated Feb 26, 2025

Referring Expression Datasets API

Jupyter Notebook 514 81 Updated Aug 27, 2024

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python 1,756 121 Updated Apr 17, 2025

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Python 168 7 Updated Mar 29, 2025

Grok open release

Python 50,240 8,353 Updated Aug 30, 2024

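This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications).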

HTML 17,242 2,019 Updated May 1, 2025

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 423 14 Updated Jan 4, 2025

[ECCV 2024] Official GitHub repository for the paper "LingoQA: Visual Question Answering for Autonomous Driving"

Python 164 6 Updated Sep 26, 2024