Stars
The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models"
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
Janus-Series: Unified Multimodal Understanding and Generation Models
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
This is a Phi Family of SLMs book for getting started with Phi models. Phi is a family of open-source AI models developed by Microsoft. Phi models are the most capable and cost-effective small langua…
A Survey on Benchmarks of Multimodal Large Language Models
MuCR is a benchmark designed to evaluate Multimodal Large Language Models' (MLLMs) ability to discern causal links across modalities
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
ReLE Chinese LLM capability evaluation (continuously updated): currently covers 257 large models, including commercial models such as chatgpt, gpt-4.1, o4-mini, Google gemini-2.5, Claude, Zhipu GLM-Z1, ERNIE Bot, qwen-max, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax, as well as DeepSeek-R1-0528, qwq-32b, deepseek-v3, qwen3, llama4, phi-4, glm…
FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀
Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train Dataset for table understanding and develop a generalist tab…
An open-source implementation for training LLaVA-NeXT.
Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
A PyTorch native platform for training generative AI models
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Codes and Datasets for the Paper: Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction
Vary-tiny codebase built upon LAVIS (for training from scratch) and a PDF image-text pairs dataset (about 600k pairs, English/Chinese)
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer