coder4nlp / Starred · GitHub

The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models"

Python · 29 stars · 2 forks · Updated Jun 11, 2025

Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.

Python · 82 stars · 4 forks · Updated Jun 29, 2025

Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.

173 stars · 13 forks · Updated Nov 10, 2024

Janus-Series: Unified Multimodal Understanding and Generation Models

Python · 17,409 stars · 2,239 forks · Updated Feb 1, 2025

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

Python · 1,991 stars · 177 forks · Updated Jul 1, 2025

A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning

Python · 33 stars · 3 forks · Updated Jun 3, 2025

Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊

267 stars · 7 forks · Updated Jan 27, 2025

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python · 2,965 stars · 252 forks · Updated Dec 5, 2024

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python · 7,694 stars · 673 forks · Updated Feb 10, 2025

This is a Phi Family of SLMs book for getting started with Phi Models. Phi is a family of open-source AI models developed by Microsoft. Phi models are the most capable and cost-effective small langua…

Jupyter Notebook · 3,389 stars · 431 forks · Updated Jun 27, 2025

A Survey on Benchmarks of Multimodal Large Language Models

116 stars · 9 forks · Updated Jul 1, 2025

MuCR is a benchmark designed to evaluate Multimodal Large Language Models' (MLLMs) ability to discern causal links across modalities

15 stars · 2 forks · Updated May 27, 2025

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Python · 603 stars · 29 forks · Updated Apr 1, 2025

Python · 37 stars · 2 forks · Updated Jul 9, 2024

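ReLE Chinese LLM capability evaluation (continuously updated): currently covers 257 large models, including commercial models such as chatgpt, gpt-4.1, o4-mini, Google gemini-2.5, Claude, Zhipu GLM-Z1, ERNIE Bot (文心一言), qwen-max, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax, as well as DeepSeek-R1-0528, qwq-32b, deepseek-v3, qwen3, llama4, phi-4, glm…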

4,458 stars · 184 forks · Updated Jun 23, 2025

FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀

Jupyter Notebook · 3,692 stars · 651 forks · Updated Nov 17, 2024

Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train Dataset for table understanding and develop a generalist tab…

Python · 209 stars · 7 forks · Updated Jun 12, 2025

20 stars · Updated Sep 13, 2023

An open-source implementation for training LLaVA-NeXT.

Python · 403 stars · 22 forks · Updated Oct 23, 2024

PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Python · 336 stars · 19 forks · Updated Aug 9, 2022

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python · 8,461 stars · 652 forks · Updated May 29, 2025

PyTorch native post-training library

Python · 5,299 stars · 644 forks · Updated Jul 1, 2025

A PyTorch native platform for training generative AI models

Python · 3,990 stars · 415 forks · Updated Jul 2, 2025

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Python · 4,624 stars · 350 forks · Updated May 29, 2025

Code and Datasets for the Paper: Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction

Python · 12 stars · Updated Jun 5, 2024

Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, English and Chinese)

Python · 84 stars · 4 forks · Updated Sep 21, 2024

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Python · 719 stars · 550 forks · Updated Jul 4, 2024

[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"

Python · 225 stars · 15 forks · Updated Apr 14, 2025

LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer

Python · 382 stars · 18 forks · Updated Apr 20, 2025