🎨 Enhanced math formula OCR (Math Formula OCR Pro): recognizes handwritten and printed formulas, including mixed Chinese-English ones, and supports basic symbolic derivation (data structures based on the LaTeX abstract syntax tree).

Jupyter Notebook · 1,220 stars · 236 forks · Updated Jun 11, 2024

zhanshijinwat / Steel-LLM

Train a 1B-parameter LLM on 1T tokens from scratch as an individual.

Jupyter Notebook · 654 stars · 69 forks · Updated Apr 27, 2025

rom1504 / img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.

Python · 4,037 stars · 355 forks · Updated Aug 7, 2024

NVIDIA / Cosmos

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

Jupyter Notebook · 7,985 stars · 512 forks · Updated Apr 29, 2025

LLaVA-VL / LLaVA-NeXT

Python · 3,837 stars · 359 forks · Updated May 6, 2025

yunlong10 / Awesome-LLMs-for-Video-Understanding

🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.

2,299 stars · 102 forks · Updated May 4, 2025

JUNJIE99 / MLVU

🔥🔥 MLVU: Multi-task Long Video Understanding Benchmark

Python · 199 stars · 1 fork · Updated Mar 24, 2025

OpenGVLab / Ask-Anything

[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding, plus support for many more LMs such as miniGPT4, StableLM, and MOSS.

Python · 3,239 stars · 262 forks · Updated Jan 18, 2025

HVision-NKU / TAR3D

Official code for "TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction"

56 stars · Updated Dec 26, 2024

Leon1207 / Video-RAG-master

This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"

Python · 188 stars · 19 forks · Updated Feb 23, 2025

BIVLab-USTC / DETRDistill

[ICCV 2023] DETRDistill: A Universal Knowledge Distillation Framework for the DETR Family

Jupyter Notebook · 53 stars · 6 forks · Updated Nov 3, 2023

showlab / MovieSeq

[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences

Jupyter Notebook · 39 stars · 1 fork · Updated Mar 11, 2025

showlab / Show-o

[ICLR 2025] Repository for Show-o: One Single Transformer to Unify Multimodal Understanding and Generation.

Python · 1,401 stars · 60 forks · Updated Apr 28, 2025

bklieger-groq / g1

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Python · 4,217 stars · 377 forks · Updated Jan 27, 2025

Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python · 7,566 stars · 660 forks · Updated Feb 10, 2025

aimagelab / meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020

Python · 537 stars · 135 forks · Updated Dec 21, 2022

kirill-vish / Beyond-INet

Code for the experiments in "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"

Python · 101 stars · 5 forks · Updated Sep 11, 2024

OpenBMB / MiniCPM

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.

Jupyter Notebook · 7,351 stars · 463 forks · Updated Nov 6, 2024

Doragd / Algorithm-Practice-in-Industry

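A collection of industry practice articles on search, recommendation, advertising, user growth, and more (sources: Zhihu, Datafuntalk, and tech WeChat official accounts).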

Python · 3,378 stars · 392 forks · Updated May 21, 2025

dogehhh / ReCLIP

PyTorch implementation of the CVPR 2024 paper "Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation"

Python · 43 stars · 1 fork · Updated Apr 24, 2025

QwenLM / Qwen2.5-VL

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook · 10,525 stars · 757 forks · Updated May 15, 2025

TeaQwQTea Cherishnoobs
