Starred repositories
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
Maximize the potential of Cursor with best practices for automatic rule generation, custom agent generation, and Agile workflows
Wan: Open and Advanced Large-Scale Video Generative Models
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Fully open reproduction of DeepSeek-R1
🎨 Math Formula OCR Pro: recognizes handwritten and printed formulas, including mixed Chinese/English formulas, and supports basic symbolic reasoning (data structure based on a LaTeX abstract syntax tree).
Train a 1B-parameter LLM on 1T tokens from scratch as an individual
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on a single machine.
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
Official Code for 'TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction'
This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
[ICCV2023] DETRDistill: A Universal Knowledge Distillation Framework for DETR-families
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
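A collection of industry practice articles on search, recommendation, advertising, user growth, and related areas (sources: Zhihu, Datafuntalk, and tech WeChat official accounts)
PyTorch implementation of the CVPR 2024 paper: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation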
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.