Starred repositories
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & V…
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A reproduction of growing neural cellular automata using PyTorch.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
DeepEP: an efficient expert-parallel communication library
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
Minimal reproduction of DeepSeek R1-Zero
OCR, layout analysis, reading order, table recognition in 90+ languages
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, …
This repo includes ChatGPT prompt curation to use ChatGPT and other LLM tools better.
PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supporting services such as Google/DeepL/Ollama/OpenAI, available via CLI/GUI/MCP/Docker/Zotero
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.
UniTable: Towards a Unified Table Foundation Model
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
✨✨Latest Advances on Multimodal Large Language Models
PKU-DAIR / RAG-Survey
Forked from hymie122/RAG-Survey. Collecting awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
🎨 Math Formula OCR Pro: recognizes handwritten and printed formulas in Chinese and English, and supports simple symbolic reasoning (data structure based on the LaTeX abstract syntax tree).
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
ERNIE Bot Agent is a Large Language Model (LLM) Agent Framework, powered by the advanced capabilities of ERNIE Bot and the platform resources of Baidu AI Studio.