- Peking University
- 2000017426@stu.pku.edu.cn
Stars
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning
MINT-1T: A one trillion token multimodal interleaved dataset.
Examples and guides for using the OpenAI API
GenEval: An object-focused framework for evaluating text-to-image alignment
Hackable and optimized Transformers building blocks, supporting a composable construction.
SPEAR: A Simulator for Photorealistic Embodied AI Research
A generative world for general-purpose robotics & embodied AI learning.
Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models, claimed to improve model reasoning by at least 70%
Official codebase for the Paper “Retrieval-Augmented Diffusion Models”
Use LLMs to batch-generate or clean data for academic use. Supports OCR with a range of large models, including Qwen (Tongyi Qianwen), Moonshot, Baidu PaddleOCR, OpenAI, and LLaVA.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
An open-source framework for training large multimodal models.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
Code for the ICLR 2025 paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
ancient-chat-llm (古语说): An LLM proficient in Chinese culture
The official PyTorch implementation of ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation (CVPR 2024)
Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction
Lumina-T2X is a unified framework for Text to Any Modality Generation
Combining MMOCR with Segment Anything & Stable Diffusion. Automatically detect, recognize, and segment text instances, with several downstream tasks, e.g., Text Removal and Text Inpainting