Stars
A collection of diffusion model papers categorized by their subareas
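MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, including text written in "Martian script" (火星文). It contains news, essays, novels, books, magazines, papers, script lines, forum posts, wikis, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and every other form of plain-text Chinese data.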
ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
A series of large language models trained from scratch by developers @01-ai
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
A framework for few-shot evaluation of language models.
Scaling Data-Constrained Language Models
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Tools to download and clean up Common Crawl data
A quick guide to instruction fine-tuning datasets, especially trending ones
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"
The official GitHub page for the survey paper "A Survey of Large Language Models".
A series of large language models developed by Baichuan Intelligent Technology
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation model
Diffusion model papers, survey, and taxonomy
Generative Agents: Interactive Simulacra of Human Behavior
[IJCV2024] Exploiting Diffusion Prior for Real-World Image Super-Resolution
Awesome-LLM: a curated list of Large Language Model resources
Digital Human Resource Collection: 2D/3D/4D human modeling, avatar generation & animation, clothed people digitalization, virtual try-on, and others.
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
An elegant PyTorch deep reinforcement learning library.
A curated list of multimodal-related research.
Reading list for research topics in multimodal machine learning
A large-scale 7B pretrained language model developed by BaiChuan-Inc.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
A GPT-4 AI Tutor Prompt for customizable personalized learning experiences.