Stars
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
A Generative AI evolution pipeline on GitHub Actions
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image (see the open_clip sketch below)
This is the repo for our new project Highly Accurate Dichotomous Image Segmentation
An open source implementation of CLIP.
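Tying the two CLIP entries above together, here is a minimal zero-shot sketch using open_clip to score candidate captions against an image. The model/pretrained tags are one common choice, and `cat.jpg` is a hypothetical local file:

```python
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each candidate caption
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # the highest probability marks the most relevant snippet
```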
A photo mosaic (pixel collage) maker. Use all your friends' profile pictures to approximate your profile picture! How to make a cool WeChat friends mosaic with Python
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
PyTorch implementation of prototypical networks for few-shot learning
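For reference, a self-contained sketch of the prototypical-networks decision rule (class prototypes as mean support embeddings, queries assigned to the nearest prototype). The toy tensors below stand in for a real embedding network; this is the generic technique, not the repo's code:

```python
import torch

def prototypical_logits(support, support_labels, query, n_classes):
    """support: (n_support, d), query: (n_query, d) embedded examples."""
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])                                           # (n_classes, d)
    # Negative squared Euclidean distance serves as the logit
    return -torch.cdist(query, prototypes) ** 2  # (n_query, n_classes)

# Toy 3-way episode with 3 support shots per class and 2-d embeddings
support = torch.randn(9, 2)
labels = torch.arange(3).repeat_interleave(3)
query = torch.randn(5, 2)
pred = prototypical_logits(support, labels, query, n_classes=3).argmax(dim=1)
```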
A concise but complete implementation of CLIP with various experimental improvements from recent papers
DataComp: In search of the next generation of multimodal datasets
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
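A minimal PEFT sketch: wrapping a causal LM with LoRA adapters so only the adapter weights train. The base model name is an illustrative assumption, and `target_modules` depends on the architecture:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(
    r=8,                 # rank of the low-rank update matrices
    lora_alpha=16,       # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```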
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
Overview and tutorial of the LangChain Library
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
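A minimal loralib sketch with assumed toy dimensions (not the paper's setup): replace a dense layer with its LoRA counterpart, then freeze everything except the low-rank factors:

```python
import torch
import torch.nn as nn
import loralib as lora

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # lora.Linear = a frozen dense layer plus trainable rank-16 factors
        self.proj = lora.Linear(128, 128, r=16)
        self.head = nn.Linear(128, 10)

    def forward(self, x):
        return self.head(torch.relu(self.proj(x)))

model = TinyModel()
lora.mark_only_lora_as_trainable(model)  # freeze all non-LoRA parameters
# lora.lora_state_dict(model) checkpoints only the adapter weights
```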
ImageBind: One Embedding Space to Bind Them All
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
🔥Highlighting the top ML papers every week.
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
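A hedged sketch of driving img2dataset from Python; `urls.txt` (one image URL per line), the output folder, and the worker counts are placeholder choices to tune for your machine:

```python
from img2dataset import download

download(
    url_list="urls.txt",
    input_format="txt",
    output_folder="images",
    output_format="webdataset",  # shard images + metadata into .tar files
    image_size=256,              # resize images on the fly
    processes_count=8,
    thread_count=64,
)
```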
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.
PyTorch code and models for the DINOv2 self-supervised learning method.
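A minimal DINOv2 feature-extraction sketch via torch.hub (the entrypoint name comes from the repo's hubconf); the random input below stands in for a properly normalized image batch:

```python
import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

x = torch.randn(1, 3, 224, 224)  # placeholder for a normalized image batch
with torch.no_grad():
    features = model(x)          # (1, 384) global image embedding (CLS token)
print(features.shape)
```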
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
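A minimal sketch of SAM's point-prompted inference following the repo's predictor API; the checkpoint path, image file, and click location are placeholders:

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground click (label 1) at pixel (500, 375) as the prompt
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return three candidate masks with scores
)
```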
An open-source framework for training large multimodal models.
Code and documentation to train Stanford's Alpaca models, and generate the data.