- Shanghai
- http://yuzhi.run/blog
Starred repositories
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild.
High-resolution models for human tasks.
PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.
Consistent Subject Generation via Contrastive Instantiated Concepts
A collection of resources on personalized image generation.
Align Anything: Training All-modality Model with Feedback
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
Training-free Regional Prompting for Diffusion Transformers 🔥
Understanding Deep Learning - Simon J.D. Prince
Conceptrol: Concept Control of Zero-shot Personalized Image Generation
A minimal and universal controller for FLUX.1.
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
#1 Locally hosted web application that allows you to perform various operations on PDF files
Unofficial Implementation of Animate Anyone by Novita AI
An extremely fast Python package and project manager, written in Rust.
Wan: Open and Advanced Large-Scale Video Generative Models
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Solve Visual Understanding with Reinforced VLMs
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.