Starred repositories
SE-VGAE: Unsupervised Disentangled Representation Learning for Interpretable Architectural Layout Design Graph Generation
A generative speech model for daily dialogue.
FP4S: Floor plan image segmentation via scribble-based semi-weakly-supervised learning
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Curated list of project-based tutorials
A project for real-time training of the RWKV model.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
WebUI extension for ControlNet
Stable Diffusion web UI
Open source code for AAAI 2023 Paper "BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning"
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
[CVPR 2022] Official code for "Unified Contrastive Learning in Image-Text-Label Space"
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Code for ALBEF: a new vision-language pre-training method
Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.
[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models
Official PyTorch implementation of BlobGAN: Spatially Disentangled Scene Representations
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
[ACM MM 2022] Towards Counterfactual Image Manipulation via CLIP
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)
PyTorch Implementation for CVPR'2022 paper ✨ "Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model"
Code for CVPR'2022 paper ✨ "Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model"
[NeurIPS'23] Parts of Speech–Grounded Subspaces in Vision-Language Models
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation