Stars
Start- and end-frame video generation nodes, based on a modified version of Kijai's Wan2.1 nodes
ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, th…
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
DeepMind's Tacotron-2 TensorFlow implementation
Video, image and GIF upscaling/enlargement (super-resolution) and video frame interpolation. Achieved with Waifu2x, Real-ESRGAN, Real-CUGAN, RTX Video Super Resolution VSR, SRMD, RealSR, Anime4K, RIFE, IF…
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.
SD-Trainer: LoRA and DreamBooth training scripts and GUI for diffusion models, built on kohya-ss's trainer.
Robust Speech Recognition via Large-Scale Weak Supervision
✨ Light and Fast AI Assistant. Supports: Web | iOS | macOS | Android | Linux | Windows
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Windows system utilities to maximize productivity
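Hung-yi Lee's Deep Learning Tutorial (LeeDL Tutorial), recommended by Prof. Hung-yi Lee 👍 and known as the "Apple Book" 🍎. PDF download: https://github.com/datawhalechina/leedl-tutorial/releases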