Stars
Real-time face swap and one-click video deepfake with only a single image
🖱️ Generate human-like mouse movements with puppeteer or on any 2D plane
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports comp…
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
LAVIS - A One-stop Library for Language-Vision Intelligence
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Industry leading face manipulation platform
[ICLR 2025 Oral] TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
Windows virtual camera driver using the AVStream minidriver.
Code for "Transformer Networks for Trajectory Forecasting"
🔥🔥 hooker is a Frida-based reverse engineering toolkit for Android. It offers a user-friendly CLI, universal scripts, auto hook generation, memory roaming to detect activities/services, one-click S…
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Plugin for using Core SignalR in Unity WebGL
C++ based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)
Open-Sora: Democratizing Efficient Video Production for All
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities