Starred repositories
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
SuperSonic is the next-generation AI+BI platform that unifies Chat BI (powered by LLM) and Headless BI (powered by semantic layer) paradigms.
Open-source generalized AI agent for everyday task automations.
No fortress, purely open ground. OpenManus is Coming.
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
A machine learning-based video super resolution and frame interpolation framework. Est. Hack the Valley II, 2018.
Janus-Series: Unified Multimodal Understanding and Generation Models
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Wan: Open and Advanced Large-Scale Video Generative Models
[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
ModelScope: bring the notion of Model-as-a-Service to life.
Open source annotation tool for machine learning practitioners.
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support
Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.
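WiNGPT is a GPT-based large language model for the medical vertical. It aims to integrate professional medical knowledge, healthcare information, and data, providing the healthcare industry with intelligent medical Q&A, diagnostic support, and medical knowledge services to improve the efficiency of diagnosis and treatment and the quality of healthcare services.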
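A powerful zero-code, low-code, and AI orchestration platform for microservice interface orchestration and system integration. It supports interface orchestration over protocols such as HTTP, Dubbo, and WebService, supports enhancing flows with scripting languages such as Groovy, JavaScript, Python, and Java, and supports common data sources such as MySQL and Dameng.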
Langflow is a powerful tool for building and deploying AI-powered agents and workflows.
No-code multi-agent framework to build LLM Agents, workflows and applications with your data
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
🦜🪺 Parakeet is a GoLang library, made to simplify the development of small generative AI applications with Ollama 🦙.
LLM UI with advanced features, easy setup, and multiple backend support.
A modern GUI client based on Tauri, designed to run on Windows, macOS, and Linux for a tailored proxy experience.