Stars
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
Chemical reaction data & benchmarks. Extraction and cleaning of data from the Open Reaction Database (ORD)
A Python package for processing molecules with RDKit in scikit-learn
A Conversational Speech Generation Model
No fortress, purely open ground. OpenManus is Coming.
A topic-centric list of high-quality open datasets.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
🕷️ An undetectable, powerful, flexible, high-performance Python library that makes web scraping as easy and effortless as it should be!
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space
KAG is a logical form-guided reasoning and retrieval framework based on the OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge bases.
A generative world for general-purpose robotics & embodied AI learning.
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
NeRF (Neural Radiance Fields) and NeRF in the Wild using pytorch-lightning
Instant neural graphics primitives: lightning fast NeRF and more
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
This repository contains demos I made with the Transformers library by HuggingFace.
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer