Starred repositories
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source …
A curated list of graph-based fraud, anomaly, and outlier detection papers & resources
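🤗 A more elegant way to subscribe to WeChat Official Accounts, with support for private deployment and WeChat Official Account RSS generation (based on WeChat Read)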
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Model Context Protocol Servers
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
💯 Curated coding interview preparation materials for busy software engineers
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
A toolkit for blockchain data collection
Awesome-RAG: Collect typical RAG papers and systems.
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
A curated collection of public industrial datasets.
A collection of open datasets for industrial applications, divided by categories
A topic-centric list of HQ open datasets.
🔊 Text-Prompted Generative Audio Model