Stars
🎨 Turn your roughest sketches into stunning 3D worlds by vibe drawing
Fully open reproduction of DeepSeek-R1
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and te…
Toolkit for linearizing PDFs for LLM datasets/training
🍒 Cherry Studio is a desktop client that supports multiple LLM providers.
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
✨✨Latest Papers and Datasets on Mobile and PC GUI Agent
A simple screen parsing tool towards pure vision based GUI agent
🧑🚀 Summary of the world's best LLM resources (agent frameworks, coding assistants, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Waterfall-style image viewer for macOS, offering a smooth and immersive browsing experience.
Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Real time interactive streaming digital human
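Repository of Chinese post-trained versions of Llama3 and Llama3.1: interesting fine-tuned and modified weights, plus tutorial videos and documentation for training, inference, evaluation, and deployment.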
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A modular graph-based Retrieval-Augmented Generation (RAG) system