AFeng-x (AFeng) / Starred · GitHub
🎯
Focusing
  • Hongkong | Shenzhen

Open-source Multi-agent Poster Generation from Papers

Python 1,172 38 Updated May 30, 2025

FULL v0, Cursor, Manus, Same.dev, Lovable, Devin, Replit Agent, Windsurf Agent, VSCode Agent, Dia Browser & Trae AI (And other Open Sourced) System Prompts, Tools & AI Models.

53,647 16,430 Updated May 21, 2025

Open-source unified multimodal model

Python 3,322 217 Updated May 28, 2025

Everything about the SmolLM2 and SmolVLM family of models

Python 2,453 149 Updated Mar 31, 2025

[CVPR'24 Oral] Official repository of Point Transformer V3 (PTv3)

Python 1,161 66 Updated Apr 24, 2025

Lightweight coding agent that runs in your terminal

TypeScript 27,492 2,858 Updated May 30, 2025

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Python 1,216 125 Updated Apr 4, 2025

Python 1,064 34 Updated May 30, 2025

🚀 One-stop solution for creating your digital avatar from chat logs 💡 Fine-tune LLMs with your chat logs to capture your unique style, then bind to a chatbot to bring your digital self to life. From chat…

Python 12,535 935 Updated May 30, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,114 41 Updated May 21, 2025

DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.

TypeScript 12,227 1,268 Updated May 29, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,702 235 Updated May 29, 2025

The evaluation benchmark on MCP servers

Python 114 3 Updated May 21, 2025

Suna - Open Source Generalist AI Agent

TypeScript 13,515 1,941 Updated May 30, 2025

A curated collection of resources, tools, and frameworks for developing GUI Agents.

48 2 Updated May 30, 2025

MAGI-1: Autoregressive Video Generation at Scale

Python 3,188 178 Updated May 30, 2025

Model Context Protocol (MCP) programming quick start

1,963 114 Updated Apr 23, 2025

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1,168 58 Updated May 28, 2025

Let's make video diffusion practical!

Python 13,855 1,202 Updated May 4, 2025

Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capab…

JavaScript 6,188 554 Updated May 26, 2025

An open protocol enabling communication and interoperability between opaque agentic applications.

TypeScript 16,261 1,573 Updated May 30, 2025

Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities

871 40 Updated Apr 20, 2025

[CVPR 2025 Oral] Official Repo for Paper "AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea"

Jupyter Notebook 128 5 Updated Apr 5, 2025

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

Python 5,847 646 Updated Mar 19, 2025

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

611 16 Updated May 20, 2025

11 Lessons to Get Started Building AI Agents

Jupyter Notebook 22,515 6,034 Updated May 26, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,047 236 Updated May 28, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python 7,855 666 Updated May 30, 2025

A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

TypeScript 14,326 1,183 Updated May 30, 2025