Stars
[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
👓 A curated list of awesome android learning resources for android app developers.
Google Research
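Source code and dataset for an image segmentation system for on-duty detection and behavior monitoring of employees in workstation areas: improved yolo11-ODConv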
Consists of ~500k human annotations on the RICO dataset identifying various icons based on their shapes and semantics, and associations between selected general UI elements and their text labels. A…
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" -- the first LLM-based web agent and benchmark for generalist web agents
Official implementation of "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
GUI Grounding for Professional High-Resolution Computer Use
A simple screen parsing tool towards pure vision based GUI agent
It includes two datasets that are used in the downstream tasks for evaluating UIBert: App Similar Element Retrieval data and Visual Item Selection (VIS) data. Both datasets are written as TFRecords.
This repository contains all the code examples, projects, and resources used in "The Complete Hugging Face Blueprint" book. The book provides a comprehensive guide to using Hugging Face's ecosystem…
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Prompt Declaration Language (PDL) is a declarative prompt programming language.
MS-Agent: Lightweight Framework for Empowering Agents with Autonomous Exploration
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
AI-Chef / HuggingGPT
Forked from microsoft/JARVIS. JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
ai-town / ai-town-cn
Forked from Steven-Luo/ai-town-cn. An MIT-licensed, deployable starter kit for building and customizing your own version of AI town - a virtual town where AI characters live, chat and socialize. Chinese version of AI Town.
LLM based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations.
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
Generative Agents: Interactive Simulacra of Human Behavior
Agent that empowers software testing with LLMs; industrial-first in China
StyleShot: A SnapShot on Any Style. A model that can transfer any style to any content and generate high-quality, personalized stylized images without per-image fine-tuning!