yzd-v (Mesopotamia) / Starred · GitHub

Official implementation of "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"

16 1 Updated May 21, 2025

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

Python 11,906 2,256 Updated May 31, 2025
Python 220 11 Updated May 20, 2025

Herbarium competition details

46 5 Updated May 10, 2022

[IJCV] Bamboo: 4 times larger than ImageNet; 2 times larger than Object365; built by active learning.

Python 177 7 Updated Apr 7, 2024

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Python 8,149 821 Updated Aug 12, 2024

Fine-tune your Florence-2 model easily

Python 18 2 Updated Jul 8, 2024

Quick exploration into fine-tuning Florence-2

Jupyter Notebook 315 29 Updated Sep 19, 2024

[CVPR 2025 Oral & Best Paper Award Candidate] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Python 825 24 Updated May 20, 2025

Everything about the SmolLM2 and SmolVLM family of models

Python 2,457 150 Updated Mar 31, 2025
Python 382 36 Updated Dec 12, 2024

A Framework of Small-scale Large Multimodal Models

Python 824 86 Updated Apr 26, 2025

Strong and Open Vision Language Assistant for Mobile Devices

Python 1,221 81 Updated Apr 15, 2024

Let's make video diffusion practical!

Python 13,891 1,210 Updated May 4, 2025

MAGI-1: Autoregressive Video Generation at Scale

Python 3,195 180 Updated May 30, 2025

A simple screen-parsing tool towards a pure vision-based GUI agent

Jupyter Notebook 22,275 1,868 Updated Mar 26, 2025

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

702 45 Updated Jun 1, 2025

Your AI Operator for Web, Android, Automation & Testing.

TypeScript 9,035 554 Updated Jun 1, 2025

A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

TypeScript 14,347 1,187 Updated Jun 1, 2025
Python 6,277 414 Updated May 21, 2025

Efficient vision foundation models for high-resolution generation and perception.

Python 2,895 225 Updated Apr 24, 2025

DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding

Python 1,062 40 Updated May 26, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 46,327 8,101 Updated May 27, 2025

🌐 Make websites accessible for AI agents. Automate tasks online with ease.

Python 62,057 6,962 Updated Jun 1, 2025

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python 26,774 3,345 Updated Sep 24, 2024

Wan: Open and Advanced Large-Scale Video Generative Models

Python 11,851 1,368 Updated May 27, 2025

Fetch data from Tonghuashun iWencai (同花顺问财)

JavaScript 578 182 Updated May 7, 2025

Orient Anything, ICML 2025

Python 276 11 Updated May 17, 2025

OpenMMLab Rotated Object Detection Toolbox and Benchmark

Python 1,980 601 Updated Sep 28, 2024