Description
Describe the bug
The coordinate system fails to return a coordinate range for on-screen text when an Android app shows a floating modal, such as a tooltip or a pop-up.
Configuration (please complete the following information):
- Agent: [e.g, Cursor]
- OS: [e.g, Mac]
- Device used: [e.g. Android]
- Device model: [e.g., Google Pixel 6A]
To Reproduce
Steps to reproduce the behavior:
- Use prompt '...'
Find coordinates of a button with text "Continue".
Requirements (must validate each):
[REQ-1] Start from (0,0) at top-left
[REQ-2] Count ALL scrollable content, even if not visible in viewport
[REQ-3] This element is not in the element list.
[REQ-4] Do not make assumptions based on visual position in viewport.
[REQ-5] Provide coordinate ranges as: X: [start]-[end], Y: [start]-[end]
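To check replies against REQ-5 automatically, something like the following could be used. This is only a sketch: `parse_range` and `RANGE_RE` are hypothetical names, not part of any MCP tooling, and it assumes the reply contains integer pixel values in exactly the `X: [start]-[end], Y: [start]-[end]` shape.

```python
import re

# Hypothetical pattern for the REQ-5 format: "X: 120-480, Y: 1500-1620".
RANGE_RE = re.compile(r"X:\s*(\d+)\s*-\s*(\d+)\s*,\s*Y:\s*(\d+)\s*-\s*(\d+)")


def parse_range(reply):
    """Return (x_start, x_end, y_start, y_end), or None if the reply is invalid."""
    match = RANGE_RE.search(reply)
    if match is None:
        return None  # reply does not contain a REQ-5 style range
    x_start, x_end, y_start, y_end = map(int, match.groups())
    # A range whose start lies past its end cannot describe an element.
    if x_start > x_end or y_start > y_end:
        return None
    return (x_start, x_end, y_start, y_end)


if __name__ == "__main__":
    print(parse_range("The button is at X: 120-480, Y: 1500-1620"))
    print(parse_range("I could not find the button"))
```

A check like this only validates the format of the answer. It cannot tell whether the reported range actually matches the button.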
Current Behavior
Because the element is in a floating popup, it is not present in the main view hierarchy. This confuses the LLM when it calculates the coordinates. The MCP/LLM always reverts to the element list and tries to estimate where the text is on the screen. I tried giving clear directions to ignore the element list, giving the exact location, and then reasoning with the LLM so that it corrects itself. That succeeds, but it fails again when the prompt is used.
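A rough sketch of the lookup I have in mind is below. It is not how the MCP server works today. It assumes each window (main view and popup) can be dumped separately as XML shaped like `uiautomator dump` output, where each `node` has a `text` attribute and a `bounds="[x1,y1][x2,y2]"` attribute. The `find_text_range` helper and the `MAIN` and `POPUP` samples are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Bounds in uiautomator-style dumps look like "[120,1500][480,1620]".
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_text_range(dumps, text):
    """Search every window dump for `text`; return a REQ-5 style string or None."""
    for xml_dump in dumps:
        root = ET.fromstring(xml_dump)
        for node in root.iter("node"):
            if node.get("text") != text:
                continue
            match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
            if match:
                x1, y1, x2, y2 = map(int, match.groups())
                return f"X: {x1}-{x2}, Y: {y1}-{y2}"
    return None


# Made-up sample data: the button exists only in the popup window's dump.
MAIN = '<hierarchy><node text="Home" bounds="[0,0][1080,200]"/></hierarchy>'
POPUP = (
    '<hierarchy><node text="" bounds="[90,1200][990,1800]">'
    '<node text="Continue" bounds="[120,1500][480,1620]"/>'
    "</node></hierarchy>"
)

if __name__ == "__main__":
    print(find_text_range([MAIN], "Continue"))         # main hierarchy only
    print(find_text_range([MAIN, POPUP], "Continue"))  # main plus popup window
```

With only the main hierarchy the lookup finds nothing, which matches what I see. Note that these bounds are screen coordinates, so this sketch does not cover REQ-2 (off-screen scrollable content).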
Expected behavior
I expect a coordinate range for the given text on the screen, in the format requested in the prompt, even when the text is inside a floating modal.
Screenshots
If applicable, add screenshots to help explain your problem.