GitHub - fwaris/FsOperator: A CUA (computer-use-agent) sample in F#

FsOperator

FsOperator is an F# sample application that utilizes the OpenAI responses API to enable a computer use agent.

It currently includes a partial implementation of the responses API, which combines capabilities from both the chat and assistant APIs and will likely become the dominant interface over time.

The application uses the new OpenAI computer_use_preview generative model as its core. This model operates in an observe-act-observe loop: it visually interprets a browser page and issues commands (click, scroll, type, etc.) to accomplish tasks based on user instructions.
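The observe-act-observe loop described above can be sketched in a few lines of F#. This is a minimal illustration, not the FsOperator implementation: the `Action` and `ModelReply` types and the `takeScreenshot`, `askModel`, and `perform` functions are hypothetical stand-ins for the browser and the model.

```fsharp
// Hypothetical types for illustration; they are not taken from the FsOperator source.
type Action =
    | Click of x: int * y: int
    | Scroll of dx: int * dy: int
    | TypeText of text: string

/// One model step: either another action to perform or a final assistant message.
type ModelReply =
    | Act of Action
    | AssistantMessage of string

/// Observe (screenshot), ask the model, act, and repeat until the model
/// answers with an assistant message, which ends its turn.
let rec runTask (takeScreenshot: unit -> byte[]) (askModel: byte[] -> ModelReply) (perform: Action -> unit) : string =
    let screenshot = takeScreenshot ()          // observe
    match askModel screenshot with
    | Act action ->
        perform action                          // act
        runTask takeScreenshot askModel perform // observe again
    | AssistantMessage text -> text

// Usage with stubs: a scripted "model" and a list that records performed actions.
let performed = ResizeArray<Action>()
let scripted =
    System.Collections.Generic.Queue<ModelReply>(
        [ Act (Click (10, 20)); Act (TypeText "phone case"); AssistantMessage "done" ])
let result = runTask (fun () -> Array.empty) (fun _ -> scripted.Dequeue()) performed.Add
printfn "%s after %d actions" result performed.Count
```

Passing the three capabilities in as functions keeps the loop independent of any particular browser driver or API client, which also makes it easy to exercise with stubs as shown.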

Below is a demonstration of an Amazon search session to find a specific type of cell phone case:

screenshot


⚠️ Caveats

At the current level of technology, human supervision is required—though the future looks very promising.


🚀 Application Usage

  • Ensure the .NET 9.x runtime is installed. (Chromium components may also need to be installed.)
  • Text chat mode has been tested successfully on Windows and macOS.
  • Voice chat mode currently works on Windows only (the WebRTC library for macOS is not yet included).
  • Set the OPENAI_API_KEY environment variable to provide your OpenAI API key.
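
A quick way to confirm the key is visible to a .NET process is to read it back. The sketch below is only a sanity check under that assumption; the `tryGetSetting` helper is a hypothetical name and is not part of FsOperator.

```fsharp
open System

/// Returns the value of an environment variable, or an error when it is missing or blank.
let tryGetSetting (name: string) : Result<string, string> =
    let value = Environment.GetEnvironmentVariable name
    if String.IsNullOrWhiteSpace value then
        Error (sprintf "%s is not set" name)
    else
        Ok value

match tryGetSetting "OPENAI_API_KEY" with
| Ok _ -> printfn "API key found"
| Error message -> eprintfn "%s" message
```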

Steps:

  1. Enter a URL in the top address bar (must include https://; e.g., https://amazon.com).
  2. Hit Enter to navigate. Log in manually if required.
  3. Input task instructions in the instructions box.
  4. Click 'Start Task' to begin the task.
  5. Alternatively, use voice mode to chat directly with the OpenAI real-time voice assistant, which generates and executes instructions for the CUA model.
  6. Observe the computer actions issued by the model at the right end of the address bar. Any warnings, status messages, or error messages are shown at the bottom.
  7. See log messages by expanding the panel on the right. It shows all API messages that are sent and received.
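
Step 1 requires the address to include `https://`. As an illustration of that rule (whether the app performs such a check is not stated here), the address could be validated before navigating. `isHttpsUrl` is a hypothetical helper built on the standard `System.Uri` type.

```fsharp
open System

/// True when the text parses as an absolute URL whose scheme is https.
let isHttpsUrl (text: string) : bool =
    match Uri.TryCreate(text, UriKind.Absolute) with
    | true, uri -> uri.Scheme = Uri.UriSchemeHttps
    | _ -> false

// Only the first address satisfies the rule from step 1.
for address in [ "https://amazon.com"; "http://amazon.com"; "amazon.com" ] do
    printfn "%s accepted: %b" address (isHttpsUrl address)
```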

Note: the screen shakes when Puppeteer takes a screenshot. This seems to come from Chromium itself, and it is not clear whether a fix is available.

The model may issue security warnings that ideally should be acknowledged by a human. Currently, the app implicitly acknowledges these warnings to allow the process to continue uninterrupted. They are shown in the status bar.
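
The implicit acknowledgement described above, and a supervised alternative, can be sketched as follows. The `SafetyCheck` record and both functions are hypothetical and are not taken from the FsOperator source. The id/code/message shape follows my understanding of how the responses API reports pending safety checks and expects them echoed back as acknowledged, so treat the field names as an assumption to verify against the API reference.

```fsharp
// Hypothetical record for one safety warning; the field names are assumptions.
type SafetyCheck = { Id: string; Code: string; Message: string }

/// Implicit acknowledgement: show every warning, then acknowledge all of them.
let acknowledgeAll (showStatus: string -> unit) (pending: SafetyCheck list) : SafetyCheck list =
    for check in pending do
        showStatus (sprintf "Safety warning [%s]: %s" check.Code check.Message)
    pending

/// Supervised variant: acknowledge only the warnings a human confirms.
let acknowledgeConfirmed (confirm: SafetyCheck -> bool) (pending: SafetyCheck list) : SafetyCheck list =
    pending |> List.filter confirm

// Usage with a made-up warning; printfn stands in for the status bar.
let pending = [ { Id = "sc_1"; Code = "example_code"; Message = "Example warning text" } ]
let acknowledged = acknowledgeAll (printfn "%s") pending
let declined = acknowledgeConfirmed (fun _ -> false) pending
```

Swapping `acknowledgeAll` for `acknowledgeConfirmed` with a dialog-backed `confirm` function would be one way to put a human in the loop without changing the rest of the flow.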

  • The process continues until the model returns an assistant message (completing its "turn").
  • The resulting message is displayed in the Chat box.
  • You may respond to the message to continue the task.
  • Alternatively, cancel the task by clicking 'Cancel Task'.

⚠️ Caveat: This is beta code. It has not been fully battle-tested.


🛠️ Technical Overview

  • The embedded browser control is used to simulate real-world web interactions.
    • On Windows, this requires Chromium.
  • The embedded browser comes from WebViewControl-Avalonia.
  • The app uses PuppeteerSharp internally to drive browser automation and take screenshots.
    • Note: the code includes an option to switch to an external browser instead.
  • The UI is built using the versatile Avalonia.FuncUI, a functional UI framework for Avalonia.

Internally, Puppeteer makes a WebSocket connection to the embedded browser via CDP (Chrome DevTools Protocol). This connection is used to take screenshots to send to the CUA model and also to execute the commands issued by the model.
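
To make the CDP traffic concrete, the sketch below builds the kind of JSON command that travels over that WebSocket. Puppeteer constructs and sends these messages itself, so this is purely illustrative; the `cdpCommand` helper is a hypothetical name, while `Page.captureScreenshot` and `Input.dispatchMouseEvent` are standard CDP methods.

```fsharp
open System.Collections.Generic
open System.Text.Json

/// Builds one CDP command as JSON text: a message id, a method name, and its parameters.
let cdpCommand (id: int) (methodName: string) (parameters: (string * obj) list) : string =
    let ps = Dictionary<string, obj>()
    for (key, value) in parameters do
        ps.[key] <- value
    let message = Dictionary<string, obj>()
    message.["id"] <- box id
    message.["method"] <- box methodName
    message.["params"] <- box ps
    JsonSerializer.Serialize(message)

// A screenshot request, then the first half of a left click at (120, 340).
let screenshotCommand = cdpCommand 1 "Page.captureScreenshot" [ "format", box "png" ]
let mousePressCommand =
    cdpCommand 2 "Input.dispatchMouseEvent"
        [ "type", box "mousePressed"; "x", box 120; "y", box 340
          "button", box "left"; "clickCount", box 1 ]
printfn "%s" screenshotCommand
printfn "%s" mousePressCommand
```

A full click is a `mousePressed` event followed by a matching `mouseReleased` event. The browser answers each command with a message carrying the same `id`, which is how responses are paired with requests on the shared connection.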

