Visual Input Options for Jorvis

Last Updated: 2026-02-15
Status: Active guide

Jorvis supports two powerful methods for providing visual context to the LLM:

  1. Browser Extension ("Ask Open WebUI Everywhere")
  2. Desktop Vision MCP Server (Local Python Server)

This guide helps you choose and set up the right tool for your needs.

For current production/runtime snapshot and governance decisions, use:

  • docs/handoff/CHECKPOINT.md
  • docs/agent_ops/TASK_BOARD.md

Quick Comparison

FeatureBrowser ExtensionDesktop Vision MCP
What it seesActive browser tab onlyEntire desktop (all apps)
Installation1-click Chrome StorePython pip install
TriggerManual (Button/Menu)Automatic (LLM Tool Call)
PrivacyHigh (Page only, sandboxed)Local-only, but full screen access
Best ForWeb debugging, reading articlesIDE help, terminal errors, cross-app context

The Ask Open WebUI Everywhere extension lets you send screenshots and content from any webpage directly to Jorvis.

Setup

  1. Install the Extension:

  2. Configure:

    • Click the extension icon in your browser toolbar.
    • Go to Settings (gear icon).
    • Set Open WebUI URL: http://localhost:8080 (or your Jorvis instance URL).

Usage

  • Right-click any text or image -> "Send to Open WebUI"
  • Click "Insert Screen" button in the floating panel to capture the visible tab.
  • Chat with Gemini about the captured content immediately.

The Desktop Vision MCP Server runs locally and gives Jorvis "eyes" on your entire operating system.

Setup

See the Desktop Vision Setup Guide for full instructions.

Quick Start:

cd tools/desktop-vision
pip install -r requirements.txt
python main.py

Then configure Open WebUI to connect to http://localhost:8000/sse via Settings -> Tools -> MCP.

Usage

Jorvis can automatically invoke this tool when you ask questions like:

  • "What's on my screen?"
  • "Help me fix this error in my terminal."
  • "Look at my VS Code window."

Security & Privacy Settings

Browser Extension

  • Permissions: Requires valid access to the active tab to capture it.
  • Data Flow: Screenshot -> Open WebUI -> Gemini API (Google).
  • Isolation: Cannot see your file system or other applications.

Desktop Vision MCP

  • Permissions: Requires Screen Recording permission on macOS.
  • Scope: Can see everything on the selected monitor.
  • Advice: Stop the server (Ctrl+C) when working with sensitive data (banking, secrets).

Troubleshooting

  • Extension not connecting? Check if Jorvis is running (docker ps) and accessible at the URL configured.
  • Black screen on macOS? Grant "Screen Recording" permission to your Terminal or Python executable.
  • Screenshots too large? Both tools automatically resize images to fit context limits.