Visual Input Options for Jorvis
Last Updated: 2026-02-15
Status: Active guide
Jorvis supports two methods for providing visual context to the LLM:
- Browser Extension ("Ask Open WebUI Everywhere")
- Desktop Vision MCP Server (Local Python Server)
This guide helps you choose and set up the right tool for your needs.
For the current production/runtime snapshot and governance decisions, see:
- docs/handoff/CHECKPOINT.md
- docs/agent_ops/TASK_BOARD.md
Quick Comparison
| Feature | Browser Extension | Desktop Vision MCP |
|---|---|---|
| What it sees | Active browser tab only | Entire desktop (all apps) |
| Installation | One-click (Chrome Web Store) | Python (pip install) |
| Trigger | Manual (Button/Menu) | Automatic (LLM Tool Call) |
| Privacy | High (Page only, sandboxed) | Local-only, but full screen access |
| Best For | Web debugging, reading articles | IDE help, terminal errors, cross-app context |
Option 1: Browser Extension (Recommended for Web)
The Ask Open WebUI Everywhere extension lets you send screenshots and content from any webpage directly to Jorvis.
Setup
- Install the Extension:
  - Chrome Web Store (search for "Ask Open WebUI Everywhere")
  - Or install from GitHub: ToryPan/ask-open-webui-everywhere
- Configure:
  - Click the extension icon in your browser toolbar.
  - Go to Settings (gear icon).
  - Set Open WebUI URL: http://localhost:8080 (or your Jorvis instance URL).
Usage
- Right-click any text or image -> "Send to Open WebUI"
- Click the "Insert Screen" button in the floating panel to capture the visible tab.
- Chat with Gemini about the captured content immediately.
Option 2: Desktop Vision MCP (Recommended for Devs)
The Desktop Vision MCP Server runs locally and gives Jorvis "eyes" on your entire operating system.
Setup
See the Desktop Vision Setup Guide for full instructions.
Quick Start:
cd tools/desktop-vision
pip install -r requirements.txt
python main.py
Then configure Open WebUI to connect to http://localhost:8000/sse via Settings -> Tools -> MCP.
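Before adding the URL in Open WebUI, it can help to confirm that the endpoint answers at all. The following is a minimal sketch, not part of the Desktop Vision server: `probe` is a hypothetical helper name, and it assumes the server replies to a plain GET on the `/sse` path with a `text/event-stream` content type, which is typical for SSE endpoints but not confirmed by this guide.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 3.0):
    """Return (status, content_type) if the URL answers, else None.

    Hypothetical helper for illustration; it only reads the response
    headers and then closes the connection.
    """
    # Ignore any configured HTTP proxy, since the target is a local service.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get_content_type()
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (for example 404).
        return error.code, error.headers.get_content_type()
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


if __name__ == "__main__":
    result = probe("http://localhost:8000/sse")
    if result is None:
        print("No response: is the Desktop Vision server running?")
    else:
        status, content_type = result
        print(f"HTTP {status}, content type {content_type}")
```

The same helper can be pointed at the Open WebUI URL (for example http://localhost:8080) to check that Jorvis itself is reachable. In that case expect an HTML content type rather than an event stream.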
Usage
Jorvis can automatically invoke this tool when you ask questions like:
- "What's on my screen?"
- "Help me fix this error in my terminal."
- "Look at my VS Code window."
Security & Privacy Settings
Browser Extension
- Permissions: Requires access to the active tab in order to capture it.
- Data Flow: Screenshot -> Open WebUI -> Gemini API (Google).
- Isolation: Cannot see your file system or other applications.
Desktop Vision MCP
- Permissions: Requires Screen Recording permission on macOS.
- Scope: Can see everything on the selected monitor.
- Advice: Stop the server (Ctrl+C) when working with sensitive data (banking, secrets).
Troubleshooting
- Extension not connecting? Check if Jorvis is running (docker ps) and accessible at the URL configured.
- Black screen on macOS? Grant "Screen Recording" permission to your Terminal or Python executable.
- Screenshots too large? Both tools automatically resize images to fit context limits.
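To give a feel for what resizing a screenshot to a size limit involves, here is a small sketch of an aspect-ratio-preserving downscale calculation. It is illustrative only: this guide does not document how either tool actually resizes images, and both `fit_within` and the 1280-pixel limit are hypothetical, chosen for the example.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    Hypothetical helper: keeps the aspect ratio, never upscales, and
    uses integer arithmetic (rounding down) to avoid float error.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough; leave the image untouched.
        return width, height
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height


if __name__ == "__main__":
    # A 4K desktop capture reduced to an assumed 1280-pixel limit.
    print(fit_within(3840, 2160, 1280))  # (1280, 720)
    # A small capture is returned unchanged.
    print(fit_within(800, 600, 1280))    # (800, 600)
```

If text in a downscaled screenshot becomes hard for the model to read, capturing a smaller region (a single window or tab) is usually a better fix than raising the limit.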