Visual Input Options for Jorvis

Last Updated: 2026-02-15
Status: Active guide

Jorvis supports two methods for providing visual context to the LLM: the Ask Open WebUI Everywhere browser extension and the Desktop Vision MCP server.

This guide helps you choose and set up the right tool for your needs.

For current production/runtime snapshot and governance decisions, use:

Quick Comparison

Feature	Browser Extension	Desktop Vision MCP
What it sees	Active browser tab only	Entire desktop (all apps)
Installation	One-click install from the Chrome Web Store	Python `pip install`
Trigger	Manual (Button/Menu)	Automatic (LLM Tool Call)
Privacy	High (Page only, sandboxed)	Local-only, but full screen access
Best For	Web debugging, reading articles	IDE help, terminal errors, cross-app context

The Ask Open WebUI Everywhere extension lets you send screenshots and content from any webpage directly to Jorvis.

Install the Extension:
- Chrome Web Store (Search for "Ask Open WebUI Everywhere")
- Or install from GitHub: ToryPan/ask-open-webui-everywhere
Configure:
- Click the extension icon in your browser toolbar.
- Go to Settings (gear icon).
- Set Open WebUI URL: http://localhost:8080 (or your Jorvis instance URL).
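Before saving the URL in the extension, it can help to confirm that something is actually answering at that address. The following is a minimal sketch using only the Python standard library; `is_reachable` is a hypothetical helper written for this guide, not part of Jorvis or the extension, and it only checks that an HTTP server responds, not that the server is Open WebUI.

```python
import http.client
import urllib.error
import urllib.request


def is_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if any HTTP server answers at base_url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status such as 401 or 404.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, DNS failure, timeout, or a malformed URL.
        return False
```

Calling `is_reachable("http://localhost:8080")` (or your own Jorvis instance URL) should return `True` when the instance is up; `False` suggests the server is stopped or the URL is wrong.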

Right-click any text or image and choose "Send to Open WebUI".
Click the "Insert Screen" button in the floating panel to capture the visible tab.
Chat with Gemini about the captured content immediately.

The Desktop Vision MCP Server runs locally and gives Jorvis "eyes" on your entire operating system.

See the Desktop Vision Setup Guide for full instructions.

Quick Start:

cd tools/desktop-vision
pip install -r requirements.txt
python main.py

Then configure Open WebUI to connect to http://localhost:8000/sse via Settings -> Tools -> MCP.
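If Open WebUI cannot connect, a quick probe of the endpoint can show whether the server is up. The sketch below assumes the `/sse` endpoint replies with the standard Server-Sent Events content type (`text/event-stream`); that is how SSE endpoints normally behave, but it has not been verified against this particular server. Both function names are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def is_event_stream(content_type: str) -> bool:
    """True if a Content-Type header value denotes a Server-Sent Events stream."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"


def probe_sse(url: str, timeout: float = 3.0) -> bool:
    """Open the URL, inspect only the response headers, then disconnect."""
    request = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_event_stream(response.headers.get("Content-Type", ""))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Server not running, wrong port, timeout, or a malformed URL.
        return False
```

With the server started as shown above, `probe_sse("http://localhost:8000/sse")` would be expected to return `True` under that assumption.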

Jorvis can automatically invoke this tool when you ask questions like:

Permissions: Requires Screen Recording permission on macOS.
Scope: Can see everything on the selected monitor.
Advice: Stop the server (Ctrl+C) when working with sensitive data (banking, secrets).

Extension not connecting? Check that Jorvis is running (`docker ps`) and reachable at the URL configured in the extension settings.
Black screen on macOS? Grant "Screen Recording" permission to your Terminal or Python executable.
Screenshots too large? Both tools automatically resize images to fit context limits.
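To illustrate what resizing to fit a limit typically involves, here is a rough sketch of an aspect-ratio-preserving downscale calculation. It is not the code either tool uses; the function name `fit_within` and the `max_side` value of 1024 pixels are assumptions chosen for the example.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    max_side=1024 is an illustrative value, not a limit taken from Jorvis.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height, and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough; never upscale.
        return width, height
    scale = max_side / longest
    # Round to whole pixels and keep each side at least 1 px.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 2560x1440 capture maps to 1024x576 with these settings, while an 800x600 capture is returned unchanged.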