Gemini Embedding 2 in Jorvis: Fit Analysis

Date: 2026-03-12
Status: Planning input only
Owner: George / Architect

Summary

Gemini Embedding 2 is potentially useful for Jorvis as a bounded addition, but it does not justify a full RAG rewrite.

The strongest fit is a bounded multimodal document retrieval pilot:

PDF manuals with diagrams
image-heavy operating procedures
screenshots and visual evidence
optional audio/transcript retrieval as a follow-up

The weakest fit is a broad replatforming. The pilot should explicitly exclude:

an external vector DB migration
a video-first lane
replacing the current text-only document RAG path in one step

Current Jorvis Baseline

Jorvis already has the building blocks for a multimodal pilot:

text/document ingestion and semantic search in DocumentService
pgvector-backed retrieval in the Jorvis DB
PDF, Office, OCR processors
voice STT/TTS lane
Gemini text embedding adapter

Important limitations:

the current embedding contract is still text-only
the current document_embeddings storage path is text-centric
PDF support currently extracts text, not page-image evidence
OCR currently converts images to text, not true image-aware retrieval
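
To make the text-only limitation concrete, here is a minimal sketch of what a modality-aware embedding contract could look like. Every name in it (Modality, EmbeddingInput, Embedder) is hypothetical and chosen for illustration; none of it is taken from the Jorvis codebase or from the Gemini API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Protocol, Sequence


class Modality(str, Enum):
    """Hypothetical modality tag; the current contract would only cover TEXT."""
    TEXT = "text"
    PAGE_IMAGE = "page_image"


@dataclass(frozen=True)
class EmbeddingInput:
    """One item to embed, carrying enough metadata for later attribution."""
    modality: Modality
    document_id: str
    page_number: Optional[int] = None
    text: Optional[str] = None
    image_bytes: Optional[bytes] = None

    def __post_init__(self) -> None:
        # Reject inputs whose payload does not match the declared modality.
        if self.modality is Modality.TEXT and not self.text:
            raise ValueError("text input requires non-empty text")
        if self.modality is Modality.PAGE_IMAGE:
            if not self.image_bytes:
                raise ValueError("page image input requires image bytes")
            if self.page_number is None:
                raise ValueError("page image input requires a page number")


class Embedder(Protocol):
    """Assumed adapter interface: one vector per input, in input order."""

    def embed(self, items: Sequence[EmbeddingInput]) -> List[List[float]]:
        ...
```

Keeping the page number on the input, rather than only on the stored row, is one way to make sure page attribution cannot be lost between ingestion and storage.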

What Is High-Value for Jorvis

1. Multimodal document retrieval

Best immediate fit:

ingest PDF text plus page images
answer with text plus page/image evidence
attach page/source attribution
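
The three steps above can be sketched as a small in-memory retrieval routine. It assumes that text chunks and page images are embedded into one shared vector space, and it ranks by cosine similarity in plain Python; in Jorvis the ranking would presumably run in pgvector instead. IndexedChunk, Evidence, and retrieve are hypothetical names used only for this illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class IndexedChunk:
    """A stored embedding plus the metadata needed to cite its source."""
    document_id: str
    page_number: int
    modality: str  # "text" or "page_image" (assumed labels)
    embedding: Sequence[float]


@dataclass(frozen=True)
class Evidence:
    """A retrieval hit with page/source attribution attached."""
    document_id: str
    page_number: int
    modality: str
    score: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding: Sequence[float],
             chunks: Sequence[IndexedChunk],
             top_k: int = 3) -> List[Evidence]:
    """Rank text and page-image chunks together and keep their attribution."""
    hits = [
        Evidence(c.document_id, c.page_number, c.modality,
                 cosine(query_embedding, c.embedding))
        for c in chunks
    ]
    hits.sort(key=lambda e: e.score, reverse=True)
    return hits[:top_k]
```

Because each hit carries its document, page, and modality, the answer layer can show a page image next to the cited text without a second lookup.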

Why this matters:

it improves the existing secondary journey in the product north star
it creates a visible product delta without touching the SQL core
it addresses a concrete document use case rather than following generic multimodal hype

2. Image-aware retrieval with metadata

Strong medium-term fit:

screenshots of internal tools
product diagrams
field/ops images
image-to-similar-case retrieval

This is useful if Jorvis needs to answer questions using visual business evidence, not just text.

3. Audio as a retrieval signal

Potentially useful, but only as a controlled extension:

keep transcript-first explainability
add native audio embedding only as an experimental extra signal

Jorvis should not abandon transcripts for compliance, citations, or auditability.
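
One way to keep audio as a controlled extension is to blend it into ranking behind a weight that defaults to zero, so transcript-only behavior is unchanged unless the experiment is switched on. The sketch below is illustrative: fused_score and audio_weight are hypothetical names, and the linear blend is an assumption, not a measured or recommended formula.

```python
from typing import Optional


def fused_score(transcript_score: float,
                audio_score: Optional[float] = None,
                audio_weight: float = 0.0) -> float:
    """Blend an optional audio similarity into a transcript similarity.

    With audio_weight == 0.0 (the default) or no audio score available,
    the result is exactly the transcript score, so the transcript remains
    the primary and citable signal.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0.0 and 1.0")
    if audio_score is None or audio_weight == 0.0:
        return transcript_score
    return (1.0 - audio_weight) * transcript_score + audio_weight * audio_score
```

Under this design the audio signal can only reorder results. Citations and audit trails would still point at transcript text.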

What Is Lower-Value Right Now

Video-first retrieval

Not the best next slice:

higher implementation cost
weaker immediate ICP alignment
harder citation UX

External vector DB migration

Not needed yet:

Jorvis already has pgvector and a document search path
a pilot should validate product value first
changing vector infrastructure too early adds complexity without product proof

"Claude Code changed RAG forever"

Useful only as a prototyping accelerator.

This can help build pilot apps quickly, but it is not a product moat and should not shape the roadmap by itself.

Recommendation

Treat Gemini Embedding 2 as a narrow product-enabling capability for document/image retrieval.

Do not treat it as justification for rearchitecting the entire Jorvis retrieval stack.

The right next documents in this lane are:

docs/product/GEMINI_EMBEDDING_2_MULTIMODAL_PILOT_EXECUTION_PLAN.md
docs/agent_ops/specs/task_gemini_embedding_2_multimodal_pilot_stage0_spec.md