Gemini Embedding 2 in Jorvis: Fit Analysis
Date: 2026-03-12
Status: Planning input only
Owner: George / Architect
Summary
Gemini Embedding 2 is potentially useful for Jorvis, but not as a full RAG rewrite.
The strongest fit is a bounded multimodal document retrieval pilot:
- PDF manuals with diagrams
- image-heavy operating procedures
- screenshots and visual evidence
- optional audio/transcript retrieval as a follow-up
The weakest fit is a broad platform replatforming:
- no external vector DB migration
- no video-first lane
- no replacement of the current text-only document RAG path in one step
Current Jorvis Baseline
Jorvis already has the building blocks for a multimodal pilot:
- text/document ingestion and semantic search in DocumentService
- pgvector-backed retrieval in the Jorvis DB
- PDF, Office, OCR processors
- voice STT/TTS lane
- Gemini text embedding adapter
Important limitation:
- the current embedding contract is still text-only
- the current document_embeddings storage path is text-centric
- PDF support currently extracts text, not page-image evidence
- OCR currently converts images to text, not true image-aware retrieval
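The text-only limitation is a contract problem before it is a storage problem. Below is a minimal sketch of one way the embedding input type could be widened without breaking the existing text path. It assumes Python for illustration; `TextInput`, `ImageInput`, `Embedder`, `embed_checked` and `TextOnlyEmbedder` are hypothetical names, not the current Jorvis adapter, and the toy vectors stand in for real model output.

```python
from dataclasses import dataclass
from typing import Protocol, Union


@dataclass(frozen=True)
class TextInput:
    text: str


@dataclass(frozen=True)
class ImageInput:
    data: bytes
    mime_type: str  # e.g. "image/png"


# Hypothetical widened input type; today's contract only has the text case.
EmbeddingInput = Union[TextInput, ImageInput]


class UnsupportedModality(ValueError):
    """Raised when an adapter is asked to embed a modality it does not declare."""


class Embedder(Protocol):
    supported: frozenset  # set of input classes the adapter accepts

    def embed(self, item: EmbeddingInput) -> list[float]: ...


def embed_checked(embedder: Embedder, item: EmbeddingInput) -> list[float]:
    # Fail loudly instead of silently treating an image as something else.
    if type(item) not in embedder.supported:
        raise UnsupportedModality(
            f"{type(item).__name__} is not supported by {type(embedder).__name__}"
        )
    return embedder.embed(item)


class TextOnlyEmbedder:
    """Stand-in for the existing text-only adapter (toy vectors, not a real model)."""

    supported = frozenset({TextInput})

    def embed(self, item: EmbeddingInput) -> list[float]:
        assert isinstance(item, TextInput)
        # Toy 2-d "embedding": text length and vowel count.
        return [float(len(item.text)), float(sum(c in "aeiou" for c in item.text.lower()))]


print(embed_checked(TextOnlyEmbedder(), TextInput("abc")))
```

A multimodal adapter would declare `frozenset({TextInput, ImageInput})`, while the current text-only adapter keeps working unchanged and simply rejects image input.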
What Is High-Value for Jorvis
1. Multimodal document retrieval
Best immediate fit:
- ingest PDF text plus page images
- answer with text plus page/image evidence
- attach page/source attribution
Why this matters:
- it improves the existing secondary journey in the product north star
- it creates a visible product delta without touching the SQL core
- it targets a concrete Jorvis use case rather than generic multimodal hype
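"Answer with text plus page/image evidence" implies that text chunks and page-image chunks are ranked together and that every hit carries its source and page. A minimal sketch under that assumption, with hypothetical `Chunk`/`Hit` types, modality labels and toy 2-d vectors in place of real embeddings; plain cosine similarity stands in for whatever distance the pgvector store would use.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str                 # e.g. the PDF manual's file name
    page: int                   # 1-based page number used for attribution
    modality: str               # hypothetical labels: "text" or "page_image"
    vector: tuple[float, ...]   # embedding produced at ingest time


@dataclass(frozen=True)
class Hit:
    score: float
    chunk: Chunk

    def citation(self) -> str:
        return f"{self.chunk.source}, p. {self.chunk.page} ({self.chunk.modality})"


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: tuple[float, ...], chunks: list[Chunk], k: int = 3) -> list[Hit]:
    """Rank text and page-image chunks together; every hit keeps its source and page."""
    hits = [Hit(cosine(query, c.vector), c) for c in chunks]
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits[:k]


chunks = [
    Chunk("pump_manual.pdf", 4, "text", (1.0, 0.0)),
    Chunk("pump_manual.pdf", 4, "page_image", (0.9, 0.1)),
    Chunk("pump_manual.pdf", 9, "text", (0.0, 1.0)),
]
for hit in search((1.0, 0.0), chunks, k=2):
    print(f"{hit.score:.2f}  {hit.citation()}")
```

Keeping attribution on the hit itself, rather than reconstructing it after ranking, is what makes provenance-first answers cheap: the citation travels with the evidence.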
2. Image-aware retrieval with metadata
Strong medium-term fit:
- screenshots of internal tools
- product diagrams
- field/ops images
- image-to-similar-case retrieval
This is useful if Jorvis needs to answer questions using visual business evidence, not just text.
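"Image-aware retrieval with metadata" can be read as a metadata filter followed by similarity ranking. A minimal sketch, assuming hypothetical metadata keys (`kind`, `tool`), case IDs and toy image vectors; in the pgvector-backed store the filter would most likely become a SQL `WHERE` clause and the ranking an `ORDER BY` on a distance operator such as `<=>` (cosine distance).

```python
import math
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    case_id: str
    vector: tuple[float, ...]                      # image embedding (toy values below)
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_cases(query_vec, records, filters: dict[str, str], k: int = 3):
    """Keep records whose metadata matches every filter, then rank by cosine similarity."""
    candidates = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in filters.items())
    ]
    scored = [(r.case_id, cosine(query_vec, r.vector)) for r in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


records = [
    ImageRecord("CASE-1", (1.0, 0.0), {"kind": "screenshot", "tool": "billing"}),
    ImageRecord("CASE-2", (1.0, 0.0), {"kind": "screenshot", "tool": "crm"}),
    ImageRecord("CASE-3", (0.0, 1.0), {"kind": "screenshot", "tool": "billing"}),
    ImageRecord("CASE-4", (1.0, 0.0), {"kind": "field_photo", "tool": "billing"}),
]
print(similar_cases((1.0, 0.0), records, {"kind": "screenshot", "tool": "billing"}))
```

Filtering first keeps visually similar but irrelevant images (another tool, another image type) out of the answer, which matters more for business evidence than raw similarity does.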
3. Audio as a retrieval signal
Potentially useful, but only as a controlled extension:
- keep transcript-first explainability
- add native audio embedding only as an experimental extra signal
Jorvis should not abandon transcripts for compliance, citations, or auditability.
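One way to keep transcripts primary while testing audio as an extra signal is to let the audio score only re-weight the ranking, behind a switch, and to always cite the transcript segment. A minimal sketch under those assumptions; `Segment`, `AUDIO_WEIGHT` and the example scores are hypothetical, and the weight would need tuning in any real pilot.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weight for the experimental audio signal; would need tuning in a pilot.
AUDIO_WEIGHT = 0.2


@dataclass(frozen=True)
class Segment:
    recording: str
    start_s: float
    end_s: float
    transcript: str
    transcript_score: float                 # similarity from the transcript embedding
    audio_score: Optional[float] = None     # experimental native-audio similarity, if any


def segment_score(seg: Segment, audio_signal_enabled: bool) -> float:
    if audio_signal_enabled and seg.audio_score is not None:
        # Audio only nudges the ranking; the transcript score still dominates.
        return (1 - AUDIO_WEIGHT) * seg.transcript_score + AUDIO_WEIGHT * seg.audio_score
    return seg.transcript_score


def rank_segments(segments: list[Segment], audio_signal_enabled: bool) -> list[str]:
    ranked = sorted(segments, key=lambda s: segment_score(s, audio_signal_enabled), reverse=True)
    # Every citation points at transcript text and a time range, never at raw audio alone.
    return [f"{s.recording} [{s.start_s:.0f}-{s.end_s:.0f}s]: {s.transcript}" for s in ranked]


segments = [
    Segment("call_a.wav", 12.0, 20.0, "reset the pressure valve", 0.8, audio_score=0.1),
    Segment("call_b.wav", 30.0, 41.0, "valve alarm keeps beeping", 0.7, audio_score=1.0),
]
print(rank_segments(segments, audio_signal_enabled=False))
print(rank_segments(segments, audio_signal_enabled=True))
```

With the switch off, ranking is purely transcript-based; with it on, a strong audio match can reorder close candidates, but the answer still cites readable, auditable transcript text.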
What Is Lower-Value Right Now
Video-first retrieval
Not the best next slice:
- higher implementation cost
- weaker immediate ICP alignment
- harder citation UX
External vector DB migration
Not needed yet:
- Jorvis already has pgvector and a document search path
- a pilot should validate product value first
- changing vector infrastructure too early adds complexity without product proof
"Claude Code changed RAG forever"
Useful only as a prototyping accelerator.
This can help build pilot apps quickly, but it is not a product moat and should not shape the roadmap by itself.
Recommended Product Direction
If Jorvis uses this idea, the correct first step is:
Experimental multimodal document pilot
Scope:
- PDF + page images + OCR fallback
- image-aware retrieval
- provenance-first answers
- separate storage and feature flags
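The "PDF + page images + OCR fallback" item can be read as a per-page ingest plan: use the PDF text layer when present, fall back to OCR only when it is empty, and always keep the rendered page image as its own record. A minimal sketch under that reading; `PageInput`, `IngestRecord`, `plan_pdf_ingest` and the modality labels are hypothetical, and `fake_ocr` stands in for the existing OCR processor.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PageInput:
    page: int                    # 1-based page number
    extracted_text: str          # text layer from the existing PDF processor
    image_png: Optional[bytes]   # rendered page image, if rendering succeeded


@dataclass(frozen=True)
class IngestRecord:
    source: str
    page: int
    modality: str                # "text", "ocr_text" or "page_image"
    payload: object              # str for text records, bytes for page images


def plan_pdf_ingest(
    source: str, pages: list[PageInput], ocr: Callable[[bytes], str]
) -> list[IngestRecord]:
    records: list[IngestRecord] = []
    for p in pages:
        text = p.extracted_text.strip()
        if text:
            records.append(IngestRecord(source, p.page, "text", text))
        elif p.image_png is not None:
            # OCR fallback only when the page has no text layer (e.g. a scanned page).
            ocr_text = ocr(p.image_png).strip()
            if ocr_text:
                records.append(IngestRecord(source, p.page, "ocr_text", ocr_text))
        if p.image_png is not None:
            # The page image is kept as its own record so diagrams can be retrieved and cited.
            records.append(IngestRecord(source, p.page, "page_image", p.image_png))
    return records


def fake_ocr(png: bytes) -> str:
    # Stand-in for the real OCR processor.
    return "Valve assembly diagram" if png == b"<png page 2>" else ""


pages = [
    PageInput(1, "Safety overview", b"<png page 1>"),
    PageInput(2, "   ", b"<png page 2>"),   # scanned page: empty text layer
    PageInput(3, "Appendix", None),          # image rendering failed
]
for r in plan_pdf_ingest("pump_manual.pdf", pages, fake_ocr):
    print(r.page, r.modality)
```

Every record keeps `source` and `page`, so provenance is established at ingest time rather than patched in at answer time.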
Do not:
- replace current text RAG
- broaden into video
- broaden into Google Workspace
- turn it into a broad data-platform rewrite
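The "separate storage and feature flags" and "do not replace current text RAG" constraints together suggest a routing shape where the text path always runs and the pilot is additive. A minimal sketch, assuming a hypothetical flag name `multimodal_doc_pilot` and retrievers passed in as plain callables; it is not the actual DocumentService wiring.

```python
from dataclasses import dataclass, field
from typing import Callable

Retriever = Callable[[str], list[str]]


@dataclass
class RetrievalResult:
    text_hits: list[str]
    multimodal_hits: list[str] = field(default_factory=list)


def retrieve(
    query: str,
    flags: dict[str, bool],
    text_rag: Retriever,
    multimodal_pilot: Retriever,
) -> RetrievalResult:
    # The existing text RAG path always runs; the pilot never replaces it.
    result = RetrievalResult(text_hits=text_rag(query))
    # The pilot reads from its own store and only runs behind a (hypothetical) flag.
    if flags.get("multimodal_doc_pilot", False):
        try:
            result.multimodal_hits = multimodal_pilot(query)
        except Exception:
            # A pilot failure degrades to "no extra evidence" instead of breaking the answer.
            result.multimodal_hits = []
    return result
```

Returning pilot hits in a separate field, instead of merging them into the text hits, keeps the pilot's contribution measurable and makes turning it off a no-op for the existing path.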
Recommendation
Treat Gemini Embedding 2 as a narrow product-enabling capability for document/image retrieval.
Do not treat it as justification for rearchitecting the entire Jorvis retrieval stack.
The right next documents in this lane are:
- docs/product/GEMINI_EMBEDDING_2_MULTIMODAL_PILOT_EXECUTION_PLAN.md
- docs/agent_ops/specs/task_gemini_embedding_2_multimodal_pilot_stage0_spec.md