Gemini Embedding 2 in Jorvis: Fit Analysis

Date: 2026-03-12
Status: Planning input only
Owner: George / Architect


Summary

Gemini Embedding 2 is potentially useful for Jorvis, but not as a reason for a full RAG rewrite.

The strongest fit is a bounded multimodal document retrieval pilot:

  • PDF manuals with diagrams
  • image-heavy operating procedures
  • screenshots and visual evidence
  • optional audio/transcript retrieval as a follow-up

The weakest fit is a broad platform replatforming. The pilot should therefore avoid:

  • an external vector DB migration
  • a video-first lane
  • replacing the current text-only document RAG path in one step

Current Jorvis Baseline

Jorvis already has the building blocks for a multimodal pilot:

  • text/document ingestion and semantic search in DocumentService
  • pgvector-backed retrieval in the Jorvis DB
  • PDF, Office, OCR processors
  • voice STT/TTS lane
  • Gemini text embedding adapter

Important limitations:

  • the current embedding contract is still text-only
  • the current document_embeddings storage path is text-centric
  • PDF support currently extracts text, not page-image evidence
  • OCR currently converts images to text, not true image-aware retrieval
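
To make the gap concrete, the sketch below contrasts a text-only embedding contract with one that also accepts page images. Every name in it (EmbeddingInput, TextOnlyEmbedder, MultimodalEmbedder, FakeMultimodalEmbedder) is hypothetical and does not reflect the actual Jorvis adapter or the Gemini API; the fake embedder exists only so the sketch runs without a network call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


@dataclass(frozen=True)
class EmbeddingInput:
    """One item to embed. Hypothetical shape, not the real Jorvis type."""
    text: Optional[str] = None           # extracted or OCR text
    image_bytes: Optional[bytes] = None  # rendered page image or screenshot
    mime_type: Optional[str] = None      # e.g. "image/png" when image_bytes is set

    def modality(self) -> str:
        """Describe which signals this item carries."""
        if self.text is not None and self.image_bytes is not None:
            return "text+image"
        if self.image_bytes is not None:
            return "image"
        if self.text is not None:
            return "text"
        raise ValueError("EmbeddingInput needs text, image_bytes, or both")


class TextOnlyEmbedder(Protocol):
    """Roughly the shape of a text-only contract (assumed)."""
    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class MultimodalEmbedder(Protocol):
    """What a pilot contract might look like instead (assumed)."""
    def embed(self, items: Sequence[EmbeddingInput]) -> list[list[float]]: ...


class FakeMultimodalEmbedder:
    """Stand-in embedder: returns a tiny deterministic vector per item."""
    def embed(self, items: Sequence[EmbeddingInput]) -> list[list[float]]:
        vectors = []
        for item in items:
            text_len = len(item.text or "")
            image_len = len(item.image_bytes or b"")
            vectors.append([float(text_len), float(image_len)])
        return vectors
```

Keeping the multimodal contract separate from the text-only one would let the existing text path stay untouched while the pilot is evaluated.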

What Is High-Value for Jorvis

1. Multimodal document retrieval

Best immediate fit:

  • ingest PDF text plus page images
  • answer with text plus page/image evidence
  • attach page/source attribution
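
As an illustration of page/source attribution, here is a minimal sketch of a retrieval hit that carries its own provenance. The EvidenceHit fields, the citation format, and the deduplication rule are all assumptions made for the example, not an existing Jorvis schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EvidenceHit:
    """A retrieved chunk plus the provenance needed to cite it (hypothetical)."""
    document_id: str
    page_number: int                 # 1-based page in the source PDF
    kind: str                        # "text" or "page_image"
    score: float                     # similarity score, higher is better
    snippet: Optional[str] = None    # text excerpt, for text hits
    image_ref: Optional[str] = None  # storage key of the page image, for image hits


def format_citation(hit: EvidenceHit) -> str:
    """Render a short human-readable source attribution."""
    base = f"{hit.document_id}, p. {hit.page_number}"
    return f"{base} (image)" if hit.kind == "page_image" else base


def top_evidence(hits: Iterable[EvidenceHit], limit: int = 3) -> list[EvidenceHit]:
    """Keep the best hit per (document, page, kind), then the top `limit` overall."""
    best: dict[tuple[str, int, str], EvidenceHit] = {}
    for hit in hits:
        key = (hit.document_id, hit.page_number, hit.kind)
        if key not in best or hit.score > best[key].score:
            best[key] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:limit]
```

Carrying the page number and image reference on every hit means the answer layer can cite evidence without a second lookup.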

Why this matters:

  • it improves the existing secondary journey in the product north star
  • it creates a visible product delta without touching the SQL core
  • it addresses a concrete document-retrieval need rather than generic multimodal hype

2. Image-aware retrieval with metadata

Strong medium-term fit:

  • screenshots of internal tools
  • product diagrams
  • field/ops images
  • image-to-similar-case retrieval

This is useful if Jorvis needs to answer questions using visual business evidence, not just text.

3. Audio as a retrieval signal

Potentially useful, but only as a controlled extension:

  • keep transcript-first explainability
  • add native audio embedding only as an experimental extra signal

Jorvis should keep transcripts as the source of record for compliance, citations, and auditability.
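
One way to keep audio as a controlled extra signal is a weighted blend in which the transcript score dominates and the audio score is optional. The sketch below assumes both scores are similarities on the same scale; the function name, the linear blend, and the default weight of 0.2 are illustrative choices, not tuned or measured values.

```python
from typing import Optional


def fused_score(
    transcript_score: float,
    audio_score: Optional[float] = None,
    audio_weight: float = 0.2,  # assumed default, not a tuned value
) -> float:
    """Blend a transcript similarity with an optional audio similarity.

    With no audio score the result is exactly the transcript score, so
    disabling the experimental signal cannot change existing rankings.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    if audio_score is None:
        return transcript_score
    return (1.0 - audio_weight) * transcript_score + audio_weight * audio_score
```

Because citations would still point at the transcript text, the audio signal in this sketch only influences ranking, never what is shown as evidence.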


What Is Lower-Value Right Now

Video-first retrieval

Not the best next slice:

  • higher implementation cost
  • weaker immediate ICP alignment
  • harder citation UX

External vector DB migration

Not needed yet:

  • Jorvis already has pgvector and a document search path
  • a pilot should validate product value first
  • changing vector infrastructure too early adds complexity without product proof

"Claude Code changed RAG forever"

Useful only as a prototyping accelerator.

This can help build pilot apps quickly, but it is not a product moat and should not shape the roadmap by itself.


If Jorvis adopts Gemini Embedding 2, the correct first step is:

Experimental multimodal document pilot

Scope:

  • PDF + page images + OCR fallback
  • image-aware retrieval
  • provenance-first answers
  • separate storage and feature flags

Do not:

  • replace current text RAG
  • broaden into video
  • broaden into Google Workspace
  • turn it into a broad data-platform rewrite
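
The "separate storage and feature flags" item in the scope above could be sketched as follows. This assumes document_embeddings is the existing text store; the pilot store name, the PilotFlags type, and the flag name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

TEXT_STORE = "document_embeddings"                    # existing text path (assumed to be a table)
PILOT_STORE = "document_embeddings_multimodal_pilot"  # hypothetical separate pilot store


@dataclass(frozen=True)
class PilotFlags:
    """Feature flags for the pilot (hypothetical); everything defaults to off."""
    multimodal_pilot_enabled: bool = False


def retrieval_targets(flags: PilotFlags) -> list[str]:
    """Return the stores to query. The text store is always included,
    so turning the flag off restores the current behavior exactly."""
    targets = [TEXT_STORE]
    if flags.multimodal_pilot_enabled:
        targets.append(PILOT_STORE)
    return targets
```

With this shape the pilot is additive: it never writes to or replaces the text store, and removing it means dropping one store and one flag.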

Recommendation

Treat Gemini Embedding 2 as a narrow product-enabling capability for document/image retrieval.

Do not treat it as justification for rearchitecting the entire Jorvis retrieval stack.

The right next documents in this lane are:

  • docs/product/GEMINI_EMBEDDING_2_MULTIMODAL_PILOT_EXECUTION_PLAN.md
  • docs/agent_ops/specs/task_gemini_embedding_2_multimodal_pilot_stage0_spec.md