Embedding V2 Migration: Zero Downtime
Upgrading an embedding model in a live production environment is notoriously difficult. Vectors produced by different models are not comparable, so a simple swap invalidates every existing vector index and forces a full re-index that can take hours or days. During that window, search is either badly degraded or unavailable.
Jorvis solved this by implementing a Zero Downtime Migration strategy using Dual Write and Shadow Validation.
1. Schema Preparation
We began by altering our database schema to include parallel vector columns (embedding_v2) alongside our existing embedding columns. This non-destructive change allowed us to store the new vectors without impacting the current production read paths.
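As a rough illustration of an additive schema change, the sketch below adds a nullable embedding_v2 column next to an existing embedding column. The database engine (SQLite), the table name documents, and the TEXT column type are assumptions made only to keep the example runnable; the source does not say which database or vector type is used.

```python
import sqlite3


def add_v2_column(conn: sqlite3.Connection, table: str = "documents") -> None:
    """Add a nullable embedding_v2 column if it is not already present."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if "embedding_v2" not in existing:
        # Nullable, no default: existing rows and V1 reads are left untouched.
        conn.execute(f"ALTER TABLE {table} ADD COLUMN embedding_v2 TEXT")
        conn.commit()


# Hypothetical table standing in for the production schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)"
)
conn.execute(
    "INSERT INTO documents (body, embedding) VALUES (?, ?)", ("hello", "[0.1, 0.2]")
)
add_v2_column(conn)
add_v2_column(conn)  # Safe to run twice: the second call is a no-op.
```

Checking for the column first makes the migration safe to re-run, which is convenient when the same script is applied to several environments.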
2. Dual Write
Once the schema was prepared, we enabled Dual Write. Every new document, glossary term, or query ingested into the system is now embedded with both the legacy V1 model and the new Gemini Embedding V2 model, and each vector is written to its own column in the database.
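A minimal sketch of the Dual Write idea follows. The in-memory store, the dual_write function, and the two fake embedders are all hypothetical stand-ins; real code would call the actual V1 and V2 embedding clients and write to the database.

```python
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[str], Vector]


def dual_write(
    store: Dict[str, dict],
    doc_id: str,
    text: str,
    embed_v1: Embedder,
    embed_v2: Embedder,
) -> None:
    """Embed the text with both models and store both vectors together."""
    v1 = embed_v1(text)
    v2 = embed_v2(text)
    # Both vectors are written in one record update so a row never has
    # a V1 vector without its V2 counterpart.
    store[doc_id] = {"text": text, "embedding": v1, "embedding_v2": v2}


# Stand-in embedders (assumptions): they only mimic "two different models".
def fake_v1(text: str) -> Vector:
    return [float(len(text))]


def fake_v2(text: str) -> Vector:
    return [float(len(text)), float(text.count(" "))]


store: Dict[str, dict] = {}
dual_write(store, "doc-1", "zero downtime", fake_v1, fake_v2)
```

Computing both vectors before touching the store means a failure in either model leaves the record unchanged rather than half-written.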
3. Backfilling
With new data handled by Dual Write, we run asynchronous background scripts to backfill the embedding_v2 column for all historical data. The process is rate-limited so that it does not degrade the performance of live user queries.
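One simple way to rate-limit a backfill is to process rows in fixed-size batches and pause between them. The sketch below assumes an in-memory store and a hypothetical backfill_v2 function; the batch size and pause length are illustrative values, not the ones used in production.

```python
import time
from typing import Callable, Dict, List

Vector = List[float]


def backfill_v2(
    store: Dict[str, dict],
    embed_v2: Callable[[str], Vector],
    batch_size: int = 100,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Fill embedding_v2 for rows that lack it, pausing between batches."""
    pending = [
        doc_id for doc_id, row in store.items() if row.get("embedding_v2") is None
    ]
    filled = 0
    for start in range(0, len(pending), batch_size):
        for doc_id in pending[start:start + batch_size]:
            row = store[doc_id]
            # Re-check: Dual Write may have populated this row since the scan.
            if row.get("embedding_v2") is None:
                row["embedding_v2"] = embed_v2(row["text"])
                filled += 1
        # Pause only if another batch remains, to cap load on shared resources.
        if start + batch_size < len(pending):
            sleep(pause_seconds)
    return filled


# Tiny demonstration with a stand-in embedder and no pause.
demo = {"a": {"text": "old row", "embedding_v2": None}}
backfill_v2(demo, lambda text: [float(len(text))], pause_seconds=0.0)
```

Because the function skips rows that already have a V2 vector, it can be stopped and restarted without redoing work.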
4. Shadow Validation (Shadow Search)
The most critical step is validation. Before switching the primary read path to V2, we implemented Shadow Search. When a user performs a search, the system queries the V1 index and returns those results to the user as normal. In the background, it also queries the V2 index asynchronously, so the shadow query does not delay the user's response.
We then compare the two result sets by calculating their overlap percentage. This gives us empirical evidence of how the new model behaves on real-world queries. Only when Shadow Validation shows that V2 meets or exceeds our quality thresholds do we authorize the final cutover.
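The Shadow Search flow and the overlap calculation might look roughly like the sketch below. The function names, the top-k definition of overlap, and the use of a thread pool are assumptions; the source does not specify how overlap is computed or how the background query is scheduled.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

SearchFn = Callable[[str], List[str]]


def overlap_at_k(v1_ids: Sequence[str], v2_ids: Sequence[str], k: int = 10) -> float:
    """Fraction of the top-k V1 results that also appear in the top-k V2 results."""
    top_v1 = set(v1_ids[:k])
    top_v2 = set(v2_ids[:k])
    if not top_v1:
        return 0.0
    return len(top_v1 & top_v2) / len(top_v1)


def shadow_search(
    query: str,
    search_v1: SearchFn,
    search_v2: SearchFn,
    executor: ThreadPoolExecutor,
    overlaps: List[float],
    k: int = 10,
) -> Tuple[List[str], Future]:
    """Serve V1 results; compare against V2 on a background thread."""
    served = search_v1(query)

    def compare() -> None:
        try:
            shadow = search_v2(query)
            overlaps.append(overlap_at_k(served, shadow, k))
        except Exception:
            # A failing shadow query must never affect the user-facing result.
            pass

    future = executor.submit(compare)
    # The caller can return `served` immediately; the future is exposed
    # here only so that tests or batch jobs can wait for the comparison.
    return served, future
```

Averaging the recorded overlap values over a window of real queries gives a single number to compare against a chosen quality threshold.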
5. Cutover
The final cutover is a single configuration flag change that redirects the read path from the V1 column to the V2 column. Because the V2 column is already fully populated and validated, the transition is immediate and invisible to the end user.
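A flag-driven read path can be sketched as follows. The flag name use_embedding_v2, the SearchConfig class, and the pre-flight check for missing V2 vectors are hypothetical; the source says only that a configuration flag selects the column.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SearchConfig:
    # Hypothetical flag name; False keeps reads on the legacy V1 column.
    use_embedding_v2: bool = False


def read_column(config: SearchConfig) -> str:
    """Return the vector column that the search read path should query."""
    return "embedding_v2" if config.use_embedding_v2 else "embedding"


def cut_over(config: SearchConfig, store: Dict[str, dict]) -> None:
    """Flip the flag, refusing if any row still lacks a V2 vector."""
    missing = [
        doc_id for doc_id, row in store.items() if row.get("embedding_v2") is None
    ]
    if missing:
        raise RuntimeError(f"{len(missing)} rows still lack embedding_v2")
    config.use_embedding_v2 = True
```

Keeping the V1 column in place after the flip means that setting the flag back to False is enough to roll back.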