Federation Module Architecture

  • Namespace: src/data/federation/
  • Status: ✅ Production Ready (Task-067)
  • Key Service: FederatedQueryService

The Federation module enables "Chat with any Data" by allowing Jorvis to execute queries across multiple disparate data sources (e.g., PostgreSQL + Google Sheets + Snowflake) and merge the results in-memory.


🏗️ Core Architecture

The federation engine follows a Plan-Execute-Merge pipeline:

  1. Query Decomposition: A complex question is broken down into a FederatedQueryPlan made of per-source sub-queries.
  2. Topological Sort: Sub-queries are ordered according to their dependencies.
  3. Parallel Execution: Independent sub-queries are executed concurrently.
  4. In-Memory Merge: Results are combined using the plan's merge strategy.
```mermaid
graph TD
    A[FederatedQueryPlan] --> B{Topological Sort}
    B --> C[Batch 1: Independent Queries]
    C --> D[Batch 2: Dependent Queries]
    D --> E[Merge Strategy]
    E --> F[FederatedResult]
```
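The topological sort step can be sketched with Kahn's algorithm, which groups sub-queries into dependency "batches" like the ones in the diagram. This is a minimal sketch under assumptions: `planBatches` and `SubQueryNode` are hypothetical names rather than the module's actual API, and the real service may order or validate sub-queries differently.

```typescript
// Hypothetical stand-in for the SubQuery shape documented under Data Structures.
interface SubQueryNode {
  id: string;
  dependencies?: string[];
}

// Groups sub-queries into batches (Kahn's algorithm, level by level): every query
// in batch N depends only on queries in earlier batches, so a batch can run in parallel.
// Throws on unknown dependency IDs or on cycles.
function planBatches<T extends SubQueryNode>(queries: T[]): T[][] {
  const byId = new Map<string, T>();
  for (const q of queries) byId.set(q.id, q);

  const inDegree = new Map<string, number>();   // unfinished upstream count per query
  const dependents = new Map<string, string[]>(); // upstream id -> downstream ids
  for (const q of queries) {
    inDegree.set(q.id, q.dependencies?.length ?? 0);
    for (const dep of q.dependencies ?? []) {
      if (!byId.has(dep)) throw new Error(`Sub-query ${q.id} depends on unknown ${dep}`);
      const list = dependents.get(dep) ?? [];
      list.push(q.id);
      dependents.set(dep, list);
    }
  }

  const batches: T[][] = [];
  let ready = queries.filter((q) => inDegree.get(q.id) === 0);
  let scheduled = 0;
  while (ready.length > 0) {
    batches.push(ready);
    scheduled += ready.length;
    const next: T[] = [];
    for (const q of ready) {
      for (const childId of dependents.get(q.id) ?? []) {
        const remaining = inDegree.get(childId)! - 1;
        inDegree.set(childId, remaining);
        if (remaining === 0) next.push(byId.get(childId)!);
      }
    }
    ready = next;
  }

  // Anything never scheduled is part of a cycle.
  if (scheduled !== queries.length) throw new Error("Dependency cycle detected in plan");
  return batches;
}
```

For example, two independent queries plus one that depends on both yield two batches: `[["orders", "customers"], ["enrich"]]` (by ID).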

1. FederatedQueryService

The main orchestrator that manages the execution lifecycle.

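  • Concurrency: At most JORVIS_FEDERATION_MAX_PARALLEL sub-queries run at once (default: 5).
  • Timeout: Fails fast instead of hanging when sources are unresponsive (JORVIS_FEDERATION_TIMEOUT_MS, default: 30000 ms).
  • Resilience: Can tolerate partial failures (some sub-queries failing while others succeed) when configured to do so.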
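A small worker pool is one common way to enforce a cap like JORVIS_FEDERATION_MAX_PARALLEL. The sketch assumes that each sub-query is wrapped as a function returning a promise. `runWithLimit` is a hypothetical helper, and partial-failure handling is deliberately left out.

```typescript
// Runs async task factories with at most `limit` in flight and returns results
// in the original task order. The returned promise rejects on the first failure,
// but tasks already in flight are not cancelled.
async function runWithLimit<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  const results = new Array<T>(tasks.length);
  let nextIndex = 0;

  // Each worker repeatedly claims the next unstarted task. JavaScript runs this
  // loop on a single thread, so `nextIndex++` needs no locking.
  async function worker(): Promise<void> {
    while (nextIndex < tasks.length) {
      const index = nextIndex++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```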

2. Execution Pipeline

  • Entry point: execute(plan, executor, options)
  • Timeout Guard: Each sub-query is wrapped in a Promise.race against a timeout.
  • Dependency Resolution: A topological sort ensures that sub-queries with upstream dependencies wait until those dependencies have returned data.
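The timeout guard can be sketched as a `Promise.race` between the sub-query and a timer. `withTimeout` and `SubQueryTimeoutError` are hypothetical names. The document does not say how this per-sub-query timeout relates to the plan-wide JORVIS_FEDERATION_TIMEOUT_MS, so the sketch simply takes the timeout as a parameter.

```typescript
// Hypothetical error type so callers can tell timeouts apart from query errors.
class SubQueryTimeoutError extends Error {
  readonly subQueryId: string;
  readonly timeoutMs: number;
  constructor(subQueryId: string, timeoutMs: number) {
    super(`Sub-query ${subQueryId} timed out after ${timeoutMs} ms`);
    this.name = "SubQueryTimeoutError";
    this.subQueryId = subQueryId;
    this.timeoutMs = timeoutMs;
  }
}

// Races `work` against a timer. The timer is cleared once either side settles, so
// it cannot keep the event loop alive. Losing the race does NOT cancel the
// underlying query; the caller just stops waiting for it.
function withTimeout<T>(work: Promise<T>, timeoutMs: number, subQueryId: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new SubQueryTimeoutError(subQueryId, timeoutMs)), timeoutMs);
  });
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}
```

Real cancellation would need support from the executor itself (for example, a driver-level query cancel), which this sketch does not assume.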

🔄 Merge Strategies

The module supports 4 strategies for combining data from different sources:

| Strategy | Description | Use Case |
|---|---|---|
| UNION | Combines rows from all sources, removing duplicates. | Merging "Sales" tables from US and EU databases. |
| CONCAT | Appends all rows, preserving duplicates. | Logging or raw data aggregation. |
| JOIN | Nested-loop join on common keys. | Enriching "Orders" (SQL) with "Customer Details" (CRM/API). |
| AGGREGATE | Sums numeric values across datasets. | Total revenue across different payment gateways. |
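The four strategies can be illustrated with plain row arrays. This sketch is not the module's implementation. It assumes the following:

  • Rows are `Record<string, unknown>` objects.
  • UNION compares rows by their full content.
  • JOIN is an inner join on explicitly named key columns, folded left to right across datasets.
  • AGGREGATE collapses everything into one row of per-column sums.

```typescript
type Row = Record<string, unknown>;
// Assumed string-union form; the module may declare MergeStrategy differently.
type MergeStrategy = "UNION" | "CONCAT" | "JOIN" | "AGGREGATE";

// Content fingerprint with sorted keys, so {a:1,b:2} and {b:2,a:1} match.
function rowKey(row: Row): string {
  return JSON.stringify(Object.keys(row).sort().map((k) => [k, row[k]]));
}

// CONCAT: all rows from all datasets, duplicates kept.
function concatRows(datasets: Row[][]): Row[] {
  return ([] as Row[]).concat(...datasets);
}

// UNION: like CONCAT, but keeps only the first copy of identical rows.
function unionRows(datasets: Row[][]): Row[] {
  const seen = new Set<string>();
  return concatRows(datasets).filter((row) => {
    const key = rowKey(row);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// JOIN: inner nested-loop join on `on` columns; right-side values win on name clashes.
function joinRows(left: Row[], right: Row[], on: string[]): Row[] {
  const out: Row[] = [];
  for (const l of left) {
    for (const r of right) {
      // Require the key to be present so rows missing it never match each other.
      if (on.every((k) => l[k] !== undefined && l[k] === r[k])) out.push({ ...l, ...r });
    }
  }
  return out;
}

// AGGREGATE: one output row holding the sum of every numeric column.
function aggregateRows(datasets: Row[][]): Row[] {
  const totals: Record<string, number> = {};
  for (const row of concatRows(datasets)) {
    for (const k of Object.keys(row)) {
      const v = row[k];
      if (typeof v === "number") totals[k] = (totals[k] ?? 0) + v;
    }
  }
  return [totals];
}

function mergeResults(strategy: MergeStrategy, datasets: Row[][], joinOn: string[] = []): Row[] {
  switch (strategy) {
    case "CONCAT":
      return concatRows(datasets);
    case "UNION":
      return unionRows(datasets);
    case "AGGREGATE":
      return aggregateRows(datasets);
    case "JOIN":
      if (joinOn.length === 0) throw new Error("JOIN needs at least one key column");
      if (datasets.length === 0) return [];
      // Fold left to right: ((d0 JOIN d1) JOIN d2) ...
      return datasets.slice(1).reduce((acc, d) => joinRows(acc, d, joinOn), datasets[0]);
  }
}
```

A nested-loop join costs O(n × m) comparisons per pair of datasets. That is fine for modest in-memory results; a hash join is the usual alternative for larger ones.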

⚙️ Configuration

| Environment Variable | Default | Description |
|---|---|---|
| JORVIS_FEDERATION_ENABLED | false | Master toggle for the module. |
| JORVIS_FEDERATION_MAX_PARALLEL | 5 | Maximum number of concurrent DB connections. |
| JORVIS_FEDERATION_TIMEOUT_MS | 30000 | Global timeout for the entire plan, in milliseconds. |
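A loader for these variables might look like the following. Two behaviours here are assumptions rather than documented facts: only the string "true" (case-insensitive) enables the module, and invalid numbers fall back to the default. `loadFederationConfig` is a hypothetical name; in Node.js it would be called as `loadFederationConfig(process.env)`.

```typescript
interface FederationConfig {
  enabled: boolean;
  maxParallel: number;
  timeoutMs: number;
}

// Parses a positive integer, using `fallback` when the value is missing or invalid.
function positiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Reads the three documented variables, applying the documented defaults.
function loadFederationConfig(env: Record<string, string | undefined>): FederationConfig {
  return {
    enabled: (env.JORVIS_FEDERATION_ENABLED ?? "false").trim().toLowerCase() === "true",
    maxParallel: positiveInt(env.JORVIS_FEDERATION_MAX_PARALLEL, 5),
    timeoutMs: positiveInt(env.JORVIS_FEDERATION_TIMEOUT_MS, 30000),
  };
}
```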

🧩 Data Structures

FederatedQueryPlan

```typescript
interface FederatedQueryPlan {
  id: string;
  subQueries: SubQuery[];       // Executed in dependency order
  mergeStrategy: MergeStrategy; // UNION | CONCAT | JOIN | AGGREGATE
}
```

SubQuery

```typescript
interface SubQuery {
  id: string;
  connectionId: string; // Target Data Source
  sql: string;
  dependencies?: string[]; // IDs of queries that must finish first
}
```
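The following combines the two interfaces into a hypothetical plan for the UNION use case from the Merge Strategies table, together with a pre-flight check. Several pieces are illustrative assumptions: the connection IDs, the SQL text, the `validatePlan` helper, and the string-union form of `MergeStrategy`.

```typescript
// Assumed representation; the document does not show how MergeStrategy is declared.
type MergeStrategy = "UNION" | "CONCAT" | "JOIN" | "AGGREGATE";

interface SubQuery {
  id: string;
  connectionId: string; // Target Data Source
  sql: string;
  dependencies?: string[]; // IDs of queries that must finish first
}

interface FederatedQueryPlan {
  id: string;
  subQueries: SubQuery[];
  mergeStrategy: MergeStrategy;
}

// Hypothetical pre-flight check: returns a list of problems (empty means valid).
// Cycle detection is left to the topological sort step.
function validatePlan(plan: FederatedQueryPlan): string[] {
  const problems: string[] = [];
  const ids = new Set<string>();
  for (const sq of plan.subQueries) {
    if (ids.has(sq.id)) problems.push(`duplicate sub-query id: ${sq.id}`);
    ids.add(sq.id);
  }
  for (const sq of plan.subQueries) {
    for (const dep of sq.dependencies ?? []) {
      if (dep === sq.id) problems.push(`${sq.id} depends on itself`);
      else if (!ids.has(dep)) problems.push(`${sq.id} depends on unknown sub-query ${dep}`);
    }
  }
  return problems;
}

// Example: merge "Sales" tables from US and EU databases (connection IDs are made up).
const salesPlan: FederatedQueryPlan = {
  id: "plan-sales-us-eu",
  mergeStrategy: "UNION",
  subQueries: [
    { id: "us-sales", connectionId: "pg-us", sql: "SELECT region, amount FROM sales" },
    { id: "eu-sales", connectionId: "pg-eu", sql: "SELECT region, amount FROM sales" },
  ],
};
```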