ADR 0001: Structured JSON Output for SQL Generation

Status

Accepted

Context

Jorvis generates SQL via LLM and executes it after backend validation (SqlGuard/policy). The current "return only SQL string" interface is fragile: the model might add explanations/markdown, and we spend complexity on post-parsing.

Gemini API supports structured outputs: response_mime_type=application/json + response_json_schema, which guarantees syntactically valid JSON (but does not guarantee semantic correctness).

Decision

Introduce "SQL generation result" contract as a JSON object, not a plain string.

Canonical contract (provider-agnostic)

Minimal result schema:

  • sql: string (required)
  • target_dialect: string (enum/allowlist; sourced from db.profiles[].driver)
  • assumptions: string[] (optional)
  • needs_followup: boolean (optional)
  • confidence: number (0..1, optional)

Non-negotiables

  • Backend validator + policy remain the source of truth for SELECT-only and other guardrails.
  • Structured output is not a security gate.
  • No raw SQL/PII in PR/CI logs; artifacts must be sanitized.

Implementation sketch (v1-compatible)

  1. Gemini path: add response_mime_type=application/json and response_json_schema to generation payload.
  2. Parsing: JSON.parse → schema check (minimal: sql presence + types).
  3. Validation: run SqlGuard on sql (and sidecar as optional defense-in-depth if needed), then execute.
  4. Telemetry/Eval: log only fingerprints/metadata (SQL hash), not the SQL itself.

Alternatives considered

  • A1: Keep plain SQL + regex/heuristics post-parsing (drift and fragility).
  • A2: function-calling/tool-calls as the only protocol (more complex, requires broader contract).

Risks

  • "JSON correct, semantics wrong": backend checks + execution-based eval runner required.
  • Drift between providers: contract must be provider-agnostic; structured output is an implementation detail.

DoD / Evidence

  • Execution-based eval runner shows improvement in format-failure rate (to ~0).
  • Negative tests: model returns non-JSON/empty JSON → correct error without SQL leak.
  • Release-gate PASS.