ADR 0001: Structured JSON Output for SQL Generation
ADR 0001: Structured JSON Output for SQL Generation
Status
Accepted
Context
Jorvis generates SQL via LLM and executes it after backend validation (SqlGuard/policy). The current "return only SQL string" interface is fragile: the model might add explanations/markdown, and we spend complexity on post-parsing.
Gemini API supports structured outputs: response_mime_type=application/json + response_json_schema, which guarantees syntactically valid JSON (but does not guarantee semantic correctness).
Decision
Introduce "SQL generation result" contract as a JSON object, not a plain string.
Canonical contract (provider-agnostic)
Minimal result schema:
sql: string (required)target_dialect: string (enum/allowlist; sourced fromdb.profiles[].driver)assumptions: string[] (optional)needs_followup: boolean (optional)confidence: number (0..1, optional)
Non-negotiables
- Backend validator + policy remain the source of truth for SELECT-only and other guardrails.
- Structured output is not a security gate.
- No raw SQL/PII in PR/CI logs; artifacts must be sanitized.
Implementation sketch (v1-compatible)
- Gemini path: add
response_mime_type=application/jsonandresponse_json_schemato generation payload. - Parsing: JSON.parse → schema check (minimal:
sqlpresence + types). - Validation: run
SqlGuardonsql(and sidecar as optional defense-in-depth if needed), then execute. - Telemetry/Eval: log only fingerprints/metadata (SQL hash), not the SQL itself.
Alternatives considered
- A1: Keep plain SQL + regex/heuristics post-parsing (drift and fragility).
- A2: function-calling/tool-calls as the only protocol (more complex, requires broader contract).
Risks
- "JSON correct, semantics wrong": backend checks + execution-based eval runner required.
- Drift between providers: contract must be provider-agnostic; structured output is an implementation detail.
DoD / Evidence
- Execution-based eval runner shows improvement in format-failure rate (to ~0).
- Negative tests: model returns non-JSON/empty JSON → correct error without SQL leak.
- Release-gate PASS.