ADR 0003: SchemaCard Metadata Caching

Status

Accepted

Context

We need schema context to improve Text2SQL accuracy without querying user data. Current approaches are either slow or unsafe.

Decision

Introduce a SchemaCard built only from system catalogs (information_schema / pg_catalog). No user table scans.

Constraints & Safety

  1. Metadata only (tables/columns/types); no SELECT from user tables.
  2. System catalogs only (ANSI information_schema + dialect catalogs).
  3. Limits:
    • Max tables: 100
    • Max columns (total): 500
    • Max JSON size: 256KB
  4. TTL cache: 5 minutes (configurable).
  5. Public logs must not include table/column names. Only summary counts + latency. Full SchemaCard may exist only in gitignored artifacts (opt‑in).

Feature Flag

  • JORVIS_SCHEMA_CONTEXT_MODE=schemacard (default: none)

Implementation Sketch

  • SchemaCardService builds + caches.
  • DialectAdapter per DB (Postgres/Redshift first).

Alternatives

  • Semantic scan: rejected (PII/perf risk).
  • Hardcoded schema: not scalable.

DoD / Evidence

  • Works on AdventureWorks (Postgres).
  • Summary‑only logs in PR/CI (no object names).
  • CI PASS (docs‑only).