ADR 0003: SchemaCard Metadata Caching
ADR 0003: SchemaCard Metadata Caching
Status
Accepted
Context
We need schema context to improve Text2SQL accuracy without querying user data. Current approaches are either slow or unsafe.
Decision
Introduce a SchemaCard built only from system catalogs (information_schema / pg_catalog). No user table scans.
Constraints & Safety
- Metadata only (tables/columns/types); no SELECT from user tables.
- System catalogs only (ANSI information_schema + dialect catalogs).
- Limits:
- Max tables: 100
- Max columns (total): 500
- Max JSON size: 256KB
- TTL cache: 5 minutes (configurable).
- Public logs must not include table/column names. Only summary counts + latency. Full SchemaCard may exist only in gitignored artifacts (opt‑in).
Feature Flag
- JORVIS_SCHEMA_CONTEXT_MODE=schemacard (default: none)
Implementation Sketch
- SchemaCardService builds + caches.
- DialectAdapter per DB (Postgres/Redshift first).
Alternatives
- Semantic scan: rejected (PII/perf risk).
- Hardcoded schema: not scalable.
DoD / Evidence
- Works on AdventureWorks (Postgres).
- Summary‑only logs in PR/CI (no object names).
- CI PASS (docs‑only).