ADR-0011: Graph Architecture (GraphRAG)
ADR-0011: Graph Architecture (GraphRAG)
Status: DEPRECATED (Superseded by ADR-0026) Date: 2026-01-06 Author: Antigravity Task: Task-020
Context
Jorvis currently relies on a Hybrid Retriever (Semantic + Keyword) to find relevant tables for user queries. While effective for simple questions, this approach struggles with:
- Multi-hop Reasoning: Queries like "Revenue per Customer in North Region" require traversing
Sales -> Customer -> Region. Vector search often misses the intermediateCustomertable if it doesn't semantic match the query terms heavily. - Ambiguity: Distinguishing between "Product" (Entity) and "Product Category" (Dimension) is difficult without structural context.
- Hallucination Risks: LLMs generate invalid JOINs if they guess relationships not present in the schema.
Decision
We will implement a Knowledge Graph using Apache AGE (A Graph Extension for PostgreSQL).
1. Technology Choice: Apache AGE
- Why: It allows us to keep the graph data inside Postgres (or strictly adjacent), leveraging existing backups, tooling, and SQL skills. It supports OpenCypher, the industry standard for graph queries.
- Alternatives Considered:
- Neo4j: Powerful but requires a heavy JVM sidecar and separate licensing/infra.
- NetworkX (In-Memory): Too slow for large schemas and lacks persistence.
- Recursive SQL: Too complex to maintain for flexible pathfinding.
2. Graph Schema Design
The Knowledge Graph will consist of the following Nodes and Edges:
Nodes:
(:Table {name: 'schema.table', description: '...'})(:Column {name: 'col_name', type: 'varchar'})(:Metric {name: 'Revenue', definition: 'SUM(amount)'})(Future)(:BusinessConcept {name: 'Churn'})(Future)
Edges:
(:Table)-[:HAS_COLUMN]->(:Column)(:Column)-[:FOREIGN_KEY_TO]->(:Column)(Structural Link)(:Table)-[:SEMANTICS]->(:BusinessConcept)(Glossary Link)
3. Integration Strategy (Post-Retrieval)
GraphRAG will be applied Post-Retrieval (Reranking/Expansion phase):
- Retrieve: Hybrid Retriever gets Top-K tables (e.g., 5).
- Expand: Query the Graph to find immediate neighbors (tables joined by FK) of the Top-K.
- Prune: Use an LLM or heuristic to keep only relevant neighbors (e.g., Customer connects Sales and Region).
- Context: Feed the subgraph schema to the SQL generation prompt.
4. Data Sync
- Hydration: A script (
schema_to_graph.py) will parseINFORMATION_SCHEMAandGlossaryto rebuild the graph periodically (e.g., on deployment or schema change).
Consequences
- Positive:
- Explicit modeling of JOIN paths reduces hallucinated SQL.
- Can answer "How are X and Y related?" deterministically.
- Negative:
- Introduces a dependency on the
ageextension (requires compatible Postgres version, currently PG16/17). - Requires a new sync pipeline (
schema_to_graph.py).
- Introduces a dependency on the
Implementation Plan
- PoC (Done): Verified AGE (PG17) and metadata ingestion in
scripts/research/graphrag-poc/. - Infra (Task-021): Add AGE container to global
docker-compose.yml(or enable extension in main DB if supported). - Service (Task-021): Create
GraphServiceinanalytics-platformto wrap Cypher queries. - FK Extraction (Task-021): Implement FOREIGN_KEY edge creation from
information_schema.key_column_usage.
Approval
- Approved by: Claude-code (Architect)
- Date: 2026-01-06
- Notes: Research phase complete. Implementation deferred to Task-021.