Data Layer Architecture
Data Layer Architecture
Namespace: src/data/
Status: ✅ Production Ready
The Data Layer provides a unified, resilient abstraction for interacting with multiple database backends (Redshift, BigQuery, Snowflake, PostgreSQL). It adheres to the "Ports & Adapters" architecture, isolating business logic from specific database drivers.
🏗️ Core Components
1. DbFactory (db-factory.service.ts)
The DbFactoryService is the primary entry point for executing SQL queries. It manages connection lifecycles, retries, and failure handling.
Key Features:
- Provider Agnostic: Unified
execute(request)interface for all providers. - Resilience: Implements Circuit Breaker pattern to prevent cascading failures.
- Threshold: 5 failures
- Cooldown: 30s
- Backoff: Exponential jitter
- Connection Pooling: Manages Knex pools for Redshift/Postgres.
- Observability: Integrated with
QueryObservabilityServicefor timeout and error tracking. - Security: Validation of connection profiles via
SecretsService.
2. SQL Validation (sql-validation.service.ts)
Ensures all executed queries are safe and compatible with the target dialect.
Safety Checks:
- Read-Only: Enforces
SELECTonly. BlocksDROP,DELETE,INSERT,UPDATE,ALTER. - Schema Validation: Verifies table and column existence against
BrandSchema. - Redshift Compatibility:
- Blocks
::casting (enforcesCAST()). - Blocks
FILTERclauses (enforcesCASE WHEN). - Blocks array operations.
- Blocks
- Quoting: Enforces quoting for columns with spaces or special characters.
3. Dynamic Schema (dynamic-schema.service.ts)
Manages the discovery and caching of database schemas to support the semantic layer.
- Discovery: Fetches
information_schemametadata. - Caching: Caches schema definitions to reduce overhead.
- Enrichment: Adds business descriptions from
SchemaCardService.
🔌 Supported Dialects
Located in src/data/dialects/.
| Dialect | Status | Notes |
|---|---|---|
| Redshift | ✅ Primary | Treated as PostgreSQL-compatible but with strict validation rules. |
| BigQuery | ✅ Active | Uses Google Cloud BigQuery SDK. Supports project/location config. |
| Snowflake | ✅ Active | Uses snowflake-sdk. Supports key-pair auth. |
| PostgreSQL | ⚠️ Local | Used mostly for local testing/demo (AdventureWorks). |
🛡️ Security & Isolation
- Multi-Tenancy: The Data Layer respects tenant isolation provided by
SecretsService. - Credential Management: No credentials are stored in code; strictly loaded via
SecretsService(Env/GSM). - Query Sanitization: All inputs are validated before execution.
🚦 Error Handling
The layer normalizes errors from different drivers into a standard format:
- Timeout: Automatically tracked and reported. Use
DB_QUERY_TIMEOUT_MS. - Connection Error: Triggers immediate retry (up to
DB_FACTORY_MAX_RETRIES). - Circuit Open: Fails fast if the upstream database is unhealthy.