Data Layer Architecture

Namespace: src/data/ Status: ✅ Production Ready

The Data Layer provides a unified, resilient abstraction for interacting with multiple database backends (Redshift, BigQuery, Snowflake, PostgreSQL). It adheres to the "Ports & Adapters" architecture, isolating business logic from specific database drivers.


🏗️ Core Components

1. DbFactory (db-factory.service.ts)

The DbFactoryService is the primary entry point for executing SQL queries. It manages connection lifecycles, retries, and failure handling.

Key Features:

  • Provider Agnostic: Unified execute(request) interface for all providers.
  • Resilience: Implements Circuit Breaker pattern to prevent cascading failures.
    • Threshold: 5 failures
    • Cooldown: 30s
    • Backoff: Exponential jitter
  • Connection Pooling: Manages Knex pools for Redshift/Postgres.
  • Observability: Integrated with QueryObservabilityService for timeout and error tracking.
  • Security: Validation of connection profiles via SecretsService.

2. SQL Validation (sql-validation.service.ts)

Ensures all executed queries are safe and compatible with the target dialect.

Safety Checks:

  • Read-Only: Enforces SELECT only. Blocks DROP, DELETE, INSERT, UPDATE, ALTER.
  • Schema Validation: Verifies table and column existence against BrandSchema.
  • Redshift Compatibility:
    • Blocks :: casting (enforces CAST()).
    • Blocks FILTER clauses (enforces CASE WHEN).
    • Blocks array operations.
  • Quoting: Enforces quoting for columns with spaces or special characters.

3. Dynamic Schema (dynamic-schema.service.ts)

Manages the discovery and caching of database schemas to support the semantic layer.

  • Discovery: Fetches information_schema metadata.
  • Caching: Caches schema definitions to reduce overhead.
  • Enrichment: Adds business descriptions from SchemaCardService.

🔌 Supported Dialects

Located in src/data/dialects/.

DialectStatusNotes
Redshift✅ PrimaryTreated as PostgreSQL-compatible but with strict validation rules.
BigQuery✅ ActiveUses Google Cloud BigQuery SDK. Supports project/location config.
Snowflake✅ ActiveUses snowflake-sdk. Supports key-pair auth.
PostgreSQL⚠️ LocalUsed mostly for local testing/demo (AdventureWorks).

🛡️ Security & Isolation

  • Multi-Tenancy: The Data Layer respects tenant isolation provided by SecretsService.
  • Credential Management: No credentials are stored in code; strictly loaded via SecretsService (Env/GSM).
  • Query Sanitization: All inputs are validated before execution.

🚦 Error Handling

The layer normalizes errors from different drivers into a standard format:

  • Timeout: Automatically tracked and reported. Use DB_QUERY_TIMEOUT_MS.
  • Connection Error: Triggers immediate retry (up to DB_FACTORY_MAX_RETRIES).
  • Circuit Open: Fails fast if the upstream database is unhealthy.