Architecture Deep Dive

System: Jorvis Task Automation | Version: 1.0.0 | Date: 2026-01-19


Design Principles

1. Separation of Concerns

Architect (Claude Opus):

  • Strategic thinking
  • Design decisions (ADR creation)
  • Code review
  • Error classification
  • 2-3 calls per task

Executor (Gemini):

  • Tactical implementation
  • Code generation
  • Fast iteration
  • 3-5 calls per task

Quality Gate (GitHub Actions - Local):

  • Deterministic validation
  • 100% enforcement
  • No LLM involved

2. Fail-Fast Philosophy

Error detected → Classify immediately → Route to appropriate handler

Trivial error → Auto-fix → Re-validate (0 iterations wasted)
Tactical error → Executor retry with feedback
Strategic error → Architect redesign

3. Defense in Depth

5 Layers of Quality Enforcement:

  1. Documentation — CLAUDE.md, AGENTS.md, style guides
  2. Pre-commit hooks — Gitleaks, sanitization
  3. Antigravity roles — .antigravity/rules (soft enforcement)
  4. Quality Gate — HARD BLOCK (lint, build, test, coverage non-regression)
  5. Branch Protection — GitHub Pro settings (PR approval required)

Automation relies on Layers 4-5 (100% guaranteed enforcement).


State Machine

State Definition

interface TaskState {
    task_id: string;                    // "Task-104"
    task_description: string;           // From TASK_BOARD.md
    adr: string | null;                 // Architecture Decision Record
    code_files: string[];               // Generated file paths
    validation_errors: ValidationError[]; // From Quality Gate
    conversation_history: Message[];    // Full context
    review_feedback: string | null;     // Architect feedback
    status: Status;                     // Current state
    iteration: number;                  // Current iteration
    max_iterations: number;             // Adaptive limit (3-7)
    warnings: string[];                 // ADR validation warnings
}

type Status = "design" | "implementation" | "review" | "approved" | "failed";
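Since the workflow itself runs on LangGraph in Python, the same shape can be mirrored as a TypedDict. A minimal sketch, with field names taken from the interface above (`initial_state` is an illustrative helper, not part of the codebase):

```python
from typing import Literal, Optional, TypedDict

Status = Literal["design", "implementation", "review", "approved", "failed"]

class TaskState(TypedDict):
    task_id: str
    task_description: str
    adr: Optional[str]
    code_files: list[str]
    validation_errors: list[dict]    # ValidationError objects, simplified to dicts
    conversation_history: list[dict]
    review_feedback: Optional[str]
    status: Status
    iteration: int
    max_iterations: int
    warnings: list[str]

def initial_state(task_id: str, description: str, max_iterations: int = 5) -> TaskState:
    """Fresh state for a task: starts in 'design' with empty history."""
    return TaskState(
        task_id=task_id,
        task_description=description,
        adr=None,
        code_files=[],
        validation_errors=[],
        conversation_history=[],
        review_feedback=None,
        status="design",
        iteration=0,
        max_iterations=max_iterations,
        warnings=[],
    )
```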

State Transitions

stateDiagram-v2
    [*] --> design

    design --> implementation: ADR created

    implementation --> quality_gate: Code generated

    quality_gate --> approved: All checks pass
    quality_gate --> review: Validation errors

    review --> implementation: Tactical errors (executor retry)
    review --> design: Strategic errors (architect redesign)
    review --> failed: Max iterations reached

    approved --> [*]
    failed --> [*]

Transition Logic

# From quality_gate
if not state["validation_errors"]:
    state["status"] = "approved"
    return state  # → END
else:
    state["status"] = "review"
    return state  # → architect_review

# From architect_review
if state["iteration"] >= state["max_iterations"]:
    state["status"] = "failed"
    return state  # → END (human escalation)

errors = classify_errors(state["validation_errors"])

if errors["trivial"] and not (errors["tactical"] or errors["strategic"]):
    auto_fix()
    return run_quality_gate()  # Re-validate immediately (0 iterations wasted)

elif errors["tactical"]:
    state["status"] = "implementation"
    return state  # → executor_implement

elif errors["strategic"]:
    state["status"] = "design"
    return state  # → architect_design

Error Classification Algorithm

Classification Matrix

| Error Pattern | Category | Handler | Iteration Cost |
|---|---|---|---|
| cannot find name 'X' | Trivial | Auto-fix import | 0 |
| 'X' is declared but never used | Trivial | ESLint disable | 0 |
| Type 'X' not assignable to 'Y' | Tactical | Executor retry | 1 |
| Test 'X' failed | Tactical | Executor retry | 1 |
| Coverage 78% < 80% | Strategic | Architect redesign | 2 |
| Property 'foo' does not exist (hallucination) | Hallucination | Executor retry with warning | 1 |
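A classifier over this matrix can be sketched as a first-match regex scan. The patterns and bucket names below paraphrase the matrix; the real classifier may differ in detail:

```python
import re
from collections import defaultdict

# First-match rules paraphrasing the classification matrix (illustrative)
CLASSIFICATION_RULES = [
    (r"cannot find name '(\w+)'",                 "trivial"),
    (r"'(\w+)' is declared but never used",       "trivial"),
    (r"Type '.+' is not assignable to type '.+'", "tactical"),
    (r"Test '.+' failed",                         "tactical"),
    (r"Coverage \d+% < \d+%",                     "strategic"),
    (r"Property '\w+' does not exist",            "hallucination"),
]

def classify_errors(messages: list[str]) -> dict[str, list[str]]:
    """Bucket error messages into trivial / tactical / strategic / hallucination."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for msg in messages:
        for pattern, category in CLASSIFICATION_RULES:
            if re.search(pattern, msg):
                buckets[category].append(msg)
                break
        else:
            buckets["tactical"].append(msg)  # unknown errors default to executor retry
    return buckets
```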

Hallucination Detection

Gemini-specific patterns:

hallucination_patterns = [
    # Invented methods
    r"property ['\"](\w+)['\"] does not exist on type",

    # Non-existent modules
    r"cannot find module ['\"]@?(\w+)",

    # Wrong framework (Jorvis is NestJS, not React/Vue/Angular)
    r"module ['\"]react['\"]",
    r"module ['\"]vue['\"]",
    r"module ['\"]angular['\"]",

    # Invented services (pattern: ends with Service/Controller/Module)
    r"cannot find name ['\"]\w+(Service|Controller|Module)['\"]",
]

Why separate classification:

  • Hallucinations need explicit feedback: "This class doesn't exist in Jorvis codebase"
  • Higher retry priority (Gemini can self-correct with clear feedback)
  • Metrics tracking (hallucination rate = quality indicator)
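A minimal detector over these patterns, assuming raw tsc/eslint output as input (the function name and the two-pattern subset below are illustrative):

```python
import re

# Subset of the hallucination patterns above, applied to raw compiler output
HALLUCINATION_PATTERNS = [
    r"property ['\"](\w+)['\"] does not exist on type",
    r"cannot find name ['\"]?\w+(Service|Controller|Module)['\"]?",
]

def detect_hallucinations(error_output: str) -> list[str]:
    """Return fragments of compiler output that match hallucination patterns."""
    hits = []
    for pattern in HALLUCINATION_PATTERNS:
        hits += [m.group(0) for m in re.finditer(pattern, error_output, re.IGNORECASE)]
    return hits

# Each matched fragment can be turned into explicit retry feedback for the executor,
# e.g. "Property 'runMagic' does not exist — this method is not in the Jorvis codebase."
hits = detect_hallucinations(
    "error TS2339: Property 'runMagic' does not exist on type 'GraphOrchestratorService'."
)
```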

Auto-Fix System

Dynamic Import Detection

Problem: Hardcoded import paths break on refactoring.

Solution: Grep-based symbol lookup.

import subprocess
from typing import Optional

def find_symbol_definition(name: str) -> Optional[str]:
    """Find the file where a symbol is exported"""

    patterns = [
        f"export class {name}",
        f"export interface {name}",
        f"export type {name}",
        f"export const {name}",
    ]

    for pattern in patterns:
        result = subprocess.run(
            ["grep", "-rl", pattern, "analytics-platform/src"],
            capture_output=True,
            text=True
        )

        if result.returncode == 0:
            return result.stdout.strip().split('\n')[0]  # First match

    return None

Relative path calculation:

# Example:
from_file = "analytics-platform/src/question/question.controller.ts"
to_file = "analytics-platform/src/ai/graph/graph.types.ts"

# Result: "../ai/graph/graph.types" (resolved from the importing file's directory)
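This calculation can be sketched with `os.path.relpath`, resolving from the importing file's directory as TypeScript does (the helper name is illustrative):

```python
import os

def relative_import_path(from_file: str, to_file: str) -> str:
    """Compute a TS-style relative import specifier from one file to another."""
    rel = os.path.relpath(to_file, start=os.path.dirname(from_file))
    rel = os.path.splitext(rel)[0]      # drop the .ts extension
    rel = rel.replace(os.sep, "/")      # normalize separators for import syntax
    if not rel.startswith("."):
        rel = "./" + rel                # same-directory imports need a ./ prefix
    return rel

path = relative_import_path(
    "analytics-platform/src/question/question.controller.ts",
    "analytics-platform/src/ai/graph/graph.types.ts",
)
# → "../ai/graph/graph.types"
```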

Safety Guards

Whitelist approach:

# Only auto-fix known safe symbols
safe_imports = {
    # NestJS core (always safe)
    "Injectable", "Controller", "Get", "Post",

    # Jorvis services (verified to exist)
    "GraphOrchestratorService", "SchemaContextService",

    # Jorvis types (verified to exist)
    "GraphState", "ConversationQuestionDto",
}

# External packages, unknown symbols → SKIP auto-fix

Why whitelist over blacklist:

  • False positive auto-fix is worse than skipping
  • Gemini hallucinations can invent plausible-sounding names
  • Better to require 1 manual iteration than inject wrong import
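The whitelist gate can be sketched as follows. `SAFE_IMPORTS` is a subset of the set above, and the stubbed lookup stands in for the grep-based `find_symbol_definition`:

```python
from typing import Callable, Optional

SAFE_IMPORTS = {"Injectable", "Controller", "GraphState"}  # subset, for illustration

def try_auto_fix_import(
    symbol: str,
    find_definition: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the source file to import from, or None when the fix is skipped."""
    if symbol not in SAFE_IMPORTS:
        return None  # unknown or external symbol: skip, costs one executor retry
    return find_definition(symbol)  # None here usually means a hallucinated name

# Stubbed lookup for illustration (the real one greps the repo)
lookup = {"GraphState": "analytics-platform/src/ai/graph/graph.types.ts"}.get
```

A skipped fix is deliberately cheap: one extra executor iteration beats injecting a wrong import that the Quality Gate then has to unwind.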

ESLint Disable (Safer than Renaming)

Old approach (risky):

// Rename: const foo = ... → const _foo = ...
// Problem: breaks if "foo" is referenced elsewhere in the file

New approach (safe):

// Add a comment above the declaration
// eslint-disable-next-line @typescript-eslint/no-unused-vars
const foo = ...  // Still unused, but lint passes

Why safer:

  • No code modification (only comments)
  • Explicit intent (human reviewer knows variable unused)
  • Reversible (remove comment if variable later used)
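A sketch of the comment insertion, preserving the declaration's indentation (the function name and 0-based line numbering are assumptions):

```python
DISABLE_COMMENT = "// eslint-disable-next-line @typescript-eslint/no-unused-vars"

def add_eslint_disable(lines: list[str], line_no: int) -> list[str]:
    """Insert a disable comment above the unused declaration at 0-based line_no."""
    decl = lines[line_no]
    indent = decl[: len(decl) - len(decl.lstrip())]  # copy leading whitespace
    return lines[:line_no] + [indent + DISABLE_COMMENT] + lines[line_no:]

src = ["export function f() {", "    const foo = 42;", "}"]
patched = add_eslint_disable(src, 1)
```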

ADR Validation Strategy

Inline Validation (No Double LLM Call)

Old approach:

  1. Architect creates ADR
  2. Separate LLM call validates against invariants
  3. If violations → regenerate ADR

Problem: 2 LLM calls per iteration = slow + expensive

New approach:

  1. Architect creates ADR with inline invariant compliance section
  2. Lightweight regex check for missing sections
  3. If compliance section present → trust LLM

Prompt structure:

Create an ADR that:
...
5. EXPLICITLY states: "This ADR does NOT violate any invariants"
6. JUSTIFIES any new dependencies

Format:
...
### Invariant Compliance
- Anti-loop: ✅ Does not call Open WebUI
- SQL safety: ✅ Uses SqlGuard
...

Post-validation (quick check):

def quick_validate_adr(adr: str) -> dict:
    warnings = []

    if "Invariant Compliance" not in adr:
        warnings.append("ADR missing compliance section")

    # Still do keyword check for anti-loop (critical invariant)
    if "openwebui.api" in adr.lower():
        warnings.append("May mention Open WebUI API (review manually)")

    return {"warnings": warnings}

Benefits:

  • 1 LLM call instead of 2
  • Architect forced to think about invariants upfront
  • Compliance section visible in ADR for human review

Git Branch Isolation

Why Sequential Execution

Parallel execution problems:

# Task 1 (branch: task/task-104)
git checkout -b task/task-104
npm run build  # Modifies node_modules/.cache

# Task 2 (branch: task/task-105, parallel)
git checkout -b task/task-105
npm run build  # Conflicts with Task 1's cache!

# Result: Race condition, unpredictable failures

Sequential execution benefits:

# Task 1
git checkout main
git checkout -b task/task-104
# ... work on Task 1 ...
git checkout main  # Clean state

# Task 2
git checkout main
git checkout -b task/task-105
# ... work on Task 2 (no conflicts)

Branch Lifecycle

import subprocess

def run_single_task_isolated(task_id: str) -> dict:
    # 1. Start from clean main
    subprocess.run(["git", "checkout", "main"])
    subprocess.run(["git", "pull", "origin", "main"])

    # 2. Create task branch
    branch_name = f"task/{task_id.lower()}"
    subprocess.run(["git", "checkout", "-b", branch_name])

    # 3. Run workflow (modifies files)
    result = run_workflow(task_id)

    # 4. Return to main
    subprocess.run(["git", "checkout", "main"])

    # 5. Cleanup (if successful)
    if result["success"]:
        subprocess.run(["git", "branch", "-D", branch_name])
    else:
        # Keep branch for debugging
        print(f"⚠️  Keeping branch {branch_name} for debugging")

    return result

API Retry Strategy

Exponential Backoff

import time

def call_llm_with_retry(client, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(messages=messages)

        except Exception as e:
            if "rate" in str(e).lower() and attempt < max_retries - 1:
                wait_time = (2 ** attempt) * 2  # 2s, 4s, 8s
                print(f"⚠️  Rate limit, retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise  # Give up or non-retryable error

Retry scenarios:

| Error Type | Retry? | Backoff |
|---|---|---|
| Rate limit (429) | ✅ Yes | Exponential |
| Service unavailable (503) | ✅ Yes | Exponential |
| Invalid request (400) | ❌ No | N/A |
| Auth error (401) | ❌ No | N/A |
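The retry decision and backoff schedule from the table can be expressed directly. A sketch only: the real code matches on exception text rather than HTTP status codes:

```python
# Status codes worth retrying, per the table above
RETRYABLE_STATUSES = {429, 503}

def is_retryable(status_code: int) -> bool:
    return status_code in RETRYABLE_STATUSES

def backoff_seconds(attempt: int, base: int = 2) -> int:
    """Wait time before retry `attempt` (0-based): 2s, 4s, 8s, ..."""
    return (2 ** attempt) * base

schedule = [backoff_seconds(a) for a in range(3)]
# → [2, 4, 8]
```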

Iteration Limits

Adaptive Strategy

def determine_max_iterations(task_description: str) -> int:
    task = task_description.lower()
    score = 0

    # New feature: +3
    if any(word in task for word in ["new", "add", "implement"]):
        score += 3

    # Bug fix: +1
    elif any(word in task for word in ["fix", "bug"]):
        score += 1

    # Integration: +2
    if "integration" in task or "api" in task:
        score += 2

    # Database: +2
    if "database" in task or "schema" in task:
        score += 2

    # Scoring → Limit
    if score <= 3:
        return 3  # Simple (e.g., "Fix typo in README")
    elif score <= 6:
        return 5  # Moderate (e.g., "Add validation rule")
    else:
        return 7  # Complex (e.g., "Implement MTIR-SQL")

Rationale:

  • Simple tasks shouldn't burn 5 iterations on typo fixes
  • Complex tasks need room for multi-stage debugging
  • Prevents infinite loops (hard cap at 7)

Performance Optimizations

1. Inline Validation

  • Saved: 1 LLM call per iteration
  • Impact: 20-40s per iteration

2. Auto-Fix Trivial Errors

  • Saved: 1 iteration for 20% of errors
  • Impact: 2-3 minutes per auto-fixed error

3. Sequential (Not Parallel) Quality Gate

  • Why not parallel: Complexity > benefit
  • Current: 60-70s for all checks
  • Parallel theoretical: 30-40s (but race conditions risk)

Decision: Keep sequential for robustness.


Security Architecture

Input Validation

# Task ID validation
assert re.match(r'^Task-\d+$', task_id), "Invalid task ID format"

# File path validation (prevent directory traversal)
assert '..' not in file_path, "Directory traversal blocked"
assert file_path.startswith('analytics-platform/'), "Invalid base path"

Output Sanitization

  • All LLM outputs validated by Quality Gate before commit
  • No eval(), exec(), or dynamic code execution
  • File writes use explicit paths (no user input in paths)

Secrets Protection

  • API keys in environment variables only
  • Quality Gate includes secrets check (regex patterns for keys)
  • Git hooks prevent accidental commits (gitleaks)
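The regex-based secrets check can be sketched like this. The patterns are illustrative only; the real Quality Gate check and the gitleaks rule set are broader:

```python
import re

# Illustrative secret shapes (assumed patterns, not the production rule set)
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID shape
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"][^'\"]{8,}"),  # inline api_key = "..."
]

def contains_secret(text: str) -> bool:
    """True if any known secret pattern appears in the text."""
    return any(p.search(text) for p in SECRET_PATTERNS)

contains_secret('API_KEY = "supersecretvalue123"')  # → True
contains_secret("const port = 3000;")               # → False
```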

Monitoring & Observability

Metrics Collection

# Per-task metrics
{
    "task_id": "Task-104",
    "success": true,
    "duration_sec": 156.3,
    "iterations": 2,
    "errors": [
        {"iteration": 1, "type": "tactical", "error": "Type mismatch..."}
    ],
    "auto_fixes": 1,
    "hallucinations": 0
}

# Aggregate metrics
{
    "success_rate": 70.0,
    "avg_iterations": 2.4,
    "avg_duration_sec": 180.5,
    "hallucination_rate": 8.0,
    "auto_fix_success_rate": 95.0
}

Key Performance Indicators (KPIs)

| KPI | Target | Red Flag |
|---|---|---|
| Success rate | ≥70% | <50% |
| Avg iterations | ≤3 | >5 |
| Hallucination rate | ≤10% | >15% |
| Auto-fix success | ≥90% | <80% |
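A red/yellow/green check against these thresholds can be sketched as follows. Threshold values are copied from the table; the rule structure itself is an assumption:

```python
# (target, red_flag, higher_is_better) per KPI, from the table above
KPI_RULES = {
    "success_rate":          (70.0, 50.0, True),
    "avg_iterations":        (3.0,  5.0,  False),
    "hallucination_rate":    (10.0, 15.0, False),
    "auto_fix_success_rate": (90.0, 80.0, True),
}

def kpi_status(name: str, value: float) -> str:
    """'green' if on target, 'red' past the red flag, 'yellow' in between."""
    target, red_flag, higher_is_better = KPI_RULES[name]
    if higher_is_better:
        if value < red_flag:
            return "red"
        return "green" if value >= target else "yellow"
    if value > red_flag:
        return "red"
    return "green" if value <= target else "yellow"

kpi_status("success_rate", 70.0)        # → "green"
kpi_status("hallucination_rate", 12.0)  # → "yellow"
```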

Scalability Considerations

Current Limitations

  • Sequential execution: 1 task at a time (~2h per task)
  • Single repo: Jorvis only
  • Manual pilot: Human reviews pilot_report.json

Future Scaling Strategies

Phase 2 (10+ tasks/week):

  • Git worktrees for parallel execution (1 worktree per task)
  • Task queue with priority (P0 > P1 > P2)
  • Automated rollback on 3+ consecutive failures

Phase 3 (100+ tasks/week):

  • Multi-repo support (shared workflow, repo-specific config)
  • Distributed execution (multiple machines)
  • ML-based prompt optimization (learn from failures)

Edge Cases & Limitations

Known Limitations

  1. Multi-file refactoring: Limited to ~5 files per task

    • Workaround: Break into smaller tasks
  2. Breaking API changes: Requires manual coordination

    • Workaround: Human-designed ADR, automated implementation only
  3. Non-deterministic tests: Flaky tests cause false failures

    • Workaround: Fix flaky tests first, then automate
  4. External dependencies: npm package updates not automated

    • Workaround: Manual dependency updates, automation for code changes only

Edge Case Handling

Scenario: Gemini generates valid code but uses deprecated API

Current behavior: Quality Gate passes (code compiles), deployed to prod

Solution: Add deprecation linter rule to Quality Gate

Scenario: Task requires human decision (e.g., "Choose color scheme")

Current behavior: Architect makes arbitrary choice, may not match user preference

Solution: Add AskUserQuestion node for ambiguous requirements (future enhancement)


Comparison with Alternatives

vs. GitHub Copilot

| Feature | Jorvis Automation | GitHub Copilot |
|---|---|---|
| Scope | Full task (design → implementation → tests) | Single function/file |
| Validation | Quality Gate enforced | Manual review |
| Context | Full repo + HANDOFF | Current file |
| Cost | $0.30-0.70/task | $10-20/month flat |

Use case: Copilot for interactive coding, Jorvis for full task automation.

vs. Devin AI

| Feature | Jorvis Automation | Devin AI |
|---|---|---|
| Architecture | Open source (LangGraph) | Proprietary |
| Cost | Pay-per-use | Subscription ($500/month) |
| Customization | Full control | Limited |
| Jorvis-specific | Yes (CLAUDE.md, invariants) | Generic |

Use case: Devin for general-purpose, Jorvis automation for domain-specific (analytics-platform).


Last Updated: 2026-01-19 | Next Review: After pilot results (Feb 2026)