Architecture Deep Dive
System: Jorvis Task Automation
Version: 1.0.0
Date: 2026-01-19
Design Principles
1. Separation of Concerns
Architect (Claude Opus):
- Strategic thinking
- Design decisions (ADR creation)
- Code review
- Error classification
- 2-3 calls per task
Executor (Gemini):
- Tactical implementation
- Code generation
- Fast iteration
- 3-5 calls per task
Quality Gate (GitHub Actions - Local):
- Deterministic validation
- 100% enforcement
- No LLM involved
2. Fail-Fast Philosophy
Error detected → Classify immediately → Route to appropriate handler
Trivial error → Auto-fix → Re-validate (0 iterations wasted)
Tactical error → Executor retry with feedback
Strategic error → Architect redesign
3. Defense in Depth
5 Layers of Quality Enforcement:
- Documentation — CLAUDE.md, AGENTS.md, style guides
- Pre-commit hooks — Gitleaks, sanitization
- Antigravity roles — .antigravity/rules (soft enforcement)
- Quality Gate — HARD BLOCK (lint, build, test, coverage non-regression)
- Branch Protection — GitHub Pro settings (PR approval required)
Automation relies on Layers 4-5 (100% guaranteed enforcement).
State Machine
State Definition
interface TaskState {
  task_id: string;                      // "Task-104"
  task_description: string;             // From TASK_BOARD.md
  adr: string | null;                   // Architecture Decision Record
  code_files: string[];                 // Generated file paths
  validation_errors: ValidationError[]; // From Quality Gate
  conversation_history: Message[];      // Full context
  review_feedback: string | null;       // Architect feedback
  status: Status;                       // Current state
  iteration: number;                    // Current iteration
  max_iterations: number;               // Adaptive limit (3-7)
  warnings: string[];                   // ADR validation warnings
}
type Status = "design" | "implementation" | "review" | "approved" | "failed";
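On the Python side of the workflow, the same shape can be mirrored as a TypedDict. This is a sketch: the field names come from the interface above, but the initializer and its default iteration limit are assumptions.

```python
from typing import List, Optional, TypedDict

# Status values from the type alias above
Status = str  # "design" | "implementation" | "review" | "approved" | "failed"

class TaskState(TypedDict):
    task_id: str                      # e.g. "Task-104"
    task_description: str             # From TASK_BOARD.md
    adr: Optional[str]                # Architecture Decision Record
    code_files: List[str]             # Generated file paths
    validation_errors: List[dict]     # From Quality Gate
    conversation_history: List[dict]  # Full context
    review_feedback: Optional[str]    # Architect feedback
    status: Status                    # Current state
    iteration: int                    # Current iteration
    max_iterations: int               # Adaptive limit (3-7)
    warnings: List[str]               # ADR validation warnings

def initial_state(task_id: str, description: str) -> TaskState:
    """Fresh state for a new task (default limit of 5 is an assumption)."""
    return TaskState(
        task_id=task_id, task_description=description, adr=None,
        code_files=[], validation_errors=[], conversation_history=[],
        review_feedback=None, status="design", iteration=0,
        max_iterations=5, warnings=[],
    )
```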
State Transitions
stateDiagram-v2
[*] --> design
design --> implementation: ADR created
implementation --> quality_gate: Code generated
quality_gate --> approved: All checks pass
quality_gate --> review: Validation errors
review --> implementation: Tactical errors (executor retry)
review --> design: Strategic errors (architect redesign)
review --> failed: Max iterations reached
approved --> [*]
failed --> [*]
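Before wiring nodes into a graph framework, the diagram can be expressed as a plain transition table, which makes the allowed edges testable in isolation. The event labels are taken from the edge annotations above; the helper itself is illustrative.

```python
# Allowed next states per node, mirroring the state diagram above
TRANSITIONS = {
    "design": {"ADR created": "implementation"},
    "implementation": {"Code generated": "quality_gate"},
    "quality_gate": {"All checks pass": "approved",
                     "Validation errors": "review"},
    "review": {"Tactical errors": "implementation",
               "Strategic errors": "design",
               "Max iterations reached": "failed"},
}

def next_state(current: str, event: str) -> str:
    """Resolve the next node; unknown transitions are rejected loudly."""
    try:
        return TRANSITIONS[current][event]
    except KeyError:
        raise ValueError(f"No transition from {current!r} on {event!r}")
```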
Transition Logic
# From quality_gate
if not state["validation_errors"]:
    state["status"] = "approved"
    return state  # → END
state["status"] = "review"
return state  # → architect_review

# From architect_review
# Check the iteration cap first, so it is reachable on every path
if state["iteration"] >= state["max_iterations"]:
    state["status"] = "failed"
    return state  # → END (human escalation)
errors = classify_errors(state["validation_errors"])
if errors["trivial"] and not (errors["tactical"] or errors["strategic"]):
    auto_fix(state)
    return run_quality_gate(state)  # Re-validate immediately
elif errors["tactical"]:
    state["status"] = "implementation"
    return state  # → executor_implement
elif errors["strategic"]:
    state["status"] = "design"
    return state  # → architect_design
Error Classification Algorithm
Classification Matrix
| Error Pattern | Category | Handler | Iterations Cost |
|---|---|---|---|
| cannot find name 'X' | Trivial | Auto-fix import | 0 |
| 'X' is declared but never used | Trivial | ESLint disable | 0 |
| Type 'X' not assignable to 'Y' | Tactical | Executor retry | 1 |
| Test 'X' failed | Tactical | Executor retry | 1 |
| Coverage 78% < 80% | Strategic | Architect redesign | 2 |
| Property 'foo' does not exist (hallucination) | Hallucination | Executor retry with warning | 1 |
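One way to implement the matrix is an ordered first-match rule list, with the more specific hallucination patterns checked before the generic trivial ones. The regexes below are illustrative approximations of TypeScript compiler and test-runner messages, not verbatim diagnostics.

```python
import re

# Ordered (pattern, category) rules; first match wins.
# Invented Service/Controller/Module names must be checked before the
# generic "cannot find name" rule, or they would be classed as trivial.
CLASSIFICATION_RULES = [
    (r"cannot find name '\w+(Service|Controller|Module)'", "hallucination"),
    (r"does not exist on type", "hallucination"),
    (r"cannot find name '", "trivial"),
    (r"is declared but .*never (read|used)", "trivial"),
    (r"is not assignable to", "tactical"),
    (r"Test .+ failed", "tactical"),
    (r"Coverage \d+(\.\d+)?% < \d+(\.\d+)?%", "strategic"),
]

def classify_error(message: str) -> str:
    """Map one validation error to a handler category."""
    for pattern, category in CLASSIFICATION_RULES:
        if re.search(pattern, message):
            return category
    return "tactical"  # Default: let the executor retry
```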
Hallucination Detection
Gemini specific patterns:
hallucination_patterns = [
    # Invented methods
    r"property ['\"](\w+)['\"] does not exist on type",
    # Non-existent modules
    r"cannot find module ['\"]@?(\w+)",
    # Wrong framework (Jorvis is NestJS, not React/Vue/Angular)
    r"module ['\"]react['\"]",
    r"module ['\"]vue['\"]",
    r"module ['\"]angular['\"]",
    # Invented services (pattern: ends with Service/Controller/Module)
    r"cannot find name ['\"]\w+(Service|Controller|Module)['\"]",
]
Why separate classification:
- Hallucinations need explicit feedback: "This class doesn't exist in Jorvis codebase"
- Higher retry priority (Gemini can self-correct with clear feedback)
- Metrics tracking (hallucination rate = quality indicator)
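Because hallucinations need explicit feedback, the retry prompt can be assembled directly from the matched errors. The wording below is an illustrative assumption, not the system's actual prompt.

```python
import re

def build_hallucination_feedback(errors: list) -> str:
    """Turn hallucination errors into explicit retry feedback for the
    executor (message wording is a sketch)."""
    lines = []
    for err in errors:
        m = re.search(r"cannot find name ['\"](\w+)['\"]", err)
        if m:
            lines.append(
                f"- `{m.group(1)}` does not exist in the Jorvis codebase. "
                "Use only classes that appear in the provided context."
            )
        else:
            lines.append(f"- {err} (verify the symbol exists before using it)")
    return "Hallucination check failed:\n" + "\n".join(lines)
```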
Auto-Fix System
Dynamic Import Detection
Problem: Hardcoded import paths break on refactoring.
Solution: Grep-based symbol lookup.
import subprocess
from typing import Optional

def find_symbol_definition(name: str) -> Optional[str]:
    """Find the file where a symbol is exported."""
    patterns = [
        f"export class {name}",
        f"export interface {name}",
        f"export type {name}",
        f"export const {name}",
    ]
    for pattern in patterns:
        result = subprocess.run(
            ["grep", "-rl", pattern, "analytics-platform/src"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return result.stdout.strip().split("\n")[0]  # First match
    return None
Relative path calculation:
# Example:
from_file = "analytics-platform/src/question/question.controller.ts"
to_file = "analytics-platform/src/ai/graph/graph.types.ts"
# Result: "../ai/graph/graph.types" (one level up from src/question/)
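The calculation can be done with os.path.relpath. This sketch computes the path between the two files' directories and strips the .ts extension to produce a TypeScript import specifier.

```python
import os

def relative_import_path(from_file: str, to_file: str) -> str:
    """Compute a TypeScript import specifier from one source file to another."""
    rel = os.path.relpath(os.path.dirname(to_file) or ".",
                          os.path.dirname(from_file) or ".")
    # "graph.types.ts" → module name "graph.types"
    module = os.path.splitext(os.path.basename(to_file))[0]
    path = os.path.join(rel, module).replace(os.sep, "/")
    # TS relative imports must start with ./ or ../
    return path if path.startswith(".") else f"./{path}"
```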
Safety Guards
Whitelist approach:
# Only auto-fix known safe symbols
safe_imports = {
    # NestJS core (always safe)
    "Injectable", "Controller", "Get", "Post",
    # Jorvis services (verified to exist)
    "GraphOrchestratorService", "SchemaContextService",
    # Jorvis types (verified to exist)
    "GraphState", "ConversationQuestionDto",
}
# External packages, unknown symbols → SKIP auto-fix
# External packages, unknown symbols → SKIP auto-fix
Why whitelist over blacklist:
- False positive auto-fix is worse than skipping
- Gemini hallucinations can invent plausible-sounding names
- Better to require 1 manual iteration than inject wrong import
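The gate itself reduces to a small function: extract the missing symbol from the error, then check it against the whitelist. The error wording and helper name are assumptions.

```python
import re
from typing import Optional

# Whitelist from the section above
SAFE_IMPORTS = {
    "Injectable", "Controller", "Get", "Post",
    "GraphOrchestratorService", "SchemaContextService",
    "GraphState", "ConversationQuestionDto",
}

def try_auto_fix(error: str) -> Optional[str]:
    """Return the symbol to auto-import, or None when the error is not a
    whitelisted missing-import case."""
    m = re.search(r"cannot find name ['\"](\w+)['\"]", error, re.IGNORECASE)
    if not m:
        return None
    symbol = m.group(1)
    return symbol if symbol in SAFE_IMPORTS else None  # Whitelist gate
```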
ESLint Disable (Safer than Renaming)
Old approach (risky):
# Rename: const foo = ... → const _foo = ...
# Problem: Breaks if "foo" is used elsewhere in file
New approach (safe):
# Add comment above declaration
// eslint-disable-next-line @typescript-eslint/no-unused-vars
const foo = ... // Still unused, but lint passes
Why safer:
- No code modification (only comments)
- Explicit intent (human reviewer knows variable unused)
- Reversible (remove comment if variable later used)
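A comment-only fixer can be sketched as a line-wise pass that inserts the disable comment above the matching declaration, preserving its indentation. The helper name is assumed.

```python
import re

DISABLE = "// eslint-disable-next-line @typescript-eslint/no-unused-vars"

def suppress_unused_var(source: str, name: str) -> str:
    """Insert an eslint-disable comment above the declaration of `name`.
    Comment-only change: the declaration line itself is untouched."""
    out = []
    for line in source.splitlines():
        stripped = line.lstrip()
        # Match whole identifiers only, so "foo" does not hit "fooBar"
        if re.match(rf"(const|let|var)\s+{re.escape(name)}\b", stripped):
            indent = line[: len(line) - len(stripped)]
            out.append(indent + DISABLE)
        out.append(line)
    return "\n".join(out)
```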
ADR Validation Strategy
Inline Validation (No Double LLM Call)
Old approach:
- Architect creates ADR
- Separate LLM call validates against invariants
- If violations → regenerate ADR
Problem: 2 LLM calls per iteration = slow + expensive
New approach:
- Architect creates ADR with inline invariant compliance section
- Lightweight regex check for missing sections
- If compliance section present → trust LLM
Prompt structure:
Create an ADR that:
...
5. EXPLICITLY states: "This ADR does NOT violate any invariants"
6. JUSTIFIES any new dependencies
Format:
...
### Invariant Compliance
- Anti-loop: ✅ Does not call Open WebUI
- SQL safety: ✅ Uses SqlGuard
...
Post-validation (quick check):
def quick_validate_adr(adr: str) -> dict:
    warnings = []
    if "Invariant Compliance" not in adr:
        warnings.append("ADR missing compliance section")
    # Still do keyword check for anti-loop (critical invariant)
    if "openwebui.api" in adr.lower():
        warnings.append("May mention Open WebUI API (review manually)")
    return {"warnings": warnings}
Benefits:
- 1 LLM call instead of 2
- Architect forced to think about invariants upfront
- Compliance section visible in ADR for human review
Git Branch Isolation
Why Sequential Execution
Parallel execution problems:
# Task 1 (branch: task/task-104)
git checkout -b task/task-104
npm run build # Modifies node_modules/.cache
# Task 2 (branch: task/task-105, parallel)
git checkout -b task/task-105
npm run build # Conflicts with Task 1's cache!
# Result: Race condition, unpredictable failures
Sequential execution benefits:
# Task 1
git checkout main
git checkout -b task/task-104
# ... work on Task 1 ...
git checkout main # Clean state
# Task 2
git checkout main
git checkout -b task/task-105
# ... work on Task 2 (no conflicts)
Branch Lifecycle
import subprocess

def run_single_task_isolated(task_id: str) -> dict:
    # 1. Start from clean main
    subprocess.run(["git", "checkout", "main"])
    subprocess.run(["git", "pull", "origin", "main"])
    # 2. Create task branch
    branch_name = f"task/{task_id.lower()}"
    subprocess.run(["git", "checkout", "-b", branch_name])
    # 3. Run workflow (modifies files)
    result = run_workflow(task_id)
    # 4. Return to main
    subprocess.run(["git", "checkout", "main"])
    # 5. Cleanup (if successful)
    if result["success"]:
        subprocess.run(["git", "branch", "-D", branch_name])
    else:
        # Keep branch for debugging
        print(f"⚠️ Keeping branch {branch_name} for debugging")
    return result
API Retry Strategy
Exponential Backoff
import time

def call_llm_with_retry(client, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(messages=messages)
        except Exception as e:
            if "rate" in str(e).lower() and attempt < max_retries - 1:
                wait_time = (2 ** attempt) * 2  # 2s, 4s, 8s
                print(f"⚠️ Rate limit, retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise  # Give up or non-retryable error
Retry scenarios:
| Error Type | Retry? | Backoff |
|---|---|---|
| Rate limit (429) | ✅ Yes | Exponential |
| Timeout (503) | ✅ Yes | Exponential |
| Invalid request (400) | ❌ No | — |
| Auth error (401) | ❌ No | — |
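The table can collapse into a small predicate for the retry loop. The marker strings below are assumptions and should be adapted to the SDK's actual exception types rather than string matching.

```python
# Substrings suggesting a transient, retryable failure (assumed markers)
RETRYABLE_MARKERS = ("rate", "429", "timeout", "503", "overloaded")

def is_retryable(error: Exception) -> bool:
    """Heuristic: retry only transient errors; 400/401-style errors
    will not succeed on retry, so fail fast."""
    text = str(error).lower()
    return any(marker in text for marker in RETRYABLE_MARKERS)
```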
Iteration Limits
Adaptive Strategy
def determine_max_iterations(task_description: str) -> int:
    task = task_description.lower()
    score = 0
    # New feature: +3
    if any(word in task for word in ["new", "add", "implement"]):
        score += 3
    # Bug fix: +1
    elif any(word in task for word in ["fix", "bug"]):
        score += 1
    # Integration: +2
    if "integration" in task or "api" in task:
        score += 2
    # Database: +2
    if "database" in task or "schema" in task:
        score += 2
    # Scoring → Limit
    if score <= 3:
        return 3  # Simple (e.g., "Fix typo in README")
    elif score <= 6:
        return 5  # Moderate (e.g., "Add validation rule")
    else:
        return 7  # Complex (e.g., "Implement MTIR-SQL")
Rationale:
- Simple tasks shouldn't burn 5 iterations on typo fixes
- Complex tasks need room for multi-stage debugging
- Prevents infinite loops (hard cap at 7)
Performance Optimizations
1. Inline Validation
- Saved: 1 LLM call per iteration
- Impact: 20-40s per iteration
2. Auto-Fix Trivial Errors
- Saved: 1 iteration for 20% of errors
- Impact: 2-3 minutes per auto-fixed error
3. Sequential (Not Parallel) Quality Gate
- Why not parallel: Complexity > benefit
- Current: 60-70s for all checks
- Parallel theoretical: 30-40s (but race conditions risk)
Decision: Keep sequential for robustness.
Security Architecture
Input Validation
# Task ID validation
assert re.match(r'^Task-\d+$', task_id), "Invalid task ID format"
# File path validation (prevent directory traversal)
assert '..' not in file_path, "Directory traversal blocked"
assert file_path.startswith('analytics-platform/'), "Invalid base path"
Output Sanitization
- All LLM outputs validated by Quality Gate before commit
- No eval(), exec(), or dynamic code execution
- File writes use explicit paths (no user input in paths)
Secrets Protection
- API keys in environment variables only
- Quality Gate includes secrets check (regex patterns for keys)
- Git hooks prevent accidental commits (gitleaks)
Monitoring & Observability
Metrics Collection
# Per-task metrics
{
  "task_id": "Task-104",
  "success": true,
  "duration_sec": 156.3,
  "iterations": 2,
  "errors": [
    {"iteration": 1, "type": "tactical", "error": "Type mismatch..."}
  ],
  "auto_fixes": 1,
  "hallucinations": 0
}
# Aggregate metrics
{
  "success_rate": 70.0,
  "avg_iterations": 2.4,
  "avg_duration_sec": 180.5,
  "hallucination_rate": 8.0,
  "auto_fix_success_rate": 95.0
}
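A sketch of how per-task records might roll up into the aggregate report. Field names follow the examples above; defining the hallucination rate as the share of tasks with at least one hallucination is an assumption.

```python
def aggregate(per_task: list) -> dict:
    """Roll per-task metric records up into an aggregate report."""
    n = len(per_task)
    successes = sum(1 for t in per_task if t["success"])
    with_hallucinations = sum(1 for t in per_task if t["hallucinations"] > 0)
    return {
        "success_rate": round(100.0 * successes / n, 1),
        "avg_iterations": round(sum(t["iterations"] for t in per_task) / n, 1),
        "avg_duration_sec": round(sum(t["duration_sec"] for t in per_task) / n, 1),
        "hallucination_rate": round(100.0 * with_hallucinations / n, 1),
    }
```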
Key Performance Indicators (KPIs)
| KPI | Target | Red Flag |
|---|---|---|
| Success rate | ≥70% | <50% |
| Avg iterations | ≤3 | >5 |
| Hallucination rate | ≤10% | >15% |
| Auto-fix success | ≥90% | <80% |
Scalability Considerations
Current Limitations
- Sequential execution: 1 task at a time (~2h per task)
- Single repo: Jorvis only
- Manual pilot: Human reviews pilot_report.json
Future Scaling Strategies
Phase 2 (10+ tasks/week):
- Git worktrees for parallel execution (1 worktree per task)
- Task queue with priority (P0 > P1 > P2)
- Automated rollback on 3+ consecutive failures
Phase 3 (100+ tasks/week):
- Multi-repo support (shared workflow, repo-specific config)
- Distributed execution (multiple machines)
- ML-based prompt optimization (learn from failures)
Edge Cases & Limitations
Known Limitations
- Multi-file refactoring: Limited to ~5 files per task
  - Workaround: Break into smaller tasks
- Breaking API changes: Requires manual coordination
  - Workaround: Human-designed ADR, automated implementation only
- Non-deterministic tests: Flaky tests cause false failures
  - Workaround: Fix flaky tests first, then automate
- External dependencies: npm package updates not automated
  - Workaround: Manual dependency updates, automation for code changes only
Edge Case Handling
Scenario: Gemini generates valid code but uses deprecated API
Current behavior: Quality Gate passes (code compiles), deployed to prod
Solution: Add deprecation linter rule to Quality Gate
Scenario: Task requires human decision (e.g., "Choose color scheme")
Current behavior: Architect makes arbitrary choice, may not match user preference
Solution: Add AskUserQuestion node for ambiguous requirements (future enhancement)
Comparison with Alternatives
vs. GitHub Copilot
| Feature | Jorvis Automation | GitHub Copilot |
|---|---|---|
| Scope | Full task (design → implementation → tests) | Single function/file |
| Validation | Quality Gate enforced | Manual review |
| Context | Full repo + HANDOFF | Current file |
| Cost | $0.30-0.70/task | $10-20/month flat |
Use case: Copilot for interactive coding, Jorvis for full task automation.
vs. Devin AI
| Feature | Jorvis Automation | Devin AI |
|---|---|---|
| Architecture | Open source (LangGraph) | Proprietary |
| Cost | Pay-per-use | Subscription ($500/month) |
| Customization | Full control | Limited |
| Jorvis-specific | Yes (CLAUDE.md, invariants) | Generic |
Use case: Devin for general-purpose, Jorvis automation for domain-specific (analytics-platform).
Last Updated: 2026-01-19 Next Review: After pilot results (Feb 2026)