Architecture Deep Dive
System: Jorvis Task Automation
Version: 1.0.0
Date: 2026-01-19
Design Principles
1. Separation of Concerns
Architect (Claude Opus):
- Strategic thinking
- Design decisions (ADR creation)
- Code review
- Error classification
- 2-3 calls per task
Executor (Gemini):
- Tactical implementation
- Code generation
- Fast iteration
- 3-5 calls per task
Quality Gate (GitHub Actions - Local):
- Deterministic validation
- 100% enforcement
- No LLM involved
2. Fail-Fast Philosophy
Error detected → Classify immediately → Route to appropriate handler
Trivial error → Auto-fix → Re-validate (0 iterations wasted)
Tactical error → Executor retry with feedback
Strategic error → Architect redesign
3. Defense in Depth
5 Layers of Quality Enforcement:
- Documentation — CLAUDE.md, AGENTS.md, style guides
- Pre-commit hooks — Gitleaks, sanitization
- Antigravity roles — .antigravity/rules (soft enforcement)
- Quality Gate — HARD BLOCK (lint, build, test, coverage non-regression)
- Branch Protection — GitHub Pro settings (PR approval required)
Automation relies on Layers 4-5 (100% guaranteed enforcement).
State Machine
State Definition
interface TaskState {
  task_id: string;                      // "Task-104"
  task_description: string;             // From TASK_BOARD.md
  adr: string | null;                   // Architecture Decision Record
  code_files: string[];                 // Generated file paths
  validation_errors: ValidationError[]; // From Quality Gate
  conversation_history: Message[];      // Full context
  review_feedback: string | null;       // Architect feedback
  status: Status;                       // Current state
  iteration: number;                    // Current iteration
  max_iterations: number;               // Adaptive limit (3-7)
  warnings: string[];                   // ADR validation warnings
}
type Status = "design" | "implementation" | "review" | "approved" | "failed";
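On the Python side of the workflow, the same shape can be mirrored as a TypedDict. This is a sketch: the field names come from the interface above, but the initializer and its default iteration limit are assumptions.

```python
from typing import List, Optional, TypedDict

# Status values from the type alias above
Status = str  # "design" | "implementation" | "review" | "approved" | "failed"

class TaskState(TypedDict):
    task_id: str                      # e.g. "Task-104"
    task_description: str             # From TASK_BOARD.md
    adr: Optional[str]                # Architecture Decision Record
    code_files: List[str]             # Generated file paths
    validation_errors: List[dict]     # From Quality Gate
    conversation_history: List[dict]  # Full context
    review_feedback: Optional[str]    # Architect feedback
    status: Status                    # Current state
    iteration: int                    # Current iteration
    max_iterations: int               # Adaptive limit (3-7)
    warnings: List[str]               # ADR validation warnings

def initial_state(task_id: str, description: str) -> TaskState:
    """Fresh state for a new task (default limit of 5 is an assumption)."""
    return TaskState(
        task_id=task_id, task_description=description, adr=None,
        code_files=[], validation_errors=[], conversation_history=[],
        review_feedback=None, status="design", iteration=0,
        max_iterations=5, warnings=[],
    )
```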
State Transitions
stateDiagram-v2
[*] --> design
design --> implementation: ADR created
implementation --> quality_gate: Code generated
quality_gate --> approved: All checks pass
quality_gate --> review: Validation errors
review --> implementation: Tactical errors (executor retry)
review --> design: Strategic errors (architect redesign)
review --> failed: Max iterations reached
approved --> [*]
failed --> [*]
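Before wiring nodes into a graph framework, the diagram can be expressed as a plain transition table, which makes the allowed edges testable in isolation. The event labels are taken from the edge annotations above; the helper itself is illustrative.

```python
# Allowed next states per node, mirroring the state diagram above
TRANSITIONS = {
    "design": {"ADR created": "implementation"},
    "implementation": {"Code generated": "quality_gate"},
    "quality_gate": {"All checks pass": "approved",
                     "Validation errors": "review"},
    "review": {"Tactical errors": "implementation",
               "Strategic errors": "design",
               "Max iterations reached": "failed"},
}

def next_state(current: str, event: str) -> str:
    """Resolve the next node; unknown transitions are rejected loudly."""
    try:
        return TRANSITIONS[current][event]
    except KeyError:
        raise ValueError(f"No transition from {current!r} on {event!r}")
```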
Transition Logic
# From quality_gate
if not state["validation_errors"]:
    state["status"] = "approved"
    return state  # → END
state["status"] = "review"
return state  # → architect_review

# From architect_review
# Check the iteration cap first, so it is reachable on every path
if state["iteration"] >= state["max_iterations"]:
    state["status"] = "failed"
    return state  # → END (human escalation)
errors = classify_errors(state["validation_errors"])
if errors["trivial"] and not (errors["tactical"] or errors["strategic"]):
    auto_fix(state)
    return run_quality_gate(state)  # Re-validate immediately
elif errors["tactical"]:
    state["status"] = "implementation"
    return state  # → executor_implement
elif errors["strategic"]:
    state["status"] = "design"
    return state  # → architect_design
Error Classification Algorithm
Classification Matrix
| Error Pattern | Category | Handler | Iterations Cost |
|---|---|---|---|
| cannot find name 'X' | Trivial | Auto-fix import | 0 |
| 'X' is declared but never used | Trivial | ESLint disable | 0 |
| Type 'X' not assignable to 'Y' | Tactical | Executor retry | 1 |
| Test 'X' failed | Tactical | Executor retry | 1 |
| Coverage 78% < 80% | Strategic | Architect redesign | 2 |
| Property 'foo' does not exist (hallucination) | Hallucination | Executor retry with warning | 1 |
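One way to implement the matrix is an ordered first-match rule list, with the more specific hallucination patterns checked before the generic trivial ones. The regexes below are illustrative approximations of TypeScript compiler and test-runner messages, not verbatim diagnostics.

```python
import re

# Ordered (pattern, category) rules; first match wins.
# Invented Service/Controller/Module names must be checked before the
# generic "cannot find name" rule, or they would be classed as trivial.
CLASSIFICATION_RULES = [
    (r"cannot find name '\w+(Service|Controller|Module)'", "hallucination"),
    (r"does not exist on type", "hallucination"),
    (r"cannot find name '", "trivial"),
    (r"is declared but .*never (read|used)", "trivial"),
    (r"is not assignable to", "tactical"),
    (r"Test .+ failed", "tactical"),
    (r"Coverage \d+(\.\d+)?% < \d+(\.\d+)?%", "strategic"),
]

def classify_error(message: str) -> str:
    """Map one validation error to a handler category."""
    for pattern, category in CLASSIFICATION_RULES:
        if re.search(pattern, message):
            return category
    return "tactical"  # Default: let the executor retry
```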
Hallucination Detection
Gemini specific patterns:
hallucination_patterns = [
    # Invented methods
    r"property ['\"](\w+)['\"] does not exist on type",
    # Non-existent modules
    r"cannot find module ['\"]@?(\w+)",
    # Wrong framework (Jorvis is NestJS, not React/Vue/Angular)
    r"module ['\"]react['\"]",
    r"module ['\"]vue['\"]",
    r"module ['\"]angular['\"]",
    # Invented services (pattern: ends with Service/Controller/Module)
    r"cannot find name ['\"]\w+(Service|Controller|Module)['\"]",
]
Why separate classification:
- Hallucinations need explicit feedback: "This class doesn't exist in Jorvis codebase"
- Higher retry priority (Gemini can self-correct with clear feedback)
- Metrics tracking (hallucination rate = quality indicator)
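Because hallucinations need explicit feedback, the retry prompt can be assembled directly from the matched errors. The wording below is an illustrative assumption, not the system's actual prompt.

```python
import re

def build_hallucination_feedback(errors: list) -> str:
    """Turn hallucination errors into explicit retry feedback for the
    executor (message wording is a sketch)."""
    lines = []
    for err in errors:
        m = re.search(r"cannot find name ['\"](\w+)['\"]", err)
        if m:
            lines.append(
                f"- `{m.group(1)}` does not exist in the Jorvis codebase. "
                "Use only classes that appear in the provided context."
            )
        else:
            lines.append(f"- {err} (verify the symbol exists before using it)")
    return "Hallucination check failed:\n" + "\n".join(lines)
```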
Auto-Fix System
Dynamic Import Detection
Problem: Hardcoded import paths break on refactoring.
Solution: Grep-based symbol lookup.
import subprocess
from typing import Optional

def find_symbol_definition(name: str) -> Optional[str]:
    """Find the file where a symbol is exported."""
    patterns = [
        f"export class {name}",
        f"export interface {name}",
        f"export type {name}",
        f"export const {name}",
    ]
    for pattern in patterns:
        result = subprocess.run(
            ["grep", "-rl", pattern, "analytics-platform/src"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return result.stdout.strip().split("\n")[0]  # First match
    return None
Relative path calculation:
# Example:
from_file = "analytics-platform/src/question/question.controller.ts"
to_file = "analytics-platform/src/ai/graph/graph.types.ts"
# Result: "../ai/graph/graph.types" (one level up from src/question/)
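The calculation can be done with os.path.relpath. This sketch computes the path between the two files' directories and strips the .ts extension to produce a TypeScript import specifier.

```python
import os

def relative_import_path(from_file: str, to_file: str) -> str:
    """Compute a TypeScript import specifier from one source file to another."""
    rel = os.path.relpath(os.path.dirname(to_file) or ".",
                          os.path.dirname(from_file) or ".")
    # "graph.types.ts" → module name "graph.types"
    module = os.path.splitext(os.path.basename(to_file))[0]
    path = os.path.join(rel, module).replace(os.sep, "/")
    # TS relative imports must start with ./ or ../
    return path if path.startswith(".") else f"./{path}"
```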
Safety Guards
Whitelist approach:
# Only auto-fix known safe symbols
safe_imports = {
    # NestJS core (always safe)
    "Injectable", "Controller", "Get", "Post",
    # Jorvis services (verified to exist)
    "GraphOrchestratorService", "SchemaContextService",
    # Jorvis types (verified to exist)
    "GraphState", "ConversationQuestionDto",
}
# External packages, unknown symbols → SKIP auto-fix
# External packages, unknown symbols → SKIP auto-fix
Why whitelist over blacklist:
- False positive auto-fix is worse than skipping
- Gemini hallucinations can invent plausible-sounding names
- Better to require 1 manual iteration than inject wrong import
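The gate itself reduces to a small function: extract the missing symbol from the error, then check it against the whitelist. The error wording and helper name are assumptions.

```python
import re
from typing import Optional

# Whitelist from the section above
SAFE_IMPORTS = {
    "Injectable", "Controller", "Get", "Post",
    "GraphOrchestratorService", "SchemaContextService",
    "GraphState", "ConversationQuestionDto",
}

def try_auto_fix(error: str) -> Optional[str]:
    """Return the symbol to auto-import, or None when the error is not a
    whitelisted missing-import case."""
    m = re.search(r"cannot find name ['\"](\w+)['\"]", error, re.IGNORECASE)
    if not m:
        return None
    symbol = m.group(1)
    return symbol if symbol in SAFE_IMPORTS else None  # Whitelist gate
```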
ESLint Disable (Safer than Renaming)
Old approach (risky):
# Rename: const foo = ... → const _foo = ...
# Problem: Breaks if "foo" is used elsewhere in file
New approach (safe):
# Add comment above declaration
// eslint-disable-next-line @typescript-eslint/no-unused-vars
const foo = ... // Still unused, but lint passes
Why safer:
- No code modification (only comments)
- Explicit intent (human reviewer knows variable unused)
- Reversible (remove comment if variable later used)
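A comment-only fixer can be sketched as a line-wise pass that inserts the disable comment above the matching declaration, preserving its indentation. The helper name is assumed.

```python
import re

DISABLE = "// eslint-disable-next-line @typescript-eslint/no-unused-vars"

def suppress_unused_var(source: str, name: str) -> str:
    """Insert an eslint-disable comment above the declaration of `name`.
    Comment-only change: the declaration line itself is untouched."""
    out = []
    for line in source.splitlines():
        stripped = line.lstrip()
        # Match whole identifiers only, so "foo" does not hit "fooBar"
        if re.match(rf"(const|let|var)\s+{re.escape(name)}\b", stripped):
            indent = line[: len(line) - len(stripped)]
            out.append(indent + DISABLE)
        out.append(line)
    return "\n".join(out)
```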
ADR Validation Strategy
Inline Validation (No Double LLM Call)
Old approach:
- Architect creates ADR
- Separate LLM call validates against invariants
- If violations → regenerate ADR
Problem: 2 LLM calls per iteration = slow + expensive
New approach:
- Architect creates ADR with inline invariant compliance section
- Lightweight regex check for missing sections
- If compliance section present → trust LLM
Prompt structure:
Create an ADR that:
...
5. EXPLICITLY states: "This ADR does NOT violate any invariants"
6. JUSTIFIES any new dependencies
Format:
...
### Invariant Compliance
- Anti-loop: ✅ Does not call Open WebUI
- SQL safety: ✅ Uses SqlGuard
...
Post-validation (quick check):
def quick_validate_adr(adr: str) -> dict:
    warnings = []
    if "Invariant Compliance" not in adr:
        warnings.append("ADR missing compliance section")
    # Still do keyword check for anti-loop (critical invariant)
    if "openwebui.api" in adr.lower():
        warnings.append("May mention Open WebUI API (review manually)")
    return {"warnings": warnings}
Benefits:
- 1 LLM call instead of 2
- Architect forced to think about invariants upfront
- Compliance section visible in ADR for human review
Git Branch Isolation
Why Sequential Execution
Parallel execution problems:
# Task 1 (branch: task/task-104)
git checkout -b task/task-104
npm run build # Modifies node_modules/.cache
# Task 2 (branch: task/task-105, parallel)
git checkout -b task/task-105
npm run build # Conflicts with Task 1's cache!
# Result: Race condition, unpredictable failures
Sequential execution benefits:
# Task 1
git checkout main
git checkout -b task/task-104
# ... work on Task 1 ...
git checkout main # Clean state
# Task 2
git checkout main
git checkout -b task/task-105
# ... work on Task 2 (no conflicts)
Branch Lifecycle
import subprocess

def run_single_task_isolated(task_id: str) -> dict:
    # 1. Start from clean main
    subprocess.run(["git", "checkout", "main"])
    subprocess.run(["git", "pull", "origin", "main"])
    # 2. Create task branch
    branch_name = f"task/{task_id.lower()}"
    subprocess.run(["git", "checkout", "-b", branch_name])
    # 3. Run workflow (modifies files)
    result = run_workflow(task_id)
    # 4. Return to main
    subprocess.run(["git", "checkout", "main"])
    # 5. Cleanup (if successful)
    if result["success"]:
        subprocess.run(["git", "branch", "-D", branch_name])
    else:
        # Keep branch for debugging
        print(f"⚠️ Keeping branch {branch_name} for debugging")
    return result
API Retry Strategy
Exponential Backoff
import time

def call_llm_with_retry(client, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(messages=messages)
        except Exception as e:
            if "rate" in str(e).lower() and attempt < max_retries - 1:
                wait_time = (2 ** attempt) * 2  # 2s, 4s, 8s
                print(f"⚠️ Rate limit, retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise  # Give up or non-retryable error
Retry scenarios:
| Error Type | Retry? | Backoff |
|---|---|---|
| Rate limit (429) | ✅ Yes | Exponential |
| Timeout (503) | ✅ Yes | Exponential |
| Invalid request (400) | ❌ No | — |
| Auth error (401) | ❌ No | — |
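The table can collapse into a small predicate for the retry loop. The marker strings below are assumptions and should be adapted to the SDK's actual exception types rather than string matching.

```python
# Substrings suggesting a transient, retryable failure (assumed markers)
RETRYABLE_MARKERS = ("rate", "429", "timeout", "503", "overloaded")

def is_retryable(error: Exception) -> bool:
    """Heuristic: retry only transient errors; 400/401-style errors
    will not succeed on retry, so fail fast."""
    text = str(error).lower()
    return any(marker in text for marker in RETRYABLE_MARKERS)
```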
Iteration Limits
Adaptive Strategy
def determine_max_iterations(task_description: str) -> int:
    task = task_description.lower()
    score = 0
    # New feature: +3
    if any(word in task for word in ["new", "add", "implement"]):
        score += 3
    # Bug fix: +1
    elif any(word in task for word in ["fix", "bug"]):
        score += 1
    # Integration: +2
    if "integration" in task or "api" in task:
        score += 2
    # Database: +2
    if "database" in task or "schema" in task:
        score += 2
    # Scoring → Limit
    if score <= 3:
        return 3  # Simple (e.g., "Fix typo in README")
    elif score <= 6:
        return 5  # Moderate (e.g., "Add validation rule")
    else:
        return 7  # Complex (e.g., "Implement MTIR-SQL")
Rationale:
- Simple tasks shouldn't burn 5 iterations on typo fixes
- Complex tasks need room for multi-stage debugging
- Prevents infinite loops (hard cap at 7)
Performance Optimizations
1. Inline Validation
- Saved: 1 LLM call per iteration
- Impact: 20-40s per iteration
2. Auto-Fix Trivial Errors
- Saved: 1 iteration for 20% of errors
- Impact: 2-3 minutes per auto-fixed error
3. Sequential (Not Parallel) Quality Gate
- Why not parallel: Complexity > benefit
- Current: 60-70s for all checks
- Parallel theoretical: 30-40s (but race conditions risk)
Decision: Keep sequential for robustness.
Security Architecture
Input Validation
# Task ID validation
assert re.match(r'^Task-\d+$', task_id), "Invalid task ID format"
# File path validation (prevent directory traversal)
assert '..' not in file_path, "Directory traversal blocked"
assert file_path.startswith('analytics-platform/'), "Invalid base path"
Output Sanitization
- All LLM outputs validated by Quality Gate before commit
- No eval(), exec(), or dynamic code execution
- File writes use explicit paths (no user input in paths)
Secrets Protection
- API keys in environment variables only
- Quality Gate includes secrets check (regex patterns for keys)
- Git hooks prevent accidental commits (gitleaks)
Monitoring & Observability
Metrics Collection
# Per-task metrics
{
  "task_id": "Task-104",
  "success": true,
  "duration_sec": 156.3,
  "iterations": 2,
  "errors": [
    {"iteration": 1, "type": "tactical", "error": "Type mismatch..."}
  ],
  "auto_fixes": 1,
  "hallucinations": 0
}
# Aggregate metrics
{
  "success_rate": 70.0,
  "avg_iterations": 2.4,
  "avg_duration_sec": 180.5,
  "hallucination_rate": 8.0,
  "auto_fix_success_rate": 95.0
}
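A sketch of how per-task records might roll up into the aggregate report. Field names follow the examples above; defining the hallucination rate as the share of tasks with at least one hallucination is an assumption.

```python
def aggregate(per_task: list) -> dict:
    """Roll per-task metric records up into an aggregate report."""
    n = len(per_task)
    successes = sum(1 for t in per_task if t["success"])
    with_hallucinations = sum(1 for t in per_task if t["hallucinations"] > 0)
    return {
        "success_rate": round(100.0 * successes / n, 1),
        "avg_iterations": round(sum(t["iterations"] for t in per_task) / n, 1),
        "avg_duration_sec": round(sum(t["duration_sec"] for t in per_task) / n, 1),
        "hallucination_rate": round(100.0 * with_hallucinations / n, 1),
    }
```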
Key Performance Indicators (KPIs)
| KPI | Target | Red Flag |
|---|---|---|
| Success rate | ≥70% | <50% |
| Avg iterations | ≤3 | >5 |
| Hallucination rate | ≤10% | >15% |
| Auto-fix success | ≥90% | <80% |
Scalability Considerations
Current Limitations
- Sequential execution: 1 task at a time (~2h per task)
- Single repo: Jorvis only
- Manual pilot: Human reviews pilot_report.json
Future Scaling Strategies
Phase 2 (10+ tasks/week):
- Git worktrees for parallel execution (1 worktree per task)
- Task queue with priority (P0 > P1 > P2)
- Automated rollback on 3+ consecutive failures
Phase 3 (100+ tasks/week):
- Multi-repo support (shared workflow, repo-specific config)
- Distributed execution (multiple machines)
- ML-based prompt optimization (learn from failures)
Edge Cases & Limitations
Known Limitations
- Multi-file refactoring: Limited to ~5 files per task
  - Workaround: Break into smaller tasks
- Breaking API changes: Requires manual coordination
  - Workaround: Human-designed ADR, automated implementation only
- Non-deterministic tests: Flaky tests cause false failures
  - Workaround: Fix flaky tests first, then automate
- External dependencies: npm package updates not automated
  - Workaround: Manual dependency updates, automation for code changes only
Edge Case Handling
Scenario: Gemini generates valid code but uses deprecated API
Current behavior: Quality Gate passes (code compiles), deployed to prod
Solution: Add deprecation linter rule to Quality Gate
Scenario: Task requires human decision (e.g., "Choose color scheme")
Current behavior: Architect makes arbitrary choice, may not match user preference
Solution: Add AskUserQuestion node for ambiguous requirements (future enhancement)
Comparison with Alternatives
vs. GitHub Copilot
| Feature | Jorvis Automation | GitHub Copilot |
|---|---|---|
| Scope | Full task (design → implementation → tests) | Single function/file |
| Validation | Quality Gate enforced | Manual review |
| Context | Full repo + HANDOFF | Current file |
| Cost | $0.30-0.70/task | $10-20/month flat |
Use case: Copilot for interactive coding, Jorvis for full task automation.
vs. Devin AI
| Feature | Jorvis Automation | Devin AI |
|---|---|---|
| Architecture | Open source (LangGraph) | Proprietary |
| Cost | Pay-per-use | Subscription ($500/month) |
| Customization | Full control | Limited |
| Jorvis-specific | Yes (CLAUDE.md, invariants) | Generic |
Use case: Devin for general-purpose, Jorvis automation for domain-specific (analytics-platform).
Last Updated: 2026-01-19 Next Review: After pilot results (Feb 2026)