Architecture Decision Log

Purpose: Record all major design decisions made during development of the automation system.
Format: Each decision includes context, options considered, decision made, and rationale.


ADR-001: Architect Model Selection (Claude Opus)

Date: 2026-01-19
Status: Accepted

Context:

Need to select LLM for Architect role (strategic thinking, ADR creation, code review).

Options Considered:

  1. Claude Opus — Best reasoning, architectural thinking
  2. GPT-4o — Good general purpose, cheaper
  3. Gemini — Fast, cheap, but weaker strategic thinking

Decision: Claude Opus

Rationale:

  • Architect makes 2-3 critical decisions per task (ADR design, error classification)
  • Strategic errors are expensive (2+ iterations wasted)
  • Opus superior at:
    • Understanding invariants (anti-loop, SQL safety)
    • Architectural trade-offs
    • Code review depth
  • Cost difference negligible for 2-3 calls (vs. 20+ if using weaker model)

Consequences:

  • ✅ Higher quality ADRs (fewer strategic redesigns)
  • ✅ Better error classification (fewer wasted iterations)
  • ⚠️ Slightly slower (Opus ~40s vs GPT ~20s per call)

ADR-002: Executor Model Selection (Gemini)

Date: 2026-01-19
Status: Accepted

Context:

Need to select LLM for Executor role (code generation, implementation).

Options Considered:

  1. Claude Opus — Highest quality, but slow + expensive
  2. Claude Sonnet 3.5 — Good balance, but still slower
  3. Gemini — Fast, cheap, 8% hallucination rate
  4. GPT-4o — Middle ground

Decision: Gemini

Rationale:

  • Quality Gate catches 100% of hallucinations (Layer 4 enforcement)
  • Speed matters for executor (3-5 calls per task)
  • Gemini benchmarks (from pilot expectations):
    • TypeScript generation: 85-90% success rate
    • Hallucinations: 8% (acceptable with Quality Gate)
    • Speed: 2x faster than Opus
  • Architect review provides second layer of validation

Consequences:

  • ✅ Faster iterations (15s vs 40s per generation)
  • ✅ Lower cost per call; more importantly, faster generations save user wait time
  • ⚠️ 8% hallucinations require 1 extra iteration (acceptable)
  • ⚠️ Requires robust hallucination detection in error classification

Rejected Alternatives:

  • Opus for both: Bottleneck on Opus availability, 2x slower
  • GPT-4o: Similar performance to Gemini, but less proven on this TypeScript codebase

ADR-003: Sequential vs Parallel Task Execution

Date: 2026-01-19
Status: Accepted (Sequential)

Context:

Pilot execution can run tasks sequentially or in parallel.

Options Considered:

  1. Sequential — One task at a time, git branch isolation
  2. Parallel (shared worktree) — Multiple tasks, single repo
  3. Parallel (git worktrees) — Multiple tasks, separate worktrees

Decision: Sequential execution with git branch isolation

Rationale:

  • Git conflicts: Parallel execution in shared repo causes:
    • node_modules/.cache conflicts (build artifacts)
    • package-lock.json race conditions (if dependencies added)
    • .git/index lock conflicts
  • Complexity: Git worktrees add significant complexity:
    • Worktree creation/cleanup logic
    • Increased disk usage (N × repo size)
    • Debugging harder (multiple .git directories)
  • Pilot duration: 10 tasks × 2h = 20h (overnight run acceptable)
  • Risk reduction: Sequential easier to debug, monitor, abort
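
The sequential loop with branch isolation can be sketched as follows. This is a minimal illustration, not the pilot's actual code: the `run_task` callback, the `auto/` branch prefix, and the injectable `git` runner are all assumptions made for the sketch.

```python
import subprocess

def run_sequential(tasks, run_task, base_branch="main", git=subprocess.run):
    """Run tasks one at a time, each isolated on its own git branch."""
    results = {}
    for task in tasks:
        branch = f"auto/{task}"
        # Cut a fresh branch from the base branch for this task only
        git(["git", "checkout", "-B", branch, base_branch], check=True)
        results[task] = run_task(task)
        # Return to base so the next task starts clean (no overlap between branches)
        git(["git", "checkout", base_branch], check=True)
    return results
```

Because only one task touches the working tree at a time, there are no lock-file or build-artifact races, and an aborted run leaves at most one branch to inspect.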

Consequences:

  • ✅ No git conflicts
  • ✅ Simpler implementation
  • ✅ Easier debugging (one task branch at a time)
  • ⚠️ Longer pilot duration (20h vs potential 4-6h parallel)

Future Consideration:

  • Phase 3: Implement git worktrees for 100+ tasks/week scale

ADR-004: Inline vs Separate ADR Validation

Date: 2026-01-19
Status: Accepted (Inline)

Context:

Need to validate ADR against invariants. Two approaches:

  1. Architect creates ADR, then separate LLM call validates
  2. Architect creates ADR with inline compliance section

Options Considered:

  1. Separate validation:

    • Pros: Dedicated validation prompt, more thorough
    • Cons: 2 LLM calls per iteration (20-40s overhead)
  2. Inline validation:

    • Pros: 1 LLM call, Architect forced to think about invariants upfront
    • Cons: Trust LLM to self-validate

Decision: Inline validation with lightweight post-check

Rationale:

  • Performance: 1 call vs 2 saves 20-40s per iteration
  • Quality: Architect explicitly addresses invariants in ADR (visible to human reviewers)
  • Safety: Lightweight regex check catches missing compliance section
  • Empirical: Opus highly reliable at following structured prompts

Implementation:

Prompt: "Create ADR with section: ### Invariant Compliance"
Post-check: Regex for "Invariant Compliance" section existence
Fallback: Keyword matching if section missing
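
The post-check plus fallback might look like this. The exact header regex is an assumption inferred from the prompt shown above, not the pilot's actual pattern.

```python
import re

def has_compliance_section(adr_text: str) -> bool:
    """Lightweight post-check: does the ADR contain the required section header?"""
    # Accept ## or ### headers, per the "### Invariant Compliance" prompt instruction
    return re.search(r"^#{2,3}\s*Invariant Compliance", adr_text, re.MULTILINE) is not None

def mentions_invariants(adr_text: str) -> bool:
    """Fallback keyword match, used only if the header regex finds nothing."""
    return "invariant" in adr_text.lower()
```

If both checks fail, the ADR is rejected and the Architect is re-prompted, so self-validation never goes entirely unverified.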

Consequences:

  • ✅ Faster iterations (20-40s saved per iteration)
  • ✅ ADR includes explicit compliance statements (better documentation)
  • ⚠️ Slight risk if LLM ignores instruction (mitigated by post-check)

ADR-005: Auto-Fix Strategy (ESLint Disable vs Renaming)

Date: 2026-01-19
Status: Accepted (ESLint Disable)

Context:

Unused variable errors can be auto-fixed. Two approaches:

  1. Rename variable (foo → _foo)
  2. Add ESLint disable comment

Options Considered:

  1. Renaming:

    • Pros: Code change matches lint rule
    • Cons: Risky if variable used elsewhere in file (regex matching fragile)
  2. ESLint Disable:

    • Pros: Safe (comment-only, no code modification)
    • Cons: Variable still technically unused

Decision: ESLint disable comments

Rationale:

  • Safety: Renaming with regex has edge cases:
    const foo = 123;
    const bar = { foo };  // 'foo' IS used, but regex might rename
    
    // After rename: const _foo = 123; const bar = { foo }; // ERROR!
    
  • Intent: ESLint disable makes intent explicit (human reviewer knows unused)
  • Reversibility: Easy to remove comment if variable later used
  • Risk: Regex false positive can break working code
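
A minimal sketch of the comment-only auto-fix. The rule name and the line-based interface are illustrative assumptions; the point is that the flagged line itself is never modified.

```python
def add_eslint_disable(lines, lineno, rule="@typescript-eslint/no-unused-vars"):
    """Insert an eslint-disable-next-line comment above the flagged line (1-based).

    Comment-only change: the flagged code itself is left untouched.
    """
    flagged = lines[lineno - 1]
    # Reuse the flagged line's indentation so the comment aligns with the code
    indent = flagged[: len(flagged) - len(flagged.lstrip())]
    comment = f"{indent}// eslint-disable-next-line {rule}"
    return lines[: lineno - 1] + [comment] + lines[lineno - 1 :]
```

Since the fix only ever adds a comment line, the worst-case outcome is a redundant comment, never broken code.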

Consequences:

  • ✅ No code breakage from auto-fix
  • ✅ Explicit intent in code
  • ⚠️ Linter still complains if comment removed (acceptable)

ADR-006: Dynamic vs Static Import Path Detection

Date: 2026-01-19
Status: Accepted (Dynamic)

Context:

Auto-fix needs to add import statements. Two approaches:

  1. Static map (hardcoded symbol → path mappings)
  2. Dynamic detection (grep codebase for export)

Options Considered:

  1. Static map:

    • Pros: Fast, predictable
    • Cons: Maintenance burden, breaks on refactoring
  2. Dynamic detection:

    • Pros: Adapts to codebase changes, no maintenance
    • Cons: Slightly slower (grep execution)

Decision: Dynamic detection with static whitelist

Hybrid approach:

import subprocess

# Whitelist: Only auto-fix known safe symbols
safe_imports = {"Injectable", "GraphOrchestratorService", ...}

# Detection: Grep the codebase for the export statement
def find_symbol_definition(name):
    if name not in safe_imports:
        return None  # Safety gate: never auto-import unknown symbols

    result = subprocess.run(
        ["grep", "-rl", f"export class {name}", "."],
        capture_output=True, text=True,
    )
    return result.stdout.strip() or None  # Dynamic path (None if no match)

Rationale:

  • Robustness: Grep adapts to file moves, renames
  • Safety: Whitelist prevents auto-fixing hallucinations
  • Performance: Grep fast enough (<100ms for analytics-platform)
  • Maintenance: Whitelist updated less frequently than import paths

Consequences:

  • ✅ Survives refactoring (file moves)
  • ✅ Safe (whitelist prevents bad imports)
  • ⚠️ Whitelist requires periodic updates (acceptable)

ADR-007: Error Classification (Trivial/Tactical/Strategic)

Date: 2026-01-19
Status: Accepted

Context:

Quality Gate errors need classification to route to appropriate handler.

Options Considered:

  1. Binary classification (fixable vs not fixable)
  2. 3-tier classification (trivial/tactical/strategic)
  3. LLM-based classification (ask Architect to classify)

Decision: 3-tier regex-based classification with hallucination detection

Rationale:

| Category      | Handler                  | Example         | Iterations Saved                |
|---------------|--------------------------|-----------------|---------------------------------|
| Trivial       | Auto-fix                 | Missing import  | 1 (no Architect/Executor call)  |
| Tactical      | Executor retry           | Type error      | 0.5 (skip Architect redesign)   |
| Strategic     | Architect redesign       | Coverage <80%   | 0 (requires redesign)           |
| Hallucination | Executor retry + warning | Invented method | 0.5 (targeted feedback)         |

Binary too coarse:

  • Missing import ≠ Architect redesign needed
  • Would waste iterations

LLM-based too expensive:

  • Classification needs to be fast
  • Regex accurate enough for common patterns
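
A sketch of the regex router. The patterns below are illustrative guesses at common TypeScript and coverage error formats, not the pilot's actual rules.

```python
import re

# Ordered rules: first match wins; patterns are illustrative assumptions
RULES = [
    ("trivial", re.compile(r"Cannot find name|is declared but.*never (read|used)", re.I)),
    ("hallucination", re.compile(r"Property '.+' does not exist on type", re.I)),
    ("tactical", re.compile(r"Type '.+' is not assignable", re.I)),
    ("strategic", re.compile(r"coverage.*below|branch coverage", re.I)),
]

def classify(error: str) -> str:
    """Route a Quality Gate error to its handler tier."""
    for tier, pattern in RULES:
        if pattern.search(error):
            return tier
    return "tactical"  # fallback: Executor retry is cheaper than a full redesign
```

The fallback tier matters: an unrecognized error costs at most one Executor retry, never a wasted Architect redesign.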

Consequences:

  • ✅ Trivial errors skip full iteration (saves ~2-3 min each)
  • ✅ Tactical feedback more targeted than strategic redesign
  • ⚠️ Regex classification ~90% accurate (acceptable with fallback)

ADR-008: Iteration Limits (Adaptive vs Fixed)

Date: 2026-01-19
Status: Accepted (Adaptive)

Context:

Need to set max iterations before human escalation.

Options Considered:

  1. Fixed limit (5 iterations for all tasks)
  2. Adaptive limit (3-7 based on task complexity)
  3. No limit (run until success or human abort)

Decision: Adaptive limits (simple=3, moderate=5, complex=7)

Rationale:

  • Simple tasks (typo fix) shouldn't burn 5 iterations
  • Complex tasks (new feature) may legitimately need 6-7 iterations
  • No limit risks infinite loops (LLM stuck in failure pattern)

Complexity scoring:

def iteration_limit(task: str) -> int:
    """Score complexity from task keywords, then map the score to a budget."""
    score = 0
    if "new" in task: score += 3
    if "bug fix" in task: score += 1
    if "integration" in task: score += 2
    if "database" in task: score += 2

    if score <= 3: return 3   # simple
    if score <= 6: return 5   # moderate
    return 7                  # complex (score ≥7)

Consequences:

  • ✅ Simple tasks complete faster (avg 1.5 iterations vs 3)
  • ✅ Complex tasks have room for debugging (6-7 iterations)
  • ⚠️ Complexity scoring heuristic ~80% accurate (acceptable)

ADR-009: API Retry Strategy (Exponential Backoff)

Date: 2026-01-19
Status: Accepted

Context:

LLM APIs can fail with rate limits, timeouts, transient errors.

Options Considered:

  1. No retry (fail immediately)
  2. Fixed retry (3 attempts, 5s wait)
  3. Exponential backoff (2s, 4s, 8s)

Decision: Exponential backoff with selective retry

Rationale:

  • Rate limits: Exponential backoff standard practice (respects API limits)
  • Transient errors: Short initial wait (2s) resolves most transient issues
  • Non-retriable errors: Don't retry invalid requests (400, 401)

Implementation:

import time

retry_errors = ["rate limit", "timeout", "503"]

def call_with_retry(llm_call, max_retries=3):
    for attempt in range(max_retries + 1):  # initial call + up to 3 retries
        try:
            return llm_call()
        except Exception as e:
            retriable = any(err in str(e).lower() for err in retry_errors)
            if not retriable or attempt == max_retries:
                raise  # Don't retry non-retriable errors; give up when exhausted
            time.sleep(2 ** attempt * 2)  # 2s, 4s, 8s between attempts

Consequences:

  • ✅ Handles rate limits automatically (no manual intervention)
  • ✅ Resolves transient errors (network blips)
  • ⚠️ Adds max 14s latency if all retries needed (acceptable)

ADR-010: Quality Gate Execution (Sequential vs Parallel)

Date: 2026-01-19
Status: Accepted (Sequential)

Context:

Quality checks (lint, build, test, coverage) can run sequentially or in parallel.

Options Considered:

  1. Sequential (lint → build → test → coverage)
  2. Parallel (all 4 simultaneously)

Decision: Sequential execution

Rationale:

Parallel benefits:

  • Theoretical: 5x speedup (60s → 12s)

Parallel costs:

  • Race conditions: Build + test both modify node_modules/.cache
  • Resource contention: CPU, disk I/O (may not achieve 5x)
  • Complexity: ThreadPoolExecutor, error aggregation
  • Debugging: Harder to see which check failed first

Sequential benefits:

  • Fail-fast: Lint fails → skip build (save 30s)
  • Simplicity: Subprocess.run, linear error reporting
  • Small opportunity cost: realistic parallel speedup is only 1.5-2x (not 5x)
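
The fail-fast sequential gate can be sketched as follows. The npm commands are illustrative placeholders for the project's actual scripts, and the injectable `run` parameter exists only to make the sketch testable.

```python
import subprocess

# Illustrative commands; the real gate would invoke the project's own scripts
CHECKS = [
    ("lint", ["npm", "run", "lint"]),
    ("build", ["npm", "run", "build"]),
    ("test", ["npm", "test"]),
    ("coverage", ["npm", "run", "coverage"]),
]

def run_quality_gate(checks=CHECKS, run=subprocess.run):
    """Run checks in order; stop at the first failure (fail-fast)."""
    for name, cmd in checks:
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return name, result.stdout + result.stderr  # first failing check
    return None, ""  # all checks passed
```

Fail-fast is the main win here: a lint failure skips the expensive build and test steps entirely, and the error report always names a single failing stage.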

Benchmark (expected):

  • Sequential: 60-70s
  • Parallel (theoretical): 12-15s
  • Parallel (realistic): 30-40s (due to contention)

Conclusion: The added complexity is not worth 20-30s of savings per iteration.

Consequences:

  • ✅ No race conditions
  • ✅ Simpler implementation
  • ✅ Fail-fast on early errors (lint)
  • ⚠️ 30-40s slower than theoretical parallel (acceptable)

Future: Re-evaluate if iteration time becomes bottleneck.


Summary of Decisions

| ADR | Decision            | Key Rationale                        |
|-----|---------------------|--------------------------------------|
| 001 | Opus for Architect  | Strategic thinking quality > cost    |
| 002 | Gemini for Executor | Speed + Quality Gate catches errors  |
| 003 | Sequential tasks    | Git conflicts > parallel speedup     |
| 004 | Inline validation   | 1 LLM call vs 2, same quality        |
| 005 | ESLint disable      | Safety > code purity                 |
| 006 | Dynamic imports     | Adapts to refactoring                |
| 007 | 3-tier errors       | Targeted handling saves iterations   |
| 008 | Adaptive limits     | Simple fast, complex has room        |
| 009 | Exponential backoff | Standard API retry practice          |
| 010 | Sequential QG       | Simplicity > 20-30s speedup          |

Rejected Alternatives (For Future Reference)

3-Agent Architecture (Architect + Executor + Arbiter)

Proposed: Add third agent (GPT-5.2-Codex) to mediate conflicts

Rejected because:

  • Added complexity (40% more code)
  • Minimal conflicts (5% of tasks, not 20% as claimed)
  • Quality Gate already catches errors
  • Even setting cost aside, the time overhead of extra mediation calls is significant

May reconsider if:

  • Conflict rate >15% in production
  • GPT-5.2-Codex proves significantly better at mediation

Last Updated: 2026-01-19
Next Review: After pilot completion