Jorvis Task Automation System

Version: 1.0.0
Status: Experimental / Frozen (Phase Q — not current operating model)
Created: 2026-01-19
Author: Claude Sonnet 4.5


Overview

The Jorvis Task Automation System is a 2-agent LangGraph workflow that automates software development tasks for Phase Q and beyond. It combines Claude Opus (Architect) with Gemini (Executor) to design, implement, validate, and deliver production-ready code changes.

Key Features

  • Dual-Agent Architecture: Strategic thinking (Opus) + Fast execution (Gemini)
  • Quality Gate Integration: 100% enforcement via GitHub Actions (Layer 4) + Branch Protection (Layer 5)
  • Git Branch Isolation: Sequential execution prevents race conditions
  • Smart Error Classification: Trivial auto-fix, tactical retry, strategic redesign
  • Hallucination Detection: Dedicated classification for Gemini-specific errors
  • Adaptive Iteration Limits: 3-7 iterations based on task complexity
  • API Retry Logic: Exponential backoff for rate limits and transient errors

Architecture

┌─────────────────────────────────────────────────────────────┐
│                   LangGraph Workflow                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌───────────────┐                                          │
│  │ Architect     │ (Claude Opus)                            │
│  │ Design        │ - Create ADR                             │
│  └───────┬───────┘ - Validate invariants                    │
│          │         - Review code                            │
│          ▼                                                  │
│  ┌───────────────┐                                          │
│  │ Executor      │ (Gemini)                                 │
│  │ Implement     │ - Generate TypeScript/Python             │
│  └───────┬───────┘ - Create tests                           │
│          │         - Fast iteration                         │
│          ▼                                                  │
│  ┌───────────────┐                                          │
│  │ Quality Gate  │ (GitHub Actions - Local)                 │
│  │ Validation    │ - Lint (ESLint)                          │
│  └───────┬───────┘ - Build (TypeScript)                     │
│          │         - Test (Jest)                            │
│          │         - Coverage (non-regression)              │
│          ▼                                                  │
│  ┌───────────────┐                                          │
│  │ Architect     │ (Claude Opus)                            │
│  │ Review        │ - Classify errors                        │
│  └───────────────┘ - Generate feedback                      │
│                    - Decide: retry/redesign/escalate        │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Components

1. Core Workflow (scripts/langgraph_workflow.py)

Main execution script — orchestrates all nodes and manages state transitions.

Key Functions:

  • architect_design() — Creates ADR with inline invariant validation
  • executor_implement() — Generates code from ADR using Gemini
  • run_quality_gate() — Runs lint, build, test, coverage checks
  • architect_review() — Classifies errors and provides targeted feedback

State Machine:

design → implement → quality_gate → review
   ↑                                   ↓
   └──────────── (if errors) ──────────┘
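The loop above can be sketched as a plain Python state machine, without the LangGraph dependency. The node logic here is stubbed (the real nodes call Opus, Gemini, and npm); only the control flow mirrors the workflow.

```python
# Minimal sketch of the design → implement → quality_gate → review loop.
# Node behavior is stubbed; the real implementation lives in
# scripts/langgraph_workflow.py.

def run_workflow(max_iterations: int = 3) -> dict:
    state = {"iteration": 0, "errors": ["not run yet"], "approved": False}

    while state["iteration"] < max_iterations and not state["approved"]:
        state["iteration"] += 1

        # design: Architect produces an ADR (stubbed)
        state["adr"] = f"ADR v{state['iteration']}"

        # implement: Executor generates code from the ADR (stubbed)
        state["files"] = ["src/example.ts"]

        # quality_gate: lint/build/test/coverage (stub passes on 2nd try)
        state["errors"] = [] if state["iteration"] >= 2 else ["test failure"]

        # review: Architect approves, or feedback loops back to design
        state["approved"] = not state["errors"]

    return state

result = run_workflow()
print(result["approved"], result["iteration"])  # True 2
```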

2. Multi-Task Runner (scripts/multi_task_runner.py)

Pilot execution script — runs multiple tasks sequentially with git isolation.

Features:

  • Sequential execution (no parallel git conflicts)
  • Branch isolation per task
  • Metrics collection (success rate, iterations, duration)
  • JSON report generation

3. Helper Functions

Auto-Fix (auto_fix_trivial_errors())

  • Missing imports: Dynamic path detection via grep
  • Unused variables: ESLint disable comments (safer than renaming)
  • Safety guards: Whitelist of known safe symbols

Error Classification (classify_errors())

  • Trivial: Auto-fixable (imports, unused vars)
  • Tactical: Code-level issues (type errors, test failures)
  • Strategic: Design-level issues (coverage, invariants)
  • Hallucination: Gemini-specific invented methods/modules
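The four categories can be sketched as a pattern-matching classifier; the regexes below are illustrative stand-ins, not the actual heuristics in classify_errors().

```python
import re

# Illustrative sketch of error classification; real patterns live in
# scripts/langgraph_workflow.py and may differ.
def classify_error(message: str) -> str:
    if re.search(r"is not defined|no-unused-vars|Cannot find name", message):
        return "trivial"        # auto-fixable: imports, unused vars
    if re.search(r"does not exist on type|Cannot find module", message):
        return "hallucination"  # invented methods/modules
    if re.search(r"TS\d+|Expected .* to (be|equal)", message):
        return "tactical"       # type errors, test failures
    return "strategic"          # coverage, invariants, design issues

print(classify_error("error TS2304: Cannot find name 'Foo'"))  # trivial
```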

ADR Validation (validate_adr())

  • Inline validation: LLM explicitly addresses invariants in ADR
  • Quick check: Regex-based post-validation for missing sections
  • Fallback: Keyword matching if compliance section absent
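The quick check plus fallback could look like the sketch below; the section names are hypothetical placeholders for whatever the actual ADR template requires.

```python
import re

# Hypothetical sketch of the regex-based post-validation: report required
# sections missing from the ADR, with a keyword fallback for invariants.
REQUIRED_SECTIONS = ["## Decision", "## Invariant Compliance"]

def quick_check_adr(adr_text: str) -> list[str]:
    missing = [s for s in REQUIRED_SECTIONS if s not in adr_text]
    # Fallback: keyword matching if the compliance section is absent
    if "## Invariant Compliance" in missing and re.search(
        r"invariant", adr_text, re.IGNORECASE
    ):
        missing.remove("## Invariant Compliance")
    return missing

adr = "## Decision\nUse X.\nAll invariants preserved."
print(quick_check_adr(adr))  # []
```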

Installation

Prerequisites

# Required
- Python 3.10+
- Node.js 20+
- Git
- npm (analytics-platform dependencies installed)

# API Keys (environment variables)
- ANTHROPIC_API_KEY (Claude Opus)
- GOOGLE_API_KEY (Gemini)

Setup

# 1. Install Python dependencies
pip install anthropic google-generativeai langgraph

# 2. Install Node.js dependencies (if not already done)
npm --prefix analytics-platform ci

# 3. Verify scripts are executable
chmod +x scripts/langgraph_workflow.py
chmod +x scripts/multi_task_runner.py

# 4. Set API keys
export ANTHROPIC_API_KEY="your-key-here"
export GOOGLE_API_KEY="your-key-here"

# 5. Verify setup
python3 scripts/langgraph_workflow.py --help

Usage

Single Task Execution

# Execute one task
python3 scripts/langgraph_workflow.py Task-104

# Example output:
# ============================================================
# ARCHITECT DESIGN (Iteration 1)
# ============================================================
# ✅ ADR Created (2341 chars)
#
# ============================================================
# EXECUTOR IMPLEMENT (Iteration 1)
# ============================================================
# ✅ Generated 3 files
#    - analytics-platform/src/ai/transparency/...
#
# ============================================================
# QUALITY GATE
# ============================================================
# 🔍 Running lint...
#    ✅ Lint passed
# 🏗️  Running build...
#    ✅ Build passed
# 🧪 Running tests...
#    ✅ Tests passed
# 📊 Checking coverage...
#    ✅ Coverage 82.5% ≥ 80%
#
# ============================================================
# FINAL RESULT: APPROVED
# ============================================================
# Iterations: 2
# Files generated: 3
#
# ✅ Task Task-104 completed successfully!

Pilot Execution (10 Tasks)

# Run pilot on 10 tasks
python3 scripts/multi_task_runner.py

# Output:
# 🚀 Starting SEQUENTIAL pilot: 10 tasks
#    Git branch isolation: ON
#    Base branch: main
#
# ============================================================
# TASK 1/10: Task-104
# ============================================================
# [Task-104] Creating branch task/task-104...
# [Task-104] Running workflow...
# [Task-104] ✅ SUCCESS
#    Duration: 156.3s
#    Iterations: 2
#
# ... (9 more tasks)
#
# ============================================================
# PILOT COMPLETE
# ============================================================
# Success: 7/10 (70.0%)
# Avg Iterations: 2.4
# Report: pilot_report.json

Configuration

CONFIG.yaml (Optional)

# Rollback thresholds
rollback:
  success_rate_go: 70              # ≥70% → GO
  success_rate_go_adjusted: 60     # ≥60% → GO with adjustments
  success_rate_retry: 50           # ≥50% → Retry pilot
  human_intervention_max: 35       # ≤35% intervention for GO

# Iteration limits
iterations:
  simple_max: 3
  moderate_max: 5
  complex_max: 7

# API retry
api_retry:
  max_attempts: 3
  backoff_base: 2  # seconds (exponential: 2s, 4s, 8s)

# Multi-task execution
execution:
  parallel_enabled: false  # MUST be false (git conflicts)
  base_branch: main
  cleanup_success_branches: true
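The api_retry settings above can be applied with a small wrapper; with_retry, call_api, and the RuntimeError stand-in are placeholders for the real Anthropic/Gemini calls and their rate-limit exceptions.

```python
import time

# Sketch of the API retry policy from CONFIG.yaml (max_attempts: 3,
# backoff_base: 2 → waits of 2s, 4s before the 2nd and 3rd attempts).
def with_retry(call_api, max_attempts: int = 3, backoff_base: int = 2):
    for attempt in range(1, max_attempts + 1):
        try:
            return call_api()
        except RuntimeError:  # stand-in for rate-limit/transient errors
            if attempt == max_attempts:
                raise
            time.sleep(backoff_base ** attempt)  # exponential backoff
```

In the workflow, each Architect/Executor API call would be wrapped this way so transient failures never abort a task outright.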

Metrics & Monitoring

Pilot Report (pilot_report.json)

{
  "total_tasks": 10,
  "successful": 7,
  "failed": 3,
  "success_rate": 70.0,
  "avg_iterations": 2.4,
  "human_intervention_rate": 20.0,
  "results": [
    {
      "task_id": "Task-104",
      "success": true,
      "duration_sec": 156.3,
      "iterations": 2,
      "branch": "task/task-104"
    },
    ...
  ]
}
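The summary fields of the report can be derived from the per-task results like this; field names follow the JSON above, while the function itself is a sketch, not the runner's actual code.

```python
# Sketch: derive pilot_report.json summary fields from per-task results.
def summarize(results: list[dict]) -> dict:
    successful = sum(1 for r in results if r["success"])
    return {
        "total_tasks": len(results),
        "successful": successful,
        "failed": len(results) - successful,
        "success_rate": round(100 * successful / len(results), 1),
        "avg_iterations": round(
            sum(r["iterations"] for r in results) / len(results), 1
        ),
    }

print(summarize([
    {"success": True, "iterations": 2},
    {"success": False, "iterations": 3},
]))
```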

Key Metrics

Metric               Target     Interpretation
Success Rate         ≥70%       Tasks completed without human intervention
Human Intervention   ≤35%       Tasks requiring manual fixes
Avg Iterations       ≤3         Efficiency of architect-executor loop
Duration             <2h/task   Time to complete average task

Rollback Decision Tree

IF success_rate >= 70% AND human_intervention <= 35%:
    → GO to Phase Q (current settings)

ELIF success_rate >= 60% AND human_intervention <= 40%:
    → GO to Phase Q with adjustments:
       - Increase max_iterations by 1
       - Add human checkpoint after iteration 3
       - Monitor first 3 tasks closely

ELIF success_rate >= 50%:
    → RETRY pilot with improvements:
       - Analyze failure patterns
       - Adjust prompts
       - Re-run on 5 failed + 5 new tasks
       - Re-evaluate

ELSE:
    → NO-GO for automation:
       - Manual execution for Phase Q
       - Continue development in parallel
       - Re-evaluate after Phase Q completion
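The tree above is a direct transcription of the rollback thresholds in CONFIG.yaml; as code it is just four comparisons:

```python
# Rollback decision tree (thresholds from CONFIG.yaml's rollback section).
def rollback_decision(success_rate: float, human_intervention: float) -> str:
    if success_rate >= 70 and human_intervention <= 35:
        return "GO"                   # Phase Q with current settings
    if success_rate >= 60 and human_intervention <= 40:
        return "GO_WITH_ADJUSTMENTS"  # +1 iteration, human checkpoint
    if success_rate >= 50:
        return "RETRY_PILOT"          # adjust prompts, re-run
    return "NO_GO"                    # manual execution for Phase Q

print(rollback_decision(70.0, 20.0))  # GO
```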

Troubleshooting

Common Issues

1. Import Detection Fails

Symptom: Auto-fix can't find symbol definition

Solution:

# Check if grep finds the symbol
grep -r "export class YourSymbol" analytics-platform/src

# If not found, symbol may be external dependency
# Add to safe_imports whitelist manually

2. API Rate Limits

Symptom: "Rate limit exceeded" errors

Solution:

  • Retry logic handles this automatically (3 attempts with backoff)
  • If errors persist, wait about a minute before re-running (execution is already sequential, so there are no parallel tasks to reduce)

3. Git Branch Conflicts

Symptom: "Branch already exists" errors

Solution:

# Manual cleanup
git branch -D task/task-104

# Or delete all task branches
git branch | grep "task/" | xargs git branch -D

4. Quality Gate Fails on Clean Code

Symptom: Lint/build passes locally but fails in workflow

Solution:

# Ensure you're on clean main branch
git checkout main
git pull origin main

# Re-run quality gate manually
npm --prefix analytics-platform run lint
npm --prefix analytics-platform run build
npm --prefix analytics-platform test -- --runInBand

Maintenance

Updating Safe Imports Whitelist

File: scripts/langgraph_workflow.py, function is_safe_import()

When to update:

  • New NestJS decorators added to project
  • New Jorvis services created
  • New shared types/interfaces

Process:

  1. Add symbol to safe_imports set
  2. Add corresponding import path to import_map
  3. Test on synthetic task
  4. Document change in git commit

Example:

safe_imports = {
    # ... existing ...
    "NewJorvisService",  # Added 2026-01-20
}

import_map = {
    # ... existing ...
    "NewJorvisService": "../services/new-jorvis.service",
}

Updating Hallucination Patterns

File: scripts/langgraph_workflow.py, function is_hallucination()

When to update:

  • New Gemini hallucination patterns discovered during pilot
  • False positives detected

Process:

  1. Add regex pattern to hallucination_patterns
  2. Test on known hallucination examples
  3. Document pattern and reasoning

Performance Benchmarks

Expected Performance (Pilot Results)

Task Type                    Iterations   Duration   Success Rate
Simple (typo fix)            1-2          20-40min   90%+
Moderate (validation rule)   2-4          1-2h       70-80%
Complex (new feature)        4-6          2-4h       60-70%

Bottlenecks

  1. Quality Gate (30-60s) — Sequential checks
  2. Architect Review (20-40s) — Opus processing time
  3. Executor Generation (15-30s) — Gemini code generation

Total per iteration: ~2-3 minutes overhead


Security Considerations

API Key Protection

  • Store keys in environment variables (never in code)
  • Use .env files for local development
  • GitHub Secrets for CI/CD

Code Injection Prevention

  • All LLM outputs validated by Quality Gate
  • No eval() or dynamic code execution
  • Git branch isolation prevents cross-contamination

PII/Secrets Detection

  • Quality Gate includes secrets check (already in .github/workflows/quality-gate.yml)
  • Auto-fix does not modify strings (no risk of exposing secrets)

Future Enhancements

Phase 2 (Post-Pilot)

  • Parallel execution with git worktrees
  • Web dashboard for real-time monitoring
  • Automated prompt optimization based on failure patterns
  • Integration with GitHub PR automation
  • Slack notifications for failures

Phase 3 (Production Scale)

  • Multi-repo support
  • Custom task types (beyond Phase Q)
  • A/B testing for prompt variations
  • Cost optimization (switch to cheaper models for simple tasks)

FAQ

Q: Why Gemini instead of Opus for Executor? A: Gemini is 40x cheaper and 2x faster. Quality Gate catches hallucinations (8% rate), so speed > accuracy for code generation.

Q: Can I run tasks in parallel? A: No. Git branch isolation requires sequential execution. Parallel execution causes merge conflicts.

Q: What if pilot success rate is 50%? A: Follow rollback decision tree → Retry pilot with adjusted prompts, not immediate Phase Q deployment.

Q: How do I add new invariants? A: Update CLAUDE.md → "Invariants to Preserve" section. Architect will automatically include in ADR validation.

Q: Can I use this for non-Phase Q tasks? A: Yes, but adjust determine_max_iterations() logic and test on similar task types first.
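The adaptive limits referenced above map task complexity to the CONFIG.yaml iteration caps; this sketch of determine_max_iterations() assumes a string complexity label, which is an illustrative simplification of the real signature.

```python
# Hypothetical sketch of determine_max_iterations(); the 3/5/7 caps come
# from CONFIG.yaml's iterations section, the labels are assumptions.
LIMITS = {"simple": 3, "moderate": 5, "complex": 7}

def determine_max_iterations(complexity: str) -> int:
    # Unknown task types fall back to the moderate cap.
    return LIMITS.get(complexity, LIMITS["moderate"])

print(determine_max_iterations("complex"))  # 7
```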


Support & Contributing

Issues: Report via GitHub Issues
Questions: See docs/automation/TROUBLESHOOTING.md
Updates: Check git log for scripts/langgraph_workflow.py


License

Internal Jorvis project. Not for external distribution.


Last Updated: 2026-01-19
Next Review: After pilot completion (Feb 2026)