Jorvis Task Automation System
Version: 1.0.0 Status: Experimental / Frozen (Phase Q — not current operating model) Created: 2026-01-19 Author: Claude Sonnet 4.5
Overview
The Jorvis Task Automation System is a 2-agent LangGraph workflow that automates software development tasks for Phase Q and beyond. It combines Claude Opus (Architect) with Gemini (Executor) to design, implement, validate, and deliver production-ready code changes.
Key Features
- Dual-Agent Architecture: Strategic thinking (Opus) + Fast execution (Gemini)
- Quality Gate Integration: 100% enforcement via GitHub Actions (Layer 4) + Branch Protection (Layer 5)
- Git Branch Isolation: Sequential execution prevents race conditions
- Smart Error Classification: Trivial auto-fix, tactical retry, strategic redesign
- Hallucination Detection: Dedicated classification for Gemini-specific errors
- Adaptive Iteration Limits: 3-7 iterations based on task complexity
- API Retry Logic: Exponential backoff for rate limits and transient errors
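The API retry behavior described above can be sketched as follows. This is an illustrative sketch only (the `with_retry` helper and its signature are assumptions, not the actual code in `scripts/langgraph_workflow.py`); it shows the exponential backoff shape (2s, 4s, 8s) the config documents:

```python
import time

def with_retry(call, max_attempts=3, backoff_base=2, sleep=time.sleep):
    """Retry a callable with exponential backoff (2s, 4s, 8s, ...).

    Hypothetical helper for illustration; the real retry logic lives in
    scripts/langgraph_workflow.py and may differ in detail.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error
            sleep(backoff_base ** attempt)

# Example: a call that fails twice, then succeeds on the third attempt.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate limit")
    return "ok"

delays = []  # capture sleeps instead of actually waiting
result = with_retry(flaky, sleep=delays.append)
```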
Architecture
┌─────────────────────────────────────────────────────────────┐
│ LangGraph Workflow │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ │
│ │ Architect │ (Claude Opus) │
│ │ Design │ - Create ADR │
│ └───────┬───────┘ - Validate invariants │
│ │ - Review code │
│ ▼ │
│ ┌───────────────┐ │
│ │ Executor │ (Gemini) │
│ │ Implement │ - Generate TypeScript/Python │
│ └───────┬───────┘ - Create tests │
│ │ - Fast iteration │
│ ▼ │
│ ┌───────────────┐ │
│ │ Quality Gate │ (GitHub Actions - Local) │
│ │ Validation │ - Lint (ESLint) │
│ └───────┬───────┘ - Build (TypeScript) │
│ │ - Test (Jest) │
│ │ - Coverage (non-regression) │
│ ▼ │
│ ┌───────────────┐ │
│ │ Architect │ (Claude Opus) │
│ │ Review │ - Classify errors │
│ └───────────────┘ - Generate feedback │
│ - Decide: retry/redesign/escalate │
│ │
└─────────────────────────────────────────────────────────────┘
Components
1. Core Workflow (scripts/langgraph_workflow.py)
Main execution script — orchestrates all nodes and manages state transitions.
Key Functions:
- `architect_design()` — Creates ADR with inline invariant validation
- `executor_implement()` — Generates code from ADR using Gemini
- `run_quality_gate()` — Runs lint, build, test, coverage checks
- `architect_review()` — Classifies errors and provides targeted feedback
State Machine:
design → implement → quality_gate → review
↑ ↓
└──────────── (if errors) ──────────┘
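The loop above can be sketched as a plain transition function. This is illustrative only: the real workflow wires these states as LangGraph nodes with conditional edges, and the function name and classification labels here are assumptions:

```python
def next_state(state, quality_passed, classification, iteration, max_iterations):
    """Hypothetical transition function mirroring the state machine above."""
    if state == "design":
        return "implement"
    if state == "implement":
        return "quality_gate"
    if state == "quality_gate":
        return "done" if quality_passed else "review"
    if state == "review":
        if iteration >= max_iterations:
            return "escalate"  # hand off to a human
        # Strategic errors go back to design; trivial/tactical retry implement.
        return "design" if classification == "strategic" else "implement"
    raise ValueError(f"unknown state: {state}")
```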
2. Multi-Task Runner (scripts/multi_task_runner.py)
Pilot execution script — runs multiple tasks sequentially with git isolation.
Features:
- Sequential execution (no parallel git conflicts)
- Branch isolation per task
- Metrics collection (success rate, iterations, duration)
- JSON report generation
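A minimal sketch of the runner's shape, assuming an injected `run_task` callable in place of invoking the workflow (the real script also creates and cleans up git branches, omitted here):

```python
def run_pilot(task_ids, run_task, base_branch="main"):
    """Sequential pilot sketch: one branch per task, no parallelism.

    run_task is a stand-in for invoking langgraph_workflow.py; the real
    runner in scripts/multi_task_runner.py also shells out to git.
    """
    results = []
    for task_id in task_ids:
        branch = f"task/{task_id.lower()}"  # branch isolation per task
        ok = run_task(task_id)
        results.append({"task_id": task_id, "success": ok, "branch": branch})
    successful = sum(1 for r in results if r["success"])
    return {
        "total_tasks": len(results),
        "successful": successful,
        "failed": len(results) - successful,
        "success_rate": round(100 * successful / len(results), 1),
        "results": results,
    }

# Example: one of two tasks succeeds.
report = run_pilot(["Task-104", "Task-105"], run_task=lambda t: t == "Task-104")
```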
3. Helper Functions
Auto-Fix (auto_fix_trivial_errors())
- Missing imports: Dynamic path detection via grep
- Unused variables: ESLint disable comments (safer than renaming)
- Safety guards: Whitelist of known safe symbols
Error Classification (classify_errors())
- Trivial: Auto-fixable (imports, unused vars)
- Tactical: Code-level issues (type errors, test failures)
- Strategic: Design-level issues (coverage, invariants)
- Hallucination: Gemini-specific invented methods/modules
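The four categories above might be distinguished roughly as follows. The matching strings here are made-up examples, not the actual patterns in `classify_errors()`:

```python
def classify_error(message):
    """Illustrative classifier; real patterns live in classify_errors()."""
    msg = message.lower()
    # Hallucination: Gemini invented a method or module that does not exist.
    if "is not a function" in msg or "cannot find module" in msg:
        return "hallucination"
    # Trivial: auto-fixable (missing imports, unused variables).
    if "is defined but never used" in msg or "cannot find name" in msg:
        return "trivial"
    # Strategic: design-level issues (coverage, invariants).
    if "coverage" in msg or "invariant" in msg:
        return "strategic"
    # Everything else is tactical: code-level fixes within the same design.
    return "tactical"
```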
ADR Validation (validate_adr())
- Inline validation: LLM explicitly addresses invariants in ADR
- Quick check: Regex-based post-validation for missing sections
- Fallback: Keyword matching if compliance section absent
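The regex-based quick check could look like the sketch below. The section names are hypothetical; the real required sections are defined by `validate_adr()`:

```python
import re

# Hypothetical required sections; the real list is in validate_adr().
REQUIRED_SECTIONS = ["## Decision", "## Invariant Compliance"]

def quick_check_adr(adr_text):
    """Return the list of required sections missing from an ADR."""
    return [section for section in REQUIRED_SECTIONS
            if not re.search(re.escape(section), adr_text, re.IGNORECASE)]

adr = "## Decision\nUse X.\n"
missing = quick_check_adr(adr)
```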
Installation
Prerequisites
# Required
- Python 3.10+
- Node.js 20+
- Git
- npm (analytics-platform dependencies installed)
# API Keys (environment variables)
- ANTHROPIC_API_KEY (Claude Opus)
- GOOGLE_API_KEY (Gemini)
Setup
# 1. Install Python dependencies
pip install anthropic google-generativeai langgraph
# 2. Install Node.js dependencies (if not already done)
npm --prefix analytics-platform ci
# 3. Verify scripts are executable
chmod +x scripts/langgraph_workflow.py
chmod +x scripts/multi_task_runner.py
# 4. Set API keys
export ANTHROPIC_API_KEY="your-key-here"
export GOOGLE_API_KEY="your-key-here"
# 5. Verify setup
python3 scripts/langgraph_workflow.py --help
Usage
Single Task Execution
# Execute one task
python3 scripts/langgraph_workflow.py Task-104
# Example output:
# ============================================================
# ARCHITECT DESIGN (Iteration 1)
# ============================================================
# ✅ ADR Created (2341 chars)
#
# ============================================================
# EXECUTOR IMPLEMENT (Iteration 1)
# ============================================================
# ✅ Generated 3 files
# - analytics-platform/src/ai/transparency/...
#
# ============================================================
# QUALITY GATE
# ============================================================
# 🔍 Running lint...
# ✅ Lint passed
# 🏗️ Running build...
# ✅ Build passed
# 🧪 Running tests...
# ✅ Tests passed
# 📊 Checking coverage...
# ✅ Coverage 82.5% ≥ 80%
#
# ============================================================
# FINAL RESULT: APPROVED
# ============================================================
# Iterations: 2
# Files generated: 3
#
# ✅ Task Task-104 completed successfully!
Pilot Execution (10 Tasks)
# Run pilot on 10 tasks
python3 scripts/multi_task_runner.py
# Output:
# 🚀 Starting SEQUENTIAL pilot: 10 tasks
# Git branch isolation: ON
# Base branch: main
#
# ============================================================
# TASK 1/10: Task-104
# ============================================================
# [Task-104] Creating branch task/task-104...
# [Task-104] Running workflow...
# [Task-104] ✅ SUCCESS
# Duration: 156.3s
# Iterations: 2
#
# ... (9 more tasks)
#
# ============================================================
# PILOT COMPLETE
# ============================================================
# Success: 7/10 (70.0%)
# Avg Iterations: 2.4
# Report: pilot_report.json
Configuration
CONFIG.yaml (Optional)
# Rollback thresholds
rollback:
  success_rate_go: 70           # ≥70% → GO
  success_rate_go_adjusted: 60  # ≥60% → GO with adjustments
  success_rate_retry: 50        # ≥50% → Retry pilot
  human_intervention_max: 35    # ≤35% intervention for GO

# Iteration limits
iterations:
  simple_max: 3
  moderate_max: 5
  complex_max: 7

# API retry
api_retry:
  max_attempts: 3
  backoff_base: 2  # seconds (exponential: 2s, 4s, 8s)

# Multi-task execution
execution:
  parallel_enabled: false  # MUST be false (git conflicts)
  base_branch: main
  cleanup_success_branches: true
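The iteration limits above map to task complexity roughly as follows (a sketch; the real logic is `determine_max_iterations()` in `scripts/langgraph_workflow.py`, and the default for unknown complexity is an assumption):

```python
# Hypothetical mapping mirroring the iterations section of CONFIG.yaml.
ITERATION_LIMITS = {"simple": 3, "moderate": 5, "complex": 7}

def determine_max_iterations(complexity):
    """Return the max iteration budget for a task; default to moderate."""
    return ITERATION_LIMITS.get(complexity, ITERATION_LIMITS["moderate"])
```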
Metrics & Monitoring
Pilot Report (pilot_report.json)
{
  "total_tasks": 10,
  "successful": 7,
  "failed": 3,
  "success_rate": 70.0,
  "avg_iterations": 2.4,
  "human_intervention_rate": 20.0,
  "results": [
    {
      "task_id": "Task-104",
      "success": true,
      "duration_sec": 156.3,
      "iterations": 2,
      "branch": "task/task-104"
    },
    ...
  ]
}
Key Metrics
| Metric | Target | Interpretation |
|---|---|---|
| Success Rate | ≥70% | Tasks completed without human intervention |
| Human Intervention | ≤35% | Tasks requiring manual fixes |
| Avg Iterations | ≤3 | Efficiency of architect-executor loop |
| Duration | <2h/task | Time to complete average task |
Rollback Decision Tree
IF success_rate >= 70% AND human_intervention <= 35%:
→ GO to Phase Q (current settings)
ELIF success_rate >= 60% AND human_intervention <= 40%:
→ GO to Phase Q with adjustments:
- Increase max_iterations by 1
- Add human checkpoint after iteration 3
- Monitor first 3 tasks closely
ELIF success_rate >= 50%:
→ RETRY pilot with improvements:
- Analyze failure patterns
- Adjust prompts
- Re-run on 5 failed + 5 new tasks
- Re-evaluate
ELSE:
→ NO-GO for automation:
- Manual execution for Phase Q
- Continue development in parallel
- Re-evaluate after Phase Q completion
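The decision tree above transcribes directly into a small function (a sketch; the return labels are assumptions, not values used by the scripts):

```python
def rollback_decision(success_rate, human_intervention):
    """Transcription of the rollback decision tree (percentages)."""
    if success_rate >= 70 and human_intervention <= 35:
        return "GO"
    if success_rate >= 60 and human_intervention <= 40:
        return "GO_WITH_ADJUSTMENTS"
    if success_rate >= 50:
        return "RETRY_PILOT"
    return "NO_GO"
```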
Troubleshooting
Common Issues
1. Import Detection Fails
Symptom: Auto-fix can't find symbol definition
Solution:
# Check if grep finds the symbol
grep -r "export class YourSymbol" analytics-platform/src
# If not found, symbol may be external dependency
# Add to safe_imports whitelist manually
2. API Rate Limits
Symptom: "Rate limit exceeded" errors
Solution:
- Retry logic handles this automatically (3 attempts with backoff)
- If errors persist, wait about a minute before re-running (execution is sequential, so there are no parallel tasks to reduce)
3. Git Branch Conflicts
Symptom: "Branch already exists" errors
Solution:
# Manual cleanup
git branch -D task/task-104
# Or delete all task branches
git branch | grep "task/" | xargs git branch -D
4. Quality Gate Fails on Clean Code
Symptom: Lint/build passes locally but fails in workflow
Solution:
# Ensure you're on clean main branch
git checkout main
git pull origin main
# Re-run quality gate manually
npm --prefix analytics-platform run lint
npm --prefix analytics-platform run build
npm --prefix analytics-platform test -- --runInBand
Maintenance
Updating Safe Imports Whitelist
File: scripts/langgraph_workflow.py → is_safe_import()
When to update:
- New NestJS decorators added to project
- New Jorvis services created
- New shared types/interfaces
Process:
- Add symbol to `safe_imports` set
- Add corresponding import path to `import_map`
- Test on synthetic task
- Document change in git commit
Example:
safe_imports = {
# ... existing ...
"NewJorvisService", # Added 2026-01-20
}
import_map = {
# ... existing ...
"NewJorvisService": "../services/new-jorvis.service",
}
Updating Hallucination Patterns
File: scripts/langgraph_workflow.py → is_hallucination()
When to update:
- New Gemini hallucination patterns discovered during pilot
- False positives detected
Process:
- Add regex pattern to `hallucination_patterns`
- Test on known hallucination examples
- Document pattern and reasoning
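A pattern list and check might look like the sketch below. The regexes are invented examples of the kind of errors invented methods/modules produce, not the actual patterns in `is_hallucination()`:

```python
import re

# Hypothetical patterns; the real list lives in is_hallucination()
# in scripts/langgraph_workflow.py.
hallucination_patterns = [
    r"Property '\w+' does not exist on type",  # invented method on a real class
    r"Cannot find module '[^']+'",             # invented module import
]

def is_hallucination(error_line):
    """True if any known hallucination pattern matches the error line."""
    return any(re.search(p, error_line) for p in hallucination_patterns)
```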
Performance Benchmarks
Expected Performance (Pilot Results)
| Task Type | Iterations | Duration | Success Rate |
|---|---|---|---|
| Simple (typo fix) | 1-2 | 20-40min | 90%+ |
| Moderate (validation rule) | 2-4 | 1-2h | 70-80% |
| Complex (new feature) | 4-6 | 2-4h | 60-70% |
Bottlenecks
- Quality Gate (30-60s) — Sequential checks
- Architect Review (20-40s) — Opus processing time
- Executor Generation (15-30s) — Gemini code generation
Total per iteration: ~2-3 minutes overhead
Security Considerations
API Key Protection
- Store keys in environment variables (never in code)
- Use `.env` files for local development
- Use GitHub Secrets for CI/CD
Code Injection Prevention
- All LLM outputs validated by Quality Gate
- No `eval()` or dynamic code execution
- Git branch isolation prevents cross-contamination
PII/Secrets Detection
- Quality Gate includes a secrets check (already in `.github/workflows/quality-gate.yml`)
- Auto-fix does not modify strings (no risk of exposing secrets)
Future Enhancements
Phase 2 (Post-Pilot)
- Parallel execution with git worktrees
- Web dashboard for real-time monitoring
- Automated prompt optimization based on failure patterns
- Integration with GitHub PR automation
- Slack notifications for failures
Phase 3 (Production Scale)
- Multi-repo support
- Custom task types (beyond Phase Q)
- A/B testing for prompt variations
- Cost optimization (switch to cheaper models for simple tasks)
FAQ
Q: Why Gemini instead of Opus for Executor? A: Gemini is 40x cheaper and 2x faster. The Quality Gate catches hallucinations (8% rate), so speed outweighs raw accuracy for code generation.
Q: Can I run tasks in parallel? A: No. Git branch isolation requires sequential execution. Parallel execution causes merge conflicts.
Q: What if pilot success rate is 50%? A: Follow rollback decision tree → Retry pilot with adjusted prompts, not immediate Phase Q deployment.
Q: How do I add new invariants?
A: Update CLAUDE.md → "Invariants to Preserve" section. The Architect will automatically include it in ADR validation.
Q: Can I use this for non-Phase Q tasks?
A: Yes, but adjust determine_max_iterations() logic and test on similar task types first.
Support & Contributing
Issues: Report via GitHub Issues
Questions: See docs/automation/TROUBLESHOOTING.md
Updates: Check git log for scripts/langgraph_workflow.py
License
Internal Jorvis project. Not for external distribution.
Last Updated: 2026-01-19 Next Review: After pilot completion (Feb 2026)