Jorvis Task Automation System
Version: 1.0.0 Status: Experimental / Frozen (Phase Q — not current operating model) Created: 2026-01-19 Author: Claude Sonnet 4.5
Overview
The Jorvis Task Automation System is a 2-agent LangGraph workflow that automates software development tasks for Phase Q and beyond. It combines Claude Opus (Architect) with Gemini (Executor) to design, implement, validate, and deliver production-ready code changes.
Key Features
- Dual-Agent Architecture: Strategic thinking (Opus) + Fast execution (Gemini)
- Quality Gate Integration: 100% enforcement via GitHub Actions (Layer 4) + Branch Protection (Layer 5)
- Git Branch Isolation: Sequential execution prevents race conditions
- Smart Error Classification: Trivial auto-fix, tactical retry, strategic redesign
- Hallucination Detection: Dedicated classification for Gemini-specific errors
- Adaptive Iteration Limits: 3-7 iterations based on task complexity
- API Retry Logic: Exponential backoff for rate limits and transient errors
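The API retry behavior described above can be sketched as follows. This is an illustrative sketch only (the `with_retry` helper and its signature are assumptions, not the actual code in `scripts/langgraph_workflow.py`); it shows the exponential backoff shape (2s, 4s, 8s) the config documents:

```python
import time

def with_retry(call, max_attempts=3, backoff_base=2, sleep=time.sleep):
    """Retry a callable with exponential backoff (2s, 4s, 8s, ...).

    Hypothetical helper for illustration; the real retry logic lives in
    scripts/langgraph_workflow.py and may differ in detail.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error
            sleep(backoff_base ** attempt)

# Example: a call that fails twice, then succeeds on the third attempt.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate limit")
    return "ok"

delays = []  # capture sleeps instead of actually waiting
result = with_retry(flaky, sleep=delays.append)
```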
Architecture
┌─────────────────────────────────────────────────────────────┐
│ LangGraph Workflow │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ │
│ │ Architect │ (Claude Opus) │
│ │ Design │ - Create ADR │
│ └───────┬───────┘ - Validate invariants │
│ │ - Review code │
│ ▼ │
│ ┌───────────────┐ │
│ │ Executor │ (Gemini) │
│ │ Implement │ - Generate TypeScript/Python │
│ └───────┬───────┘ - Create tests │
│ │ - Fast iteration │
│ ▼ │
│ ┌───────────────┐ │
│ │ Quality Gate │ (GitHub Actions - Local) │
│ │ Validation │ - Lint (ESLint) │
│ └───────┬───────┘ - Build (TypeScript) │
│ │ - Test (Jest) │
│ │ - Coverage (non-regression) │
│ ▼ │
│ ┌───────────────┐ │
│ │ Architect │ (Claude Opus) │
│ │ Review │ - Classify errors │
│ └───────────────┘ - Generate feedback │
│ - Decide: retry/redesign/escalate │
│ │
└─────────────────────────────────────────────────────────────┘
Components
1. Core Workflow (scripts/langgraph_workflow.py)
Main execution script — orchestrates all nodes and manages state transitions.
Key Functions:
- `architect_design()` — Creates ADR with inline invariant validation
- `executor_implement()` — Generates code from ADR using Gemini
- `run_quality_gate()` — Runs lint, build, test, coverage checks
- `architect_review()` — Classifies errors and provides targeted feedback
State Machine:
design → implement → quality_gate → review
↑ ↓
└──────────── (if errors) ──────────┘
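The loop above can be sketched as a plain transition function. This is illustrative only: the real workflow wires these states as LangGraph nodes with conditional edges, and the function name and classification labels here are assumptions:

```python
def next_state(state, quality_passed, classification, iteration, max_iterations):
    """Hypothetical transition function mirroring the state machine above."""
    if state == "design":
        return "implement"
    if state == "implement":
        return "quality_gate"
    if state == "quality_gate":
        return "done" if quality_passed else "review"
    if state == "review":
        if iteration >= max_iterations:
            return "escalate"  # hand off to a human
        # Strategic errors go back to design; trivial/tactical retry implement.
        return "design" if classification == "strategic" else "implement"
    raise ValueError(f"unknown state: {state}")
```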
2. Multi-Task Runner (scripts/multi_task_runner.py)
Pilot execution script — runs multiple tasks sequentially with git isolation.
Features:
- Sequential execution (no parallel git conflicts)
- Branch isolation per task
- Metrics collection (success rate, iterations, duration)
- JSON report generation
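A minimal sketch of the runner's shape, assuming an injected `run_task` callable in place of invoking the workflow (the real script also creates and cleans up git branches, omitted here):

```python
def run_pilot(task_ids, run_task, base_branch="main"):
    """Sequential pilot sketch: one branch per task, no parallelism.

    run_task is a stand-in for invoking langgraph_workflow.py; the real
    runner in scripts/multi_task_runner.py also shells out to git.
    """
    results = []
    for task_id in task_ids:
        branch = f"task/{task_id.lower()}"  # branch isolation per task
        ok = run_task(task_id)
        results.append({"task_id": task_id, "success": ok, "branch": branch})
    successful = sum(1 for r in results if r["success"])
    return {
        "total_tasks": len(results),
        "successful": successful,
        "failed": len(results) - successful,
        "success_rate": round(100 * successful / len(results), 1),
        "results": results,
    }

# Example: one of two tasks succeeds.
report = run_pilot(["Task-104", "Task-105"], run_task=lambda t: t == "Task-104")
```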
3. Helper Functions
Auto-Fix (auto_fix_trivial_errors())
- Missing imports: Dynamic path detection via grep
- Unused variables: ESLint disable comments (safer than renaming)
- Safety guards: Whitelist of known safe symbols
Error Classification (classify_errors())
- Trivial: Auto-fixable (imports, unused vars)
- Tactical: Code-level issues (type errors, test failures)
- Strategic: Design-level issues (coverage, invariants)
- Hallucination: Gemini-specific invented methods/modules
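The four categories above might be distinguished roughly as follows. The matching strings here are made-up examples, not the actual patterns in `classify_errors()`:

```python
def classify_error(message):
    """Illustrative classifier; real patterns live in classify_errors()."""
    msg = message.lower()
    # Hallucination: Gemini invented a method or module that does not exist.
    if "is not a function" in msg or "cannot find module" in msg:
        return "hallucination"
    # Trivial: auto-fixable (missing imports, unused variables).
    if "is defined but never used" in msg or "cannot find name" in msg:
        return "trivial"
    # Strategic: design-level issues (coverage, invariants).
    if "coverage" in msg or "invariant" in msg:
        return "strategic"
    # Everything else is tactical: code-level fixes within the same design.
    return "tactical"
```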
ADR Validation (validate_adr())
- Inline validation: LLM explicitly addresses invariants in ADR
- Quick check: Regex-based post-validation for missing sections
- Fallback: Keyword matching if compliance section absent
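The regex-based quick check could look like the sketch below. The section names are hypothetical; the real required sections are defined by `validate_adr()`:

```python
import re

# Hypothetical required sections; the real list is in validate_adr().
REQUIRED_SECTIONS = ["## Decision", "## Invariant Compliance"]

def quick_check_adr(adr_text):
    """Return the list of required sections missing from an ADR."""
    return [section for section in REQUIRED_SECTIONS
            if not re.search(re.escape(section), adr_text, re.IGNORECASE)]

adr = "## Decision\nUse X.\n"
missing = quick_check_adr(adr)
```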
Installation
Prerequisites
# Required
- Python 3.10+
- Node.js 20+
- Git
- npm (analytics-platform dependencies installed)
# API Keys (environment variables)
- ANTHROPIC_API_KEY (Claude Opus)
- GOOGLE_API_KEY (Gemini)
Setup
# 1. Install Python dependencies
pip install anthropic google-generativeai langgraph
# 2. Install Node.js dependencies (if not already done)
npm --prefix analytics-platform ci
# 3. Verify scripts are executable
chmod +x scripts/langgraph_workflow.py
chmod +x scripts/multi_task_runner.py
# 4. Set API keys
export ANTHROPIC_API_KEY="your-key-here"
export GOOGLE_API_KEY="your-key-here"
# 5. Verify setup
python3 scripts/langgraph_workflow.py --help
Usage
Single Task Execution
# Execute one task
python3 scripts/langgraph_workflow.py Task-104
# Example output:
# ============================================================
# ARCHITECT DESIGN (Iteration 1)
# ============================================================
# ✅ ADR Created (2341 chars)
#
# ============================================================
# EXECUTOR IMPLEMENT (Iteration 1)
# ============================================================
# ✅ Generated 3 files
# - analytics-platform/src/ai/transparency/...
#
# ============================================================
# QUALITY GATE
# ============================================================
# 🔍 Running lint...
# ✅ Lint passed
# 🏗️ Running build...
# ✅ Build passed
# 🧪 Running tests...
# ✅ Tests passed
# 📊 Checking coverage...
# ✅ Coverage 82.5% ≥ 80%
#
# ============================================================
# FINAL RESULT: APPROVED
# ============================================================
# Iterations: 2
# Files generated: 3
#
# ✅ Task Task-104 completed successfully!
Pilot Execution (10 Tasks)
# Run pilot on 10 tasks
python3 scripts/multi_task_runner.py
# Output:
# 🚀 Starting SEQUENTIAL pilot: 10 tasks
# Git branch isolation: ON
# Base branch: main
#
# ============================================================
# TASK 1/10: Task-104
# ============================================================
# [Task-104] Creating branch task/task-104...
# [Task-104] Running workflow...
# [Task-104] ✅ SUCCESS
# Duration: 156.3s
# Iterations: 2
#
# ... (9 more tasks)
#
# ============================================================
# PILOT COMPLETE
# ============================================================
# Success: 7/10 (70.0%)
# Avg Iterations: 2.4
# Report: pilot_report.json
Configuration
CONFIG.yaml (Optional)
# Rollback thresholds
rollback:
  success_rate_go: 70           # ≥70% → GO
  success_rate_go_adjusted: 60  # ≥60% → GO with adjustments
  success_rate_retry: 50        # ≥50% → Retry pilot
  human_intervention_max: 35    # ≤35% intervention for GO

# Iteration limits
iterations:
  simple_max: 3
  moderate_max: 5
  complex_max: 7

# API retry
api_retry:
  max_attempts: 3
  backoff_base: 2  # seconds (exponential: 2s, 4s, 8s)

# Multi-task execution
execution:
  parallel_enabled: false  # MUST be false (git conflicts)
  base_branch: main
  cleanup_success_branches: true
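The iteration limits above map to task complexity roughly as follows (a sketch; the real logic is `determine_max_iterations()` in `scripts/langgraph_workflow.py`, and the default for unknown complexity is an assumption):

```python
# Hypothetical mapping mirroring the iterations section of CONFIG.yaml.
ITERATION_LIMITS = {"simple": 3, "moderate": 5, "complex": 7}

def determine_max_iterations(complexity):
    """Return the max iteration budget for a task; default to moderate."""
    return ITERATION_LIMITS.get(complexity, ITERATION_LIMITS["moderate"])
```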
Metrics & Monitoring
Pilot Report (pilot_report.json)
{
  "total_tasks": 10,
  "successful": 7,
  "failed": 3,
  "success_rate": 70.0,
  "avg_iterations": 2.4,
  "human_intervention_rate": 20.0,
  "results": [
    {
      "task_id": "Task-104",
      "success": true,
      "duration_sec": 156.3,
      "iterations": 2,
      "branch": "task/task-104"
    },
    ...
  ]
}
Key Metrics
| Metric | Target | Interpretation |
|---|---|---|
| Success Rate | ≥70% | Tasks completed without human intervention |
| Human Intervention | ≤35% | Tasks requiring manual fixes |
| Avg Iterations | ≤3 | Efficiency of architect-executor loop |
| Duration | <2h/task | Time to complete average task |
Rollback Decision Tree
IF success_rate >= 70% AND human_intervention <= 35%:
→ GO to Phase Q (current settings)
ELIF success_rate >= 60% AND human_intervention <= 40%:
→ GO to Phase Q with adjustments:
- Increase max_iterations by 1
- Add human checkpoint after iteration 3
- Monitor first 3 tasks closely
ELIF success_rate >= 50%:
→ RETRY pilot with improvements:
- Analyze failure patterns
- Adjust prompts
- Re-run on 5 failed + 5 new tasks
- Re-evaluate
ELSE:
→ NO-GO for automation:
- Manual execution for Phase Q
- Continue development in parallel
- Re-evaluate after Phase Q completion
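The decision tree above transcribes directly into a small function (a sketch; the return labels are assumptions, not values used by the scripts):

```python
def rollback_decision(success_rate, human_intervention):
    """Transcription of the rollback decision tree (percentages)."""
    if success_rate >= 70 and human_intervention <= 35:
        return "GO"
    if success_rate >= 60 and human_intervention <= 40:
        return "GO_WITH_ADJUSTMENTS"
    if success_rate >= 50:
        return "RETRY_PILOT"
    return "NO_GO"
```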
Troubleshooting
Common Issues
1. Import Detection Fails
Symptom: Auto-fix can't find symbol definition
Solution:
# Check if grep finds the symbol
grep -r "export class YourSymbol" analytics-platform/src
# If not found, symbol may be external dependency
# Add to safe_imports whitelist manually
2. API Rate Limits
Symptom: "Rate limit exceeded" errors
Solution:
- Retry logic handles this automatically (3 attempts with backoff)
- If errors persist, wait about a minute before re-running (execution is sequential, so there are no parallel tasks to reduce)
3. Git Branch Conflicts
Symptom: "Branch already exists" errors
Solution:
# Manual cleanup
git branch -D task/task-104
# Or delete all task branches
git branch | grep "task/" | xargs git branch -D
4. Quality Gate Fails on Clean Code
Symptom: Lint/build passes locally but fails in workflow
Solution:
# Ensure you're on clean main branch
git checkout main
git pull origin main
# Re-run quality gate manually
npm --prefix analytics-platform run lint
npm --prefix analytics-platform run build
npm --prefix analytics-platform test -- --runInBand
Maintenance
Updating Safe Imports Whitelist
File: scripts/langgraph_workflow.py → is_safe_import()
When to update:
- New NestJS decorators added to project
- New Jorvis services created
- New shared types/interfaces
Process:
- Add symbol to `safe_imports` set
- Add corresponding import path to `import_map`
- Test on synthetic task
- Document change in git commit
Example:
safe_imports = {
# ... existing ...
"NewJorvisService", # Added 2026-01-20
}
import_map = {
# ... existing ...
"NewJorvisService": "../services/new-jorvis.service",
}
Updating Hallucination Patterns
File: scripts/langgraph_workflow.py → is_hallucination()
When to update:
- New Gemini hallucination patterns discovered during pilot
- False positives detected
Process:
- Add regex pattern to `hallucination_patterns`
- Test on known hallucination examples
- Document pattern and reasoning
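A pattern list and check might look like the sketch below. The regexes are invented examples of the kind of errors invented methods/modules produce, not the actual patterns in `is_hallucination()`:

```python
import re

# Hypothetical patterns; the real list lives in is_hallucination()
# in scripts/langgraph_workflow.py.
hallucination_patterns = [
    r"Property '\w+' does not exist on type",  # invented method on a real class
    r"Cannot find module '[^']+'",             # invented module import
]

def is_hallucination(error_line):
    """True if any known hallucination pattern matches the error line."""
    return any(re.search(p, error_line) for p in hallucination_patterns)
```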
Performance Benchmarks
Expected Performance (Pilot Results)
| Task Type | Iterations | Duration | Success Rate |
|---|---|---|---|
| Simple (typo fix) | 1-2 | 20-40min | 90%+ |
| Moderate (validation rule) | 2-4 | 1-2h | 70-80% |
| Complex (new feature) | 4-6 | 2-4h | 60-70% |
Bottlenecks
- Quality Gate (30-60s) — Sequential checks
- Architect Review (20-40s) — Opus processing time
- Executor Generation (15-30s) — Gemini code generation
Total per iteration: ~2-3 minutes overhead
Security Considerations
API Key Protection
- Store keys in environment variables (never in code)
- Use `.env` files for local development
- Use GitHub Secrets for CI/CD
Code Injection Prevention
- All LLM outputs validated by Quality Gate
- No `eval()` or dynamic code execution
- Git branch isolation prevents cross-contamination
PII/Secrets Detection
- Quality Gate includes a secrets check (already in `.github/workflows/quality-gate.yml`)
- Auto-fix does not modify strings (no risk of exposing secrets)
Future Enhancements
Phase 2 (Post-Pilot)
- Parallel execution with git worktrees
- Web dashboard for real-time monitoring
- Automated prompt optimization based on failure patterns
- Integration with GitHub PR automation
- Slack notifications for failures
Phase 3 (Production Scale)
- Multi-repo support
- Custom task types (beyond Phase Q)
- A/B testing for prompt variations
- Cost optimization (switch to cheaper models for simple tasks)
FAQ
Q: Why Gemini instead of Opus for Executor? A: Gemini is 40x cheaper and 2x faster. The Quality Gate catches hallucinations (8% rate), so speed outweighs raw accuracy for code generation.
Q: Can I run tasks in parallel? A: No. Git branch isolation requires sequential execution. Parallel execution causes merge conflicts.
Q: What if pilot success rate is 50%? A: Follow rollback decision tree → Retry pilot with adjusted prompts, not immediate Phase Q deployment.
Q: How do I add new invariants?
A: Update CLAUDE.md → "Invariants to Preserve" section. The Architect will automatically include it in ADR validation.
Q: Can I use this for non-Phase Q tasks?
A: Yes, but adjust determine_max_iterations() logic and test on similar task types first.
Support & Contributing
Issues: Report via GitHub Issues
Questions: See docs/automation/TROUBLESHOOTING.md
Updates: Check git log for scripts/langgraph_workflow.py
License
Internal Jorvis project. Not for external distribution.
Last Updated: 2026-01-19 Next Review: After pilot completion (Feb 2026)