# Quickstart: 2-Agent Workflow for Phase Q Development
**Version:** 1.0 | **Date:** 2026-01-19 | **Status:** Production Ready (Blocked by Phase P)
## Overview
The Phase Q automation system uses a 2-agent architecture:
- **Architect Agent** (Claude Opus): Design ADRs, code review, error classification
- **Executor Agent** (Gemini): Code generation, tests, documentation
**Critical:** Phase Q automation is blocked until Phase P (Task-102) is complete.
## Prerequisites

### 1. Install Dependencies

```bash
# Python dependencies
pip install -r requirements-automation.txt

# Verify versions
python3 --version   # Python 3.11+
pip show langgraph langchain-anthropic langchain-google-genai
```
### 2. Set Up API Keys

```bash
# Required keys
export ANTHROPIC_API_KEY="sk-ant-api03-..."   # Claude API key
export GOOGLE_API_KEY="AIza..."               # Gemini API key

# Verify keys
./scripts/verify_setup.sh
```
**Important:** The system will not start without these keys.
### 3. Check Repository

```bash
# Check for uncommitted changes
./scripts/preflight.sh

# Check SSOT consistency
./scripts/detect-conflicts.sh

# Verify Phase P status
grep "Phase P" docs/handoff/HANDOFF_TO_NEXT_AGENT.md
```
**Critical:** If Phase P shows "BLOCKED", Phase Q automation cannot run.
## Workflow Modes

### Mode 1: Single Task Execution (Recommended)

**Use Case:** Execute a single Phase Q task with full control.

```bash
# Execute single task (creates an isolated git branch)
python3 scripts/langgraph_workflow.py Task-105

# Monitor progress
tail -f artifacts/task_105_result.json

# Check that the branch was created
git branch | grep task-105

# Review generated code
git diff main..task-105-wikidata-sparql
```
**Output:**
- Git branch: `task-105-wikidata-sparql`
- Code: `analytics-platform/src/mcp/tools/wikidata.tool.ts`
- Tests: `analytics-platform/src/mcp/tools/wikidata.tool.spec.ts`
- Documentation: updated in `docs/`
- Result: `artifacts/task_105_result.json`
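The result JSON can also be inspected programmatically, not just tailed. A minimal sketch in Python, assuming the file carries a top-level `"success"` boolean (the exact schema is owned by `scripts/langgraph_workflow.py`; adjust the key if it differs):

```python
import json


def task_succeeded(result_path: str) -> bool:
    """Return True if a task result JSON reports success.

    Assumes a top-level "success" boolean, as in
    artifacts/task_105_result.json; this key name is an assumption.
    """
    with open(result_path) as f:
        result = json.load(f)
    return bool(result.get("success", False))
```

Usage: `task_succeeded("artifacts/task_105_result.json")` before deciding whether to open a PR.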
**Next Steps:**
- Review the ADR in the branch
- Run tests manually: `npm --prefix analytics-platform test`
- Create a PR if the quality gate passed
- Request Gatekeeper review
### Mode 2: Multi-Task Pilot (Advanced)

**Use Case:** Batch execution of multiple Phase Q tasks (Q.2, Q.4-Q.7).

⚠️ **Warning:** Pilot mode creates 5+ git branches sequentially. Use only after a successful Mode 1 test.

```bash
# Execute pilot (all automatable tasks)
python3 scripts/multi_task_runner.py

# Monitor progress
watch -n 5 'ls artifacts/task_*_result.json | wc -l'

# Check created branches
git branch | grep task-
```
**Expected Branches:**
- task-105-wikidata-sparql
- task-107-sportsdb-api
- task-108-tmdb-movies
- task-109-excel-upload
- task-110-weather-data

**Duration:** ~2-4 hours (depends on API rate limits)

**Rollback:** The system stops automatically if the success rate falls below 50%.
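The 50% rollback check can be computed directly from the per-task result files. A hedged sketch, again assuming each result JSON has a top-level `"success"` boolean (the threshold value matches the rollback rule above):

```python
import glob
import json

SUCCESS_THRESHOLD = 0.5  # pilot aborts below 50% success, per the rollback rule


def pilot_success_rate(results_glob: str = "artifacts/task_*_result.json") -> float:
    """Fraction of completed tasks whose result JSON reports success."""
    paths = glob.glob(results_glob)
    if not paths:
        return 0.0
    successes = sum(1 for p in paths if json.load(open(p)).get("success"))
    return successes / len(paths)


def should_abort_pilot(rate: float) -> bool:
    """True when the pilot should stop and hand back to manual review."""
    return rate < SUCCESS_THRESHOLD
```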
## Workflow Steps (Detailed)

### Step 1: Architect Node (ADR Creation)

**Agent:** Claude Opus
**Duration:** 2-5 minutes
**Input:** Task ID (e.g., Task-105)
**Output:** ADR file in Markdown format
**What Happens:**
- Reads `PHASE_Q_IMPLEMENTATION_PLAN.md` for context
- Validates the task against `CLAUDE.md` invariants
- Designs the implementation approach
- Creates an ADR with:
  - Problem statement
  - Proposed solution
  - Implementation plan
  - Testing strategy
  - Acceptance criteria
**Success Criteria:**
- ADR contains all required sections
- No conflicts with existing architecture
- Follows NestJS + TypeScript patterns
- Coverage target of ≥80% set
**Example ADR:**

```markdown
# ADR: WikiData SPARQL MCP Tool (Task-105)

## Problem
Need to query the WikiData knowledge graph via SPARQL.

## Solution
Implement an MCP tool with a SPARQL endpoint wrapper.

## Implementation
- Create `wikidata.tool.ts` in `analytics-platform/src/mcp/tools/`
- Use `@modelcontextprotocol/sdk` for MCP integration
- SPARQL queries via `https://query.wikidata.org/sparql`
- Result caching (5min TTL)

## Testing
- Unit tests: SPARQL query builder
- Integration tests: real WikiData queries
- Coverage: ≥80%

## Acceptance Criteria
- [ ] Tool registered in MCP server
- [ ] Query execution <2s (p95)
- [ ] Error handling for malformed SPARQL
- [ ] Documentation in README.md
```
### Step 2: Executor Node (Code Generation)

**Agent:** Gemini
**Duration:** 3-7 minutes
**Input:** ADR + task context
**Output:** TypeScript code + tests

**What Happens:**
- Parses ADR requirements
- Generates TypeScript code following the style guide
- Creates Jest unit tests (≥80% coverage)
- Updates documentation
- Commits to the feature branch
**Code Structure:**

```
analytics-platform/src/mcp/tools/
├── wikidata.tool.ts        # MCP tool implementation
├── wikidata.tool.spec.ts   # Unit tests
└── README.md               # Updated documentation
```

**Quality Standards:**
- Google TypeScript Style Guide
- JSDoc comments on all public functions
- No `any` types without justification
- Proper error handling
- Input validation
### Step 3: Quality Gate (Verification)

**Automated Checks:**
- Lint: `npm run lint` (ESLint)
- Build: `npm run build` (TypeScript compilation)
- Test: `npm test` (Jest with coverage)
- Coverage: ≥80% line coverage enforced

**Duration:** 2-5 minutes

**Pass Criteria:**
- ✅ Lint: 0 errors
- ✅ Build: successful compilation
- ✅ Tests: 100% passing
- ✅ Coverage: 85% (≥80% required)
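The gate's sequential check-and-report pattern can be sketched in a few lines of Python. This is an illustration, not the actual gate implementation in the workflow scripts; the command list is taken from the checks above, and a non-zero exit code fails the check:

```python
import subprocess

# Commands from the Automated Checks list above
QUALITY_GATE = [
    ("lint", ["npm", "run", "lint"]),
    ("build", ["npm", "run", "build"]),
    ("test", ["npm", "test"]),
]


def run_quality_gate(checks=QUALITY_GATE, cwd=None):
    """Run each check in order; a non-zero exit code marks that check failed."""
    results = {}
    for name, cmd in checks:
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True)
        results[name] = proc.returncode == 0
    return results
```

In the real workflow, `cwd` would point at `analytics-platform/` and a failed check feeds the error output into the review loop.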
**Fail Handling:**
- Trivial errors (imports, unused vars): auto-fixed by Executor
- Tactical errors (type mismatches, missing tests): Executor re-generates
- Strategic errors (architectural issues): escalated to Architect
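In the real system the Architect agent performs this classification; a keyword heuristic like the following is only a hypothetical stand-in, with pattern lists invented for illustration:

```python
# Hypothetical keyword buckets; the real classifier is the Architect agent,
# not a lookup table.
TRIVIAL_PATTERNS = ("is declared but its value is never read", "import", "prettier")
TACTICAL_PATTERNS = ("is not assignable to type", "coverage", "expected")


def classify_error(message: str) -> str:
    """Bucket a Quality Gate error message as trivial, tactical, or strategic."""
    lowered = message.lower()
    if any(p in lowered for p in TRIVIAL_PATTERNS):
        return "trivial"
    if any(p in lowered for p in TACTICAL_PATTERNS):
        return "tactical"
    return "strategic"  # anything unrecognized escalates to the Architect
```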
### Step 4: Review Loop (Error Classification)

**Max Iterations:** 3-7 (adaptive based on complexity)

**Flow:**
1. Quality Gate fails → Architect classifies the error:
   - Trivial: Executor auto-fixes (import sorting, formatting)
   - Tactical: Executor re-generates the code section
   - Strategic: Architect redesigns the approach (rare)
2. Quality Gate re-runs
3. Repeat until pass OR max iterations reached
**Example Loop:**

```
Iteration 1: Test failure (missing mock) → Tactical → Executor adds mock
Iteration 2: Coverage 75% → Tactical → Executor adds edge case tests
Iteration 3: Coverage 82% → Pass ✅
```

**Success Rate:** 70%+ expected (based on design)

**Rollback Trigger:** If ≥5 iterations pass without progress, the task fails.
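The loop structure itself is simple. A sketch with stand-in callables for the real Architect/Executor calls (`run_gate` and `fix` are hypothetical parameter names, not the workflow's actual API):

```python
def review_loop(run_gate, fix, max_iterations: int = 7):
    """Re-run the quality gate, applying fixes, until pass or budget exhausted.

    run_gate() -> (passed: bool, error: str); fix(error) applies a correction.
    Both are stand-ins for the real agent invocations.
    """
    for iteration in range(1, max_iterations + 1):
        passed, error = run_gate()
        if passed:
            return {"success": True, "iterations": iteration}
        fix(error)  # trivial/tactical/strategic routing happens inside
    return {"success": False, "iterations": max_iterations}
```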
### Step 5: Output & PR Creation

**Automated Output:**
- Git branch: `task-{ID}-{feature-name}`
- Code + tests committed
- Result JSON: `artifacts/task_{ID}_result.json`
**Manual Steps (Human Required):**

1. Review the ADR in the branch
2. Test the code locally:

   ```bash
   git checkout task-105-wikidata-sparql
   npm --prefix analytics-platform test -- wikidata.tool.spec.ts
   npm --prefix analytics-platform run lint
   ```

3. Create a PR:

   ```bash
   gh pr create --title "feat(mcp): Add WikiData SPARQL tool (Task-105)" \
     --body "$(cat artifacts/task_105_adr.md)"
   ```

4. Request Gatekeeper review in `docs/agent_ops/GO_NO_GO.md`
## Safety & Rollback

### Git Branch Isolation

**Why:** Sequential task execution prevents conflicts.

**Mechanism:**
- Each task creates an isolated branch from `main`
- Branches never merge automatically
- A human reviews all PRs before merge
**Example:**

```
main
├── task-105-wikidata-sparql (PR #201)
├── task-107-sportsdb-api (PR #202)
└── task-108-tmdb-movies (PR #203)
```

**Merge Order:** Sequential (105 → 107 → 108) to avoid conflicts.
### Rollback Scenarios

#### Scenario 1: Single Task Fails Quality Gate

**Symptom:** `artifacts/task_105_result.json` shows `"success": false`

**Action:**

```bash
# Delete failed branch
git branch -D task-105-wikidata-sparql

# Re-run with increased iterations
python3 scripts/langgraph_workflow.py Task-105 --max-iterations 10
```
#### Scenario 2: Pilot Mode Success Rate <50%

**Symptom:** Only 2/5 tasks passed the Quality Gate

**Action:**

```bash
# Check results
cat artifacts/pilot_run_summary.json

# Manually re-run failed tasks
python3 scripts/langgraph_workflow.py Task-107   # Manual retry
```

**Escalation:** If failures persist, revert to manual development (no automation).
#### Scenario 3: Hallucination Detected (Gemini-Specific)

**Symptom:** Code references non-existent APIs or imports

**Detection:** The Architect node validates imports against `package.json`

**Action:** Automatic re-generation with stricter prompt constraints.
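The import check is essentially "every bare-module import must appear in `package.json`". A simplified sketch of that idea (the real validation lives in the Architect node; this version handles scoped packages naively and would also flag Node built-ins like `fs`):

```python
import json
import re


def undeclared_imports(ts_source: str, package_json: str) -> list[str]:
    """Return bare-module imports in TypeScript source not declared in
    package.json dependencies — a simple hallucination check."""
    pkg = json.loads(package_json)
    deps = set()
    for key in ("dependencies", "devDependencies"):
        deps.update(pkg.get(key, {}))
    imports = re.findall(r"from\s+['\"]([^'\"]+)['\"]", ts_source)
    missing = []
    for mod in imports:
        if mod.startswith("."):
            continue  # relative import, not an npm package
        # Reduce "@scope/pkg/sub" to "@scope/pkg" and "pkg/sub" to "pkg"
        parts = mod.split("/")
        base = "/".join(parts[:2]) if mod.startswith("@") else parts[0]
        if base not in deps:
            missing.append(mod)
    return missing
```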
### Emergency Stop

```bash
# Kill running automation
pkill -f langgraph_workflow.py

# Check orphaned branches
git branch | grep task-

# Delete orphaned branches
git branch | grep task- | xargs git branch -D
```
## Monitoring & Debugging

### Real-Time Progress

```bash
# Watch automation logs
tail -f artifacts/automation.log

# Check task status
jq '.status' artifacts/task_105_result.json

# Monitor API usage
grep "API call" artifacts/automation.log
```
### Common Issues

#### Issue 1: API Rate Limits

**Symptom:** `RateLimitError: Anthropic API rate limit exceeded`

**Fix:**

```bash
# Wait 60 seconds
sleep 60

# Retry with exponential backoff
python3 scripts/langgraph_workflow.py Task-105 --retry
```
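The exponential-backoff retry behind `--retry` looks roughly like this. A sketch, assuming a rate-limit exception whose class name contains `RateLimit` (the Anthropic and Google SDKs both raise such errors, e.g. `anthropic.RateLimitError`); the delay formula and retry budget here are illustrative defaults, not the script's actual values:

```python
import random
import time


def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry fn() with exponential backoff plus jitter on rate-limit errors.

    Anything whose exception class name contains "RateLimit" is treated
    as retryable; everything else propagates immediately.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            if "RateLimit" not in type(exc).__name__ or attempt == max_retries - 1:
                raise
            # 1x, 2x, 4x, ... the base delay, with up to 2x random jitter
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
```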
#### Issue 2: Coverage <80%

**Symptom:** Quality Gate fails with "Coverage: 75%"

**Fix:** The Executor automatically adds tests in the next iteration (up to 7 iterations).

**Manual Override:** Edit the code locally if automation still fails after 7 iterations.
#### Issue 3: Git Conflicts

**Symptom:** `git merge` fails during branch creation

**Fix:**

```bash
# Rebase on latest main
git checkout task-105-wikidata-sparql
git rebase main

# Resolve conflicts manually
git mergetool
```
## Success Criteria

### Task Completion Checklist

- [ ] Quality Gate passed (lint, build, test)
- [ ] Coverage ≥80% (enforced for new code, see Quality Gate)
- [ ] Git branch created and pushed
- [ ] ADR documentation in branch
- [ ] Confidence score ≥30% (auto-blocks if <30%)
- [ ] Result JSON shows `"success": true`
- [ ] PR created with evidence
- [ ] Gatekeeper review requested
### Phase Q Completion Criteria

- [ ] 8/8 tasks completed (Q.1-Q.8)
- [ ] All PRs merged to `main`
- [ ] Integration tests passing
- [ ] Documentation updated
- [ ] Demo scenarios validated
## Documentation References

- Architecture: `docs/automation/ARCHITECTURE.md`
- Implementation Guide: `docs/automation/IMPLEMENTATION_GUIDE.md`
- Decision Log: `docs/automation/DECISION_LOG.md` (10 ADRs)
- Phase Q Plan: `docs/agent_ops/plans/PHASE_Q_IMPLEMENTATION_PLAN.md`
- Enforcement Layers: `AGENTS.md` (Layer 6: Automation Workflow)
## Support & Escalation

**Automation Issues:**
- Check `artifacts/automation.log` for errors
- Verify API keys: `./scripts/verify_setup.sh`
- Re-run with the debug flag: `python3 scripts/langgraph_workflow.py Task-105 --debug`

**Strategic Issues:**
- Escalate to the Architect agent (human)
- Review the ADR for architectural conflicts
- Update `PHASE_Q_IMPLEMENTATION_PLAN.md` if requirements changed

**Emergency:**
- Stop automation: `pkill -f langgraph_workflow.py`
- Rollback: `git branch -D task-*`
- Manual execution: follow the `AGENTS.md` manual workflow
*Quickstart maintained by the Automation Team. Last updated: 2026-01-19*