# Quickstart: 2-Agent Workflow for Phase Q Development
**Version:** 1.0 | **Date:** 2026-01-19 | **Status:** Production Ready (Blocked by Phase P)
## Overview
The Phase Q automation system uses a 2-agent architecture:
- **Architect Agent** (Claude Opus): Design ADRs, code review, error classification
- **Executor Agent** (Gemini): Code generation, tests, documentation
**Critical:** Phase Q automation is blocked until Phase P (Task-102) is complete.
## Prerequisites

### 1. Install Dependencies

```bash
# Python dependencies
pip install -r requirements-automation.txt

# Verify versions
python3 --version   # Python 3.11+
pip show langgraph langchain-anthropic langchain-google-genai
```
### 2. Set Up API Keys

```bash
# Required keys
export ANTHROPIC_API_KEY="sk-ant-api03-..."   # Claude API key
export GOOGLE_API_KEY="AIza..."               # Gemini API key

# Verify keys
./scripts/verify_setup.sh
```
**Important:** The system will not start without these keys.
### 3. Check Repository

```bash
# Check for uncommitted changes
./scripts/preflight.sh

# Check SSOT consistency
./scripts/detect-conflicts.sh

# Verify Phase P status
grep "Phase P" docs/handoff/HANDOFF_TO_NEXT_AGENT.md
```
**Critical:** If Phase P shows "BLOCKED", Phase Q automation cannot run.
## Workflow Modes

### Mode 1: Single Task Execution (Recommended)

**Use Case:** Execute a single Phase Q task with full control.

```bash
# Execute single task (creates an isolated git branch)
python3 scripts/langgraph_workflow.py Task-105

# Monitor progress
tail -f artifacts/task_105_result.json

# Check that the branch was created
git branch | grep task-105

# Review generated code
git diff main..task-105-wikidata-sparql
```
**Output:**
- Git branch: `task-105-wikidata-sparql`
- Code: `analytics-platform/src/mcp/tools/wikidata.tool.ts`
- Tests: `analytics-platform/src/mcp/tools/wikidata.tool.spec.ts`
- Documentation: updated in `docs/`
- Result: `artifacts/task_105_result.json`
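The result JSON can also be inspected programmatically, not just tailed. A minimal sketch in Python, assuming the file carries a top-level `"success"` boolean (the exact schema is owned by `scripts/langgraph_workflow.py`; adjust the key if it differs):

```python
import json


def task_succeeded(result_path: str) -> bool:
    """Return True if a task result JSON reports success.

    Assumes a top-level "success" boolean, as in
    artifacts/task_105_result.json; this key name is an assumption.
    """
    with open(result_path) as f:
        result = json.load(f)
    return bool(result.get("success", False))
```

Usage: `task_succeeded("artifacts/task_105_result.json")` before deciding whether to open a PR.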
**Next Steps:**
- Review the ADR in the branch
- Run tests manually: `npm --prefix analytics-platform test`
- Create a PR if the quality gate passed
- Request Gatekeeper review
### Mode 2: Multi-Task Pilot (Advanced)

**Use Case:** Batch execution of multiple Phase Q tasks (Q.2, Q.4-Q.7).

⚠️ **Warning:** Pilot mode creates 5+ git branches sequentially. Use only after a successful Mode 1 test.

```bash
# Execute pilot (all automatable tasks)
python3 scripts/multi_task_runner.py

# Monitor progress
watch -n 5 'ls artifacts/task_*_result.json | wc -l'

# Check created branches
git branch | grep task-
```
**Expected Branches:**
- task-105-wikidata-sparql
- task-107-sportsdb-api
- task-108-tmdb-movies
- task-109-excel-upload
- task-110-weather-data

**Duration:** ~2-4 hours (depends on API rate limits)

**Rollback:** The system stops automatically if the success rate falls below 50%.
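The 50% rollback check can be computed directly from the per-task result files. A hedged sketch, again assuming each result JSON has a top-level `"success"` boolean (the threshold value matches the rollback rule above):

```python
import glob
import json

SUCCESS_THRESHOLD = 0.5  # pilot aborts below 50% success, per the rollback rule


def pilot_success_rate(results_glob: str = "artifacts/task_*_result.json") -> float:
    """Fraction of completed tasks whose result JSON reports success."""
    paths = glob.glob(results_glob)
    if not paths:
        return 0.0
    successes = sum(1 for p in paths if json.load(open(p)).get("success"))
    return successes / len(paths)


def should_abort_pilot(rate: float) -> bool:
    """True when the pilot should stop and hand back to manual review."""
    return rate < SUCCESS_THRESHOLD
```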
## Workflow Steps (Detailed)

### Step 1: Architect Node (ADR Creation)

**Agent:** Claude Opus
**Duration:** 2-5 minutes
**Input:** Task ID (e.g., Task-105)
**Output:** ADR file in Markdown format
**What Happens:**
- Reads `PHASE_Q_IMPLEMENTATION_PLAN.md` for context
- Validates the task against `CLAUDE.md` invariants
- Designs the implementation approach
- Creates an ADR with:
  - Problem statement
  - Proposed solution
  - Implementation plan
  - Testing strategy
  - Acceptance criteria
**Success Criteria:**
- ADR contains all required sections
- No conflicts with existing architecture
- Follows NestJS + TypeScript patterns
- Coverage target of ≥80% set
**Example ADR:**

```markdown
# ADR: WikiData SPARQL MCP Tool (Task-105)

## Problem
Need to query the WikiData knowledge graph via SPARQL.

## Solution
Implement an MCP tool with a SPARQL endpoint wrapper.

## Implementation
- Create `wikidata.tool.ts` in `analytics-platform/src/mcp/tools/`
- Use `@modelcontextprotocol/sdk` for MCP integration
- SPARQL queries via `https://query.wikidata.org/sparql`
- Result caching (5min TTL)

## Testing
- Unit tests: SPARQL query builder
- Integration tests: real WikiData queries
- Coverage: ≥80%

## Acceptance Criteria
- [ ] Tool registered in MCP server
- [ ] Query execution <2s (p95)
- [ ] Error handling for malformed SPARQL
- [ ] Documentation in README.md
```
### Step 2: Executor Node (Code Generation)

**Agent:** Gemini
**Duration:** 3-7 minutes
**Input:** ADR + task context
**Output:** TypeScript code + tests

**What Happens:**
- Parses ADR requirements
- Generates TypeScript code following the style guide
- Creates Jest unit tests (≥80% coverage)
- Updates documentation
- Commits to the feature branch
**Code Structure:**

```
analytics-platform/src/mcp/tools/
├── wikidata.tool.ts        # MCP tool implementation
├── wikidata.tool.spec.ts   # Unit tests
└── README.md               # Updated documentation
```

**Quality Standards:**
- Google TypeScript Style Guide
- JSDoc comments on all public functions
- No `any` types without justification
- Proper error handling
- Input validation
### Step 3: Quality Gate (Verification)

**Automated Checks:**
- Lint: `npm run lint` (ESLint)
- Build: `npm run build` (TypeScript compilation)
- Test: `npm test` (Jest with coverage)
- Coverage: ≥80% line coverage enforced

**Duration:** 2-5 minutes

**Pass Criteria:**
- ✅ Lint: 0 errors
- ✅ Build: successful compilation
- ✅ Tests: 100% passing
- ✅ Coverage: 85% (≥80% required)
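The gate's sequential check-and-report pattern can be sketched in a few lines of Python. This is an illustration, not the actual gate implementation in the workflow scripts; the command list is taken from the checks above, and a non-zero exit code fails the check:

```python
import subprocess

# Commands from the Automated Checks list above
QUALITY_GATE = [
    ("lint", ["npm", "run", "lint"]),
    ("build", ["npm", "run", "build"]),
    ("test", ["npm", "test"]),
]


def run_quality_gate(checks=QUALITY_GATE, cwd=None):
    """Run each check in order; a non-zero exit code marks that check failed."""
    results = {}
    for name, cmd in checks:
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True)
        results[name] = proc.returncode == 0
    return results
```

In the real workflow, `cwd` would point at `analytics-platform/` and a failed check feeds the error output into the review loop.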
**Fail Handling:**
- Trivial errors (imports, unused vars): auto-fixed by Executor
- Tactical errors (type mismatches, missing tests): Executor re-generates
- Strategic errors (architectural issues): escalated to Architect
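In the real system the Architect agent performs this classification; a keyword heuristic like the following is only a hypothetical stand-in, with pattern lists invented for illustration:

```python
# Hypothetical keyword buckets; the real classifier is the Architect agent,
# not a lookup table.
TRIVIAL_PATTERNS = ("is declared but its value is never read", "import", "prettier")
TACTICAL_PATTERNS = ("is not assignable to type", "coverage", "expected")


def classify_error(message: str) -> str:
    """Bucket a Quality Gate error message as trivial, tactical, or strategic."""
    lowered = message.lower()
    if any(p in lowered for p in TRIVIAL_PATTERNS):
        return "trivial"
    if any(p in lowered for p in TACTICAL_PATTERNS):
        return "tactical"
    return "strategic"  # anything unrecognized escalates to the Architect
```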
### Step 4: Review Loop (Error Classification)

**Max Iterations:** 3-7 (adaptive based on complexity)

**Flow:**
1. Quality Gate fails → Architect classifies the error:
   - Trivial: Executor auto-fixes (import sorting, formatting)
   - Tactical: Executor re-generates the code section
   - Strategic: Architect redesigns the approach (rare)
2. Quality Gate re-runs
3. Repeat until pass OR max iterations reached
**Example Loop:**

```
Iteration 1: Test failure (missing mock) → Tactical → Executor adds mock
Iteration 2: Coverage 75% → Tactical → Executor adds edge case tests
Iteration 3: Coverage 82% → Pass ✅
```

**Success Rate:** 70%+ expected (based on design)

**Rollback Trigger:** If ≥5 iterations pass without progress, the task fails.
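The loop structure itself is simple. A sketch with stand-in callables for the real Architect/Executor calls (`run_gate` and `fix` are hypothetical parameter names, not the workflow's actual API):

```python
def review_loop(run_gate, fix, max_iterations: int = 7):
    """Re-run the quality gate, applying fixes, until pass or budget exhausted.

    run_gate() -> (passed: bool, error: str); fix(error) applies a correction.
    Both are stand-ins for the real agent invocations.
    """
    for iteration in range(1, max_iterations + 1):
        passed, error = run_gate()
        if passed:
            return {"success": True, "iterations": iteration}
        fix(error)  # trivial/tactical/strategic routing happens inside
    return {"success": False, "iterations": max_iterations}
```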
### Step 5: Output & PR Creation

**Automated Output:**
- Git branch: `task-{ID}-{feature-name}`
- Code + tests committed
- Result JSON: `artifacts/task_{ID}_result.json`
**Manual Steps (Human Required):**

1. Review the ADR in the branch
2. Test the code locally:

   ```bash
   git checkout task-105-wikidata-sparql
   npm --prefix analytics-platform test -- wikidata.tool.spec.ts
   npm --prefix analytics-platform run lint
   ```

3. Create a PR:

   ```bash
   gh pr create --title "feat(mcp): Add WikiData SPARQL tool (Task-105)" \
     --body "$(cat artifacts/task_105_adr.md)"
   ```

4. Request Gatekeeper review in `docs/agent_ops/GO_NO_GO.md`
## Safety & Rollback

### Git Branch Isolation

**Why:** Sequential task execution prevents conflicts.

**Mechanism:**
- Each task creates an isolated branch from `main`
- Branches never merge automatically
- A human reviews all PRs before merge
**Example:**

```
main
├── task-105-wikidata-sparql (PR #201)
├── task-107-sportsdb-api (PR #202)
└── task-108-tmdb-movies (PR #203)
```

**Merge Order:** Sequential (105 → 107 → 108) to avoid conflicts.
### Rollback Scenarios

#### Scenario 1: Single Task Fails Quality Gate

**Symptom:** `artifacts/task_105_result.json` shows `"success": false`

**Action:**

```bash
# Delete failed branch
git branch -D task-105-wikidata-sparql

# Re-run with increased iterations
python3 scripts/langgraph_workflow.py Task-105 --max-iterations 10
```
#### Scenario 2: Pilot Mode Success Rate <50%

**Symptom:** Only 2/5 tasks passed the Quality Gate

**Action:**

```bash
# Check results
cat artifacts/pilot_run_summary.json

# Manually re-run failed tasks
python3 scripts/langgraph_workflow.py Task-107   # Manual retry
```

**Escalation:** If failures persist, revert to manual development (no automation).
#### Scenario 3: Hallucination Detected (Gemini-Specific)

**Symptom:** Code references non-existent APIs or imports

**Detection:** The Architect node validates imports against `package.json`

**Action:** Automatic re-generation with stricter prompt constraints.
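The import check is essentially "every bare-module import must appear in `package.json`". A simplified sketch of that idea (the real validation lives in the Architect node; this version handles scoped packages naively and would also flag Node built-ins like `fs`):

```python
import json
import re


def undeclared_imports(ts_source: str, package_json: str) -> list[str]:
    """Return bare-module imports in TypeScript source not declared in
    package.json dependencies — a simple hallucination check."""
    pkg = json.loads(package_json)
    deps = set()
    for key in ("dependencies", "devDependencies"):
        deps.update(pkg.get(key, {}))
    imports = re.findall(r"from\s+['\"]([^'\"]+)['\"]", ts_source)
    missing = []
    for mod in imports:
        if mod.startswith("."):
            continue  # relative import, not an npm package
        # Reduce "@scope/pkg/sub" to "@scope/pkg" and "pkg/sub" to "pkg"
        parts = mod.split("/")
        base = "/".join(parts[:2]) if mod.startswith("@") else parts[0]
        if base not in deps:
            missing.append(mod)
    return missing
```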
### Emergency Stop

```bash
# Kill running automation
pkill -f langgraph_workflow.py

# Check orphaned branches
git branch | grep task-

# Delete orphaned branches
git branch | grep task- | xargs git branch -D
```
## Monitoring & Debugging

### Real-Time Progress

```bash
# Watch automation logs
tail -f artifacts/automation.log

# Check task status
jq '.status' artifacts/task_105_result.json

# Monitor API usage
grep "API call" artifacts/automation.log
```
### Common Issues

#### Issue 1: API Rate Limits

**Symptom:** `RateLimitError: Anthropic API rate limit exceeded`

**Fix:**

```bash
# Wait 60 seconds
sleep 60

# Retry with exponential backoff
python3 scripts/langgraph_workflow.py Task-105 --retry
```
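The exponential-backoff retry behind `--retry` looks roughly like this. A sketch, assuming a rate-limit exception whose class name contains `RateLimit` (the Anthropic and Google SDKs both raise such errors, e.g. `anthropic.RateLimitError`); the delay formula and retry budget here are illustrative defaults, not the script's actual values:

```python
import random
import time


def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry fn() with exponential backoff plus jitter on rate-limit errors.

    Anything whose exception class name contains "RateLimit" is treated
    as retryable; everything else propagates immediately.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            if "RateLimit" not in type(exc).__name__ or attempt == max_retries - 1:
                raise
            # 1x, 2x, 4x, ... the base delay, with up to 2x random jitter
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
```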
#### Issue 2: Coverage <80%

**Symptom:** Quality Gate fails with "Coverage: 75%"

**Fix:** The Executor automatically adds tests in the next iteration (up to 7 iterations).

**Manual Override:** Edit the code locally if automation still fails after 7 iterations.
#### Issue 3: Git Conflicts

**Symptom:** `git merge` fails during branch creation

**Fix:**

```bash
# Rebase on latest main
git checkout task-105-wikidata-sparql
git rebase main

# Resolve conflicts manually
git mergetool
```
## Success Criteria

### Task Completion Checklist

- [ ] Quality Gate passed (lint, build, test)
- [ ] Coverage ≥80% (enforced for new code, see Quality Gate)
- [ ] Git branch created and pushed
- [ ] ADR documentation in branch
- [ ] Confidence score ≥30% (auto-blocks if <30%)
- [ ] Result JSON shows `"success": true`
- [ ] PR created with evidence
- [ ] Gatekeeper review requested
### Phase Q Completion Criteria

- [ ] 8/8 tasks completed (Q.1-Q.8)
- [ ] All PRs merged to `main`
- [ ] Integration tests passing
- [ ] Documentation updated
- [ ] Demo scenarios validated
## Documentation References

- Architecture: `docs/automation/ARCHITECTURE.md`
- Implementation Guide: `docs/automation/IMPLEMENTATION_GUIDE.md`
- Decision Log: `docs/automation/DECISION_LOG.md` (10 ADRs)
- Phase Q Plan: `docs/agent_ops/plans/PHASE_Q_IMPLEMENTATION_PLAN.md`
- Enforcement Layers: `AGENTS.md` (Layer 6: Automation Workflow)
## Support & Escalation

**Automation Issues:**
- Check `artifacts/automation.log` for errors
- Verify API keys: `./scripts/verify_setup.sh`
- Re-run with the debug flag: `python3 scripts/langgraph_workflow.py Task-105 --debug`

**Strategic Issues:**
- Escalate to the Architect agent (human)
- Review the ADR for architectural conflicts
- Update `PHASE_Q_IMPLEMENTATION_PLAN.md` if requirements changed

**Emergency:**
- Stop automation: `pkill -f langgraph_workflow.py`
- Rollback: `git branch -D task-*`
- Manual execution: follow the `AGENTS.md` manual workflow
*Quickstart maintained by the Automation Team. Last updated: 2026-01-19*