chore: add ralph sidebar workflow setup files

2026-02-16 11:33:13 +00:00
parent 78e994ec5e
commit 5a657c4aac
5 changed files with 874 additions and 0 deletions
@@ -0,0 +1,216 @@
+---
+name: ralph-setup
+description: Set up autonomous AI development tasks using the Ralph Wiggum technique. Use when the user wants to create a RALPH orchestration — either a simple looping prompt or a multi-hat coordinated workflow. Interviews the user to understand requirements, decides the appropriate mode, and generates all necessary configuration files (ralph.yml, hats.yml, PROMPT.md). Triggers on mentions of "ralph", "autonomous loop", "hat-based", "orchestration", or requests to set up iterative AI agent tasks.
+---
+
+# Ralph Setup Skill
+
+Set up autonomous AI development tasks using the Ralph Wiggum technique — either as a simple iterating prompt or a coordinated hat-based workflow.
+
+## Background
+
+Ralph implements the Ralph Wiggum technique: give an AI agent a task, loop it until it's done. The orchestrator is deliberately thin — it trusts the agent to do the work and enforces quality through backpressure (tests, lint, typecheck must pass).
+
+There are two modes:
+
+| Mode | What It Does | Best For |
+|------|-------------|----------|
+| **Traditional (Simple Prompt)** | Single loop — agent iterates until LOOP_COMPLETE | Quick tasks, single-concern work, anything one agent can handle in a straight line |
+| **Hat-Based** | Specialised personas coordinate through typed events | Complex workflows, multi-step processes, tasks needing distinct planning/building/reviewing phases |
+
+## Core Tenets (Apply to Both Modes)
+
+These six tenets guide every RALPH setup. Reference them when making decisions:
+
+1. **Fresh Context Is Reliability** — Each iteration clears context. The prompt must be self-contained enough to re-read, re-plan, and re-execute every cycle.
+2. **Backpressure Over Prescription** — Don't prescribe HOW to do the work. Create gates that reject bad work (tests pass, lint clean, types check).
+3. **The Plan Is Disposable** — Regeneration costs one planning loop. Cheap. Don't over-invest in preserving plans.
+4. **Disk Is State, Git Is Memory** — Files are the handoff mechanism between iterations. Git provides checkpointing and rollback.
+5. **Steer With Signals, Not Scripts** — Add signs (success criteria, quality gates), not step-by-step scripts.
+6. **Let Ralph Ralph** — Sit ON the loop, not IN it. The orchestrator coordinates; the agent does the work.
+
+## Workflow
+
+### Phase 1: Interview the User
+
+Before generating anything, you need to understand the task. Ask targeted questions to fill in these blanks:
+
+**Essential information:**
+- What is the task? (Be specific — "build an API" is too vague; "build a REST API for user management with Express.js and TypeScript" is good)
+- What does "done" look like? (Measurable success criteria — tests pass, endpoints respond, specific files exist)
+- What language/framework/tools are involved?
+- Does the project already exist, or is this greenfield?
+- Are there existing tests, linting, or type-checking set up?
+
+**Information that helps you decide the mode:**
+- How many distinct phases or concerns does this task have? (1-2 = simple prompt; 3+ = consider hats)
+- Does the task need planning before building? (If yes, hat-based is likely better)
+- Does the task need a review/QA step separate from building? (If yes, hat-based)
+- Is there a spec or design document to follow? (Spec-driven development suits hats well)
+- How complex is the codebase? (Large existing codebase with multiple modules = hat-based)
+
+**Don't over-interview.** If the user gives you a clear, well-scoped task, you may have enough after 1-2 questions. If the task is vague, probe until you can write a crisp PROMPT.md.
+
+### Phase 2: Decide the Mode
+
+Use this decision framework:
+
+**Choose Simple Prompt when:**
+- The task is a single concern (add a feature, fix a bug, write a script)
+- One agent can handle it start to finish without distinct phases
+- The success criteria are straightforward (tests pass, script runs)
+- The user explicitly wants something quick and simple
+- The task can be fully described in a PROMPT.md under ~50 lines
+
+**Choose Hat-Based when:**
+- The task has 3+ distinct phases (plan → build → test → review)
+- Different phases need different "mindsets" (architect vs implementer vs reviewer)
+- The task involves spec-driven development (spec → implement → verify)
+- There's a TDD workflow (write tests → implement → verify)
+- The task is large enough that a single prompt would be overwhelming
+- Multiple files/modules need coordinated changes
+- The user explicitly asks for hats or a structured workflow
+
+**When in doubt:** Start with Simple Prompt. You can always add hats later. Simpler is more robust.
+
+### Phase 3: Generate the Files
+
+Generate the appropriate files into the user's project directory. Always explain what you're creating and why.
+
+Read the appropriate reference file before generating:
+- For Simple Prompt: `references/simple-prompt-reference.md`
+- For Hat-Based: `references/hat-based-reference.md`
+
+#### Files to Generate
+
+**Both modes:**
+- `ralph.yml` — Main configuration
+- `PROMPT.md` — The task definition
+
+**Hat-Based mode additionally:**
+- `hats.yml` — Hat definitions with triggers, publishes, and instructions
+
+### Phase 4: Review with the User
+
+After generating the files, walk the user through what you created:
+- Summarise the task as you understood it
+- Explain the mode choice and why
+- Highlight the success criteria / completion promise
+- For hat-based: explain the event flow between hats
+- Ask if anything needs adjusting before they run it
+
+Then tell them how to run it:
+```bash
+# Simple prompt
+ralph run
+
+# Hat-based
+ralph run --config hats.yml
+
+# With iteration limit
+ralph run --max-iterations 50
+```
+
+## Writing Good Prompts (PROMPT.md)
+
+The PROMPT.md is the most important file. It must be:
+
+**Self-contained:** Every iteration starts fresh. The prompt must contain everything the agent needs to understand the task, check progress, and continue.
+
+**Outcome-focused:** Define WHAT, not HOW. Let the agent figure out the approach.
+
+**Measurable:** Include concrete success criteria the agent can verify:
+- "All tests pass" (not "write good tests")
+- "The /users endpoint returns 200 with valid JSON" (not "make the API work")
+- "TypeScript compiles with zero errors" (not "fix the types")
+
+**Structured but not prescriptive:** Use sections like Task, Requirements, Success Criteria, Constraints. Don't write step-by-step instructions.
+
+### Prompt Template (Simple)
+
+```markdown
+# Task: [Clear, specific title]
+
+[2-3 sentence description of what needs to be built/done]
+
+## Requirements
+
+- [Specific requirement 1]
+- [Specific requirement 2]
+- [Specific requirement 3]
+
+## Success Criteria
+
+All of the following must be true:
+- [ ] [Measurable criterion 1]
+- [ ] [Measurable criterion 2]
+- [ ] [Measurable criterion 3]
+
+## Constraints
+
+- [Technology constraints]
+- [Style/convention constraints]
+- [Performance constraints if any]
+
+## Status
+
+Track your progress here. Mark items complete as you go.
+When all success criteria are met, print LOOP_COMPLETE.
+```
+
+## Designing Hat Systems
+
+When creating hats, follow these principles:
+
+**Each hat should have a single responsibility.** Don't create a hat that plans AND builds.
+
+**Events flow forward.** The event chain should be a clear pipeline: task.start → plan.ready → build.done → review.complete → task.done.
+
+**Instructions should be specific to the hat's role.** The planner hat gets planning instructions, the builder gets building instructions.
+
+**Keep it minimal.** 2-4 hats is typical. More than 5 is usually overengineered.
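+
+As a rough sketch of the forward-flowing chain described above, the wiring for a three-hat pipeline might look like the following. It assumes the `triggers`/`publishes` schema shown in `references/hat-based-reference.md`; the hat names and the short instructions are placeholders, not a complete configuration.
+
+```yaml
+hats:
+  planner:
+    name: "Planner"
+    triggers: ["task.start"]      # entry point of the pipeline
+    publishes: ["plan.ready"]
+    instructions: |
+      Read PROMPT.md and write a plan. Do NOT write code.
+
+  builder:
+    name: "Builder"
+    triggers: ["plan.ready"]      # runs only after the planner finishes
+    publishes: ["build.done"]
+    instructions: |
+      Read PROMPT.md and the plan, then implement it.
+
+  reviewer:
+    name: "Reviewer"
+    triggers: ["build.done"]
+    publishes: ["review.complete"]
+    instructions: |
+      Check the work against PROMPT.md. When every success criterion
+      is met, print LOOP_COMPLETE.
+```
+
+Each hat consumes exactly one event and emits exactly one, so the order of execution can be read straight off the `triggers` and `publishes` lines.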
+
+### Common Hat Patterns
+
+**Plan → Build (2 hats):**
+Good for tasks that need architectural thinking before coding.
+
+**Plan → Build → Review (3 hats):**
+Good for tasks that need quality assurance.
+
+**Spec → Implement → Verify (3 hats):**
+Good for spec-driven development.
+
+**Test → Implement → Verify (3 hats):**
+Good for TDD workflows.
+
+See `references/hat-based-reference.md` for full configuration examples.
+
+## Backpressure Configuration
+
+Backpressure gates reject incomplete work. Common gates:
+
+```yaml
+backpressure:
+  gates:
+    - name: "tests"
+      command: "npm test"
+      on_fail: "retry"
+    - name: "lint"
+      command: "npm run lint"
+      on_fail: "retry"
+    - name: "typecheck"
+      command: "npx tsc --noEmit"
+      on_fail: "retry"
+```
+
+Only add gates for tools that exist in the project. If there are no tests yet, don't add a test gate (unless the task IS to create tests).
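+
+For instance, a Python project that has a pytest suite and ruff configured but no type checker might get only two gates. This is a sketch that assumes the same gate schema as above; the choice of pytest and ruff is an illustrative assumption about the project, not a default.
+
+```yaml
+backpressure:
+  gates:
+    - name: "tests"
+      command: "pytest"          # assumes pytest is already set up
+      on_fail: "retry"
+    - name: "lint"
+      command: "ruff check ."    # assumes ruff is already configured
+      on_fail: "retry"
+    # No typecheck gate: the project has no type checker to run.
+```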
+
+## Cost and Safety
+
+Always configure iteration limits. Remind the user:
+- Default max iterations: 100
+- Default max runtime: 4 hours
+- A 50-iteration cycle on a large codebase can cost $50-100+ in API credits
+- Recommend starting with `--max-iterations 30` for new setups and increasing if needed
+- Git checkpointing is on by default — the user can always roll back
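+
+The recommended starting limit can also be set in configuration rather than on the command line. A minimal sketch, assuming the `event_loop` keys shown in `references/simple-prompt-reference.md`:
+
+```yaml
+event_loop:
+  completion_promise: "LOOP_COMPLETE"
+  max_iterations: 30     # conservative cap for a new setup; raise if needed
+```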
@@ -0,0 +1,335 @@
+# Hat-Based Reference
+
+## Overview
+
+Hat-based mode uses specialised personas ("hats") that coordinate through typed events. Each hat triggers on specific events and publishes new events when done, creating a pipeline of distinct phases.
+
+Use this when the task genuinely benefits from separating concerns — e.g., planning separately from building, or reviewing separately from implementing.
+
+## hats.yml Structure
+
+```yaml
+cli:
+  backend: "claude"
+
+event_loop:
+  starting_event: "task.start"          # First event that kicks off the pipeline
+  completion_promise: "LOOP_COMPLETE"   # String that signals completion
+  max_iterations: 100                   # Safety limit
+
+hats:
+  hat_name:
+    name: "Human-Readable Name"
+    triggers: ["event.that.activates.this.hat"]
+    publishes: ["event.this.hat.emits.when.done"]
+    instructions: |
+      Detailed instructions for what this hat should do.
+      Must be self-contained — the hat gets fresh context each time.
+      Should reference PROMPT.md for the overall task.
+      Should specify what "done" means for this hat.
+```
+
+### Key Rules
+
+- **triggers**: List of events that activate this hat. A hat runs when ANY of its trigger events fire.
+- **publishes**: List of events this hat emits when it completes its work.
+- **instructions**: The prompt for this hat. Must be specific to the hat's role.
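+- Events flow forward through the pipeline. A hat may loop back to an earlier one (for example, a reviewer requesting changes), but every cycle needs an exit path to LOOP_COMPLETE.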
+- The last hat in the pipeline should print LOOP_COMPLETE when the overall task is done.
+
+## Common Patterns
+
+### Pattern 1: Plan → Build (2 Hats)
+
+Best for tasks that need architectural thinking before coding.
+
+```yaml
+cli:
+  backend: "claude"
+
+event_loop:
+  starting_event: "task.start"
+  completion_promise: "LOOP_COMPLETE"
+
+hats:
+  planner:
+    name: "Planner"
+    triggers: ["task.start"]
+    publishes: ["plan.ready"]
+    instructions: |
+      You are the Planner. Read PROMPT.md to understand the task.
+
+      Your job:
+      1. Analyse the requirements and existing codebase
+      2. Create a clear implementation plan in .ralph/plan.md
+      3. Break the work into concrete steps with file-level detail
+      4. Identify any risks or unknowns
+
+      Write the plan to .ralph/plan.md then emit plan.ready.
+
+      Do NOT write any code. Planning only.
+
+  builder:
+    name: "Builder"
+    triggers: ["plan.ready"]
+    publishes: ["task.done"]
+    instructions: |
+      You are the Builder. Read PROMPT.md for the task and .ralph/plan.md
+      for the implementation plan.
+
+      Your job:
+      1. Follow the plan step by step
+      2. Write clean, tested code
+      3. Run tests after each significant change
+      4. Update .ralph/plan.md to mark completed steps
+
+      When all success criteria from PROMPT.md are met and all tests pass,
+      emit task.done and print LOOP_COMPLETE.
+```
+
+### Pattern 2: Plan → Build → Review (3 Hats)
+
+Adds a review phase for quality assurance.
+
+```yaml
+cli:
+  backend: "claude"
+
+event_loop:
+  starting_event: "task.start"
+  completion_promise: "LOOP_COMPLETE"
+
+hats:
+  planner:
+    name: "Planner"
+    triggers: ["task.start", "review.changes_requested"]
+    publishes: ["plan.ready"]
+    instructions: |
+      You are the Planner. Read PROMPT.md to understand the task.
+
+      If triggered by review.changes_requested, read .ralph/review.md
+      for feedback and update the plan accordingly.
+
+      Create or update .ralph/plan.md with a clear implementation plan.
+      Emit plan.ready when done. Do NOT write code.
+
+  builder:
+    name: "Builder"
+    triggers: ["plan.ready"]
+    publishes: ["build.done"]
+    instructions: |
+      You are the Builder. Read PROMPT.md and .ralph/plan.md.
+
+      Implement the plan. Write tests. Run them.
+      When implementation is complete, emit build.done.
+
+      Do NOT assess overall quality — that's the Reviewer's job.
+
+  reviewer:
+    name: "Reviewer"
+    triggers: ["build.done"]
+    publishes: ["review.approved", "review.changes_requested"]
+    instructions: |
+      You are the Reviewer. Read PROMPT.md for requirements.
+
+      Review the current state of the codebase against the success criteria:
+      1. Do all tests pass?
+      2. Are all requirements met?
+      3. Is the code clean and following project conventions?
+      4. Are there edge cases not covered?
+
+      If everything passes, write your review to .ralph/review.md,
+      emit review.approved, and print LOOP_COMPLETE.
+
+      If changes are needed, write specific feedback to .ralph/review.md
+      and emit review.changes_requested.
+```
+
+### Pattern 3: Spec → Implement → Verify (3 Hats)
+
+For spec-driven development — good when working from a design document.
+
+```yaml
+cli:
+  backend: "claude"
+
+event_loop:
+  starting_event: "task.start"
+  completion_promise: "LOOP_COMPLETE"
+
+hats:
+  spec_writer:
+    name: "Spec Writer"
+    triggers: ["task.start", "verify.gaps_found"]
+    publishes: ["spec.ready"]
+    instructions: |
+      You are the Spec Writer. Read PROMPT.md for the high-level task.
+
+      If triggered by verify.gaps_found, read .ralph/verification.md
+      for gaps and update the spec to address them.
+
+      Write a detailed technical specification to .ralph/spec.md:
+      - API contracts (endpoints, request/response shapes)
+      - Data models
+      - Error handling behaviour
+      - Test scenarios
+
+      Emit spec.ready when done. Do NOT write implementation code.
+
+  implementer:
+    name: "Implementer"
+    triggers: ["spec.ready"]
+    publishes: ["implementation.done"]
+    instructions: |
+      You are the Implementer. Read .ralph/spec.md for the specification.
+
+      Implement exactly what the spec describes. Write tests that verify
+      each specification point. Run tests after each change.
+
+      Emit implementation.done when the spec is fully implemented.
+
+  verifier:
+    name: "Verifier"
+    triggers: ["implementation.done"]
+    publishes: ["verify.passed", "verify.gaps_found"]
+    instructions: |
+      You are the Verifier. Read .ralph/spec.md and PROMPT.md.
+
+      Verify that the implementation matches the spec:
+      1. Run all tests — they must pass
+      2. Check each spec point against the code
+      3. Verify success criteria from PROMPT.md
+
+      If everything checks out, emit verify.passed and print LOOP_COMPLETE.
+
+      If there are gaps, write them to .ralph/verification.md
+      and emit verify.gaps_found.
+```
+
+### Pattern 4: TDD — Test → Implement → Verify (3 Hats)
+
+For test-driven development workflows.
+
+```yaml
+cli:
+  backend: "claude"
+
+event_loop:
+  starting_event: "task.start"
+  completion_promise: "LOOP_COMPLETE"
+
+hats:
+  test_writer:
+    name: "Test Writer"
+    triggers: ["task.start", "verify.tests_needed"]
+    publishes: ["tests.ready"]
+    instructions: |
+      You are the Test Writer. Read PROMPT.md for requirements.
+
+      Write failing tests FIRST that describe the desired behaviour.
+      Tests should be comprehensive and cover edge cases.
+
+      If triggered by verify.tests_needed, read .ralph/verification.md
+      for the specific test gaps to fill.
+
+      Write tests, verify they fail (red phase), then emit tests.ready.
+      Do NOT write implementation code.
+
+  implementer:
+    name: "Implementer"
+    triggers: ["tests.ready"]
+    publishes: ["implementation.done"]
+    instructions: |
+      You are the Implementer. Your goal is to make the tests pass.
+
+      Read the test files to understand what behaviour is expected.
+      Write the minimum code to make all tests pass (green phase).
+
+      Run tests after each change. When all tests pass,
+      emit implementation.done.
+
+  verifier:
+    name: "Verifier"
+    triggers: ["implementation.done"]
+    publishes: ["verify.passed", "verify.tests_needed"]
+    instructions: |
+      You are the Verifier. Read PROMPT.md for the full requirements.
+
+      Check:
+      1. All tests pass
+      2. Test coverage is adequate for the requirements
+      3. All success criteria from PROMPT.md are met
+      4. Code is clean (refactor phase if needed)
+
+      If complete, emit verify.passed and print LOOP_COMPLETE.
+      If more tests are needed, write gaps to .ralph/verification.md
+      and emit verify.tests_needed.
+```
+
+## Backpressure with Hats
+
+Backpressure gates can be applied globally or per-hat:
+
+```yaml
+# Global backpressure — applies to all hats
+backpressure:
+  gates:
+    - name: "tests"
+      command: "npm test"
+      on_fail: "retry"
+    - name: "lint"
+      command: "npm run lint"
+      on_fail: "retry"
+
+# Per-hat backpressure
+hats:
+  builder:
+    triggers: ["plan.ready"]
+    publishes: ["build.done"]
+    backpressure:
+      gates:
+        - name: "typecheck"
+          command: "npx tsc --noEmit"
+          on_fail: "retry"
+    instructions: |
+      ...
+```
+
+## Memories
+
+Hats can use persistent memories stored in `.ralph/agent/memories.md`. These survive across iterations and sessions:
+
+```yaml
+hats:
+  builder:
+    memory:
+      path: ".ralph/agent/memories.md"
+      scope: "hat"       # or "global" to share across hats
+```
+
+Memories are useful for capturing lessons learned, recording decisions, and avoiding repeated mistakes.
+
+## Running Hat-Based Workflows
+
+```bash
+# Run with hats config
+ralph run --config hats.yml
+
+# With iteration limit
+ralph run --config hats.yml --max-iterations 50
+
+# Resume interrupted session
+ralph run --config hats.yml --continue
+```
+
+## Anti-Patterns
+
+**Too many hats.** If you have more than 5, you're probably overengineering. Each hat adds coordination overhead.
+
+**Circular event chains without an exit.** Every cycle must have a path to LOOP_COMPLETE. If the chain is planner → builder → reviewer → planner, the reviewer must be able to signal completion instead of always cycling back.
+
+**Hats that duplicate work.** If the builder is also doing planning, your planner hat is wasted.
+
+**Overly prescriptive hat instructions.** The instructions should say WHAT to achieve, not HOW. Let the agent figure out the approach.
+
+**Missing the PROMPT.md reference.** Hat instructions should always tell the agent to read PROMPT.md for the overall task context. Without it, hats lose sight of the bigger picture.
@@ -0,0 +1,167 @@
+# Simple Prompt Reference
+
+## Overview
+
+Traditional mode is Ralph at its simplest: a single agent loops against a PROMPT.md until it outputs LOOP_COMPLETE or hits the iteration limit. No hats, no events — just a loop.
+
+This is the right choice for most tasks. Don't reach for hats unless you genuinely need distinct phases with different mindsets.
+
+## ralph.yml Configuration
+
+```yaml
+cli:
+  backend: "claude"      # or: kiro, gemini, codex, amp, copilot, opencode
+
+event_loop:
+  completion_promise: "LOOP_COMPLETE"
+  max_iterations: 50     # Start conservative, increase if needed
+```
+
+### Backend Options
+
+| Backend | CLI Tool | Notes |
+|---------|----------|-------|
+| claude | Claude Code | Recommended. Best reasoning, large context window |
+| kiro | Kiro | AWS-integrated |
+| gemini | Gemini CLI | Cost-effective |
+| codex | Codex | OpenAI agent |
+| amp | Amp | Sourcegraph agent |
+| copilot | Copilot CLI | GitHub integrated |
+| opencode | OpenCode | Open source |
+
+## PROMPT.md Examples
+
+### Example 1: Build a Feature
+
+```markdown
+# Task: Add User Authentication to Express API
+
+Add JWT-based authentication to the existing Express.js API.
+
+## Requirements
+
+- POST /auth/login accepts email + password, returns JWT
+- POST /auth/register creates a new user account
+- Middleware protects all /users/* routes
+- Tokens expire after 24 hours
+- Passwords are hashed with bcrypt
+
+## Success Criteria
+
+All of the following must be true:
+- [ ] POST /auth/register creates a user and returns 201
+- [ ] POST /auth/login returns a valid JWT for correct credentials
+- [ ] POST /auth/login returns 401 for incorrect credentials
+- [ ] Protected routes return 401 without a valid token
+- [ ] Protected routes work normally with a valid token
+- [ ] All existing tests still pass
+- [ ] New tests cover all auth endpoints
+- [ ] TypeScript compiles with zero errors
+
+## Constraints
+
+- Use jsonwebtoken for JWT handling
+- Use bcrypt for password hashing
+- Follow existing code patterns in src/
+- Do not modify existing endpoint behaviour
+
+## Status
+
+Track progress here. When all success criteria are met, print LOOP_COMPLETE.
+```
+
+### Example 2: Fix a Bug
+
+```markdown
+# Task: Fix Race Condition in WebSocket Handler
+
+The WebSocket message handler has a race condition where concurrent connections
+can corrupt shared state. Messages are being delivered to wrong clients.
+
+## Current Behaviour
+
+When 2+ clients send messages simultaneously, responses sometimes go to the
+wrong client. See issue #247 for reproduction steps.
+
+## Expected Behaviour
+
+Each client receives only their own responses, regardless of concurrency.
+
+## Success Criteria
+
+- [ ] Concurrent WebSocket test passes (test/ws-concurrent.test.ts)
+- [ ] Existing WebSocket tests still pass
+- [ ] No shared mutable state between connection handlers
+- [ ] Load test with 50 concurrent connections shows zero cross-talk
+
+## Constraints
+
+- Do not change the public WebSocket API
+- Fix must work with the existing Redis pub/sub setup
+
+## Status
+
+Track progress here. When all success criteria are met, print LOOP_COMPLETE.
+```
+
+### Example 3: Write a Script
+
+```markdown
+# Task: CSV Data Migration Script
+
+Create a Python script that migrates data from the legacy CSV format to the
+new database schema.
+
+## Requirements
+
+- Read CSV files from data/legacy/*.csv
+- Transform fields according to the mapping in docs/migration-map.md
+- Insert into PostgreSQL using the existing SQLAlchemy models
+- Handle duplicates by updating existing records
+- Log all skipped/failed rows to migration_errors.log
+
+## Success Criteria
+
+- [ ] Script processes all CSV files in data/legacy/
+- [ ] All valid rows are inserted or updated in the database
+- [ ] Duplicate handling works correctly (update, don't duplicate)
+- [ ] Error log captures all skipped rows with reasons
+- [ ] Script completes without unhandled exceptions
+- [ ] Unit tests cover the transformation logic
+
+## Constraints
+
+- Python 3.11+
+- Use existing SQLAlchemy models from src/models/
+- Must be idempotent (safe to run multiple times)
+
+## Status
+
+Track progress here. When all success criteria are met, print LOOP_COMPLETE.
+```
+
+## Running
+
+```bash
+# Basic run
+ralph run
+
+# With iteration limit
+ralph run --max-iterations 30
+
+# Resume an interrupted session
+ralph run --continue
+
+# Quiet mode (no TUI)
+ralph run -q
+```
+
+## When to Upgrade to Hats
+
+If you find the simple prompt struggling because:
+- The agent keeps flip-flopping between planning and coding
+- It loses track of the overall architecture while implementing details
+- It writes code but never stops to review/test properly
+- The task is too large for a single coherent prompt
+
+...then consider switching to hat-based mode. But try simplifying the prompt first — often the issue is a vague prompt, not a need for hats.