chore: add ralph sidebar workflow setup files

2026-02-16 11:33:13 +00:00
parent 78e994ec5e
commit 5a657c4aac
5 changed files with 874 additions and 0 deletions
@@ -0,0 +1,216 @@
---
name: ralph-setup
description: Set up autonomous AI development tasks using the Ralph Wiggum technique. Use when the user wants to create a Ralph orchestration, either a simple looping prompt or a multi-hat coordinated workflow. Interviews the user to understand requirements, decides the appropriate mode, and generates all necessary configuration files (ralph.yml, hats.yml, PROMPT.md). Triggers on mentions of "ralph", "autonomous loop", "hat-based", "orchestration", or requests to set up iterative AI agent tasks.
---
# Ralph Setup Skill
Set up autonomous AI development tasks using the Ralph Wiggum technique — either as a simple iterating prompt or a coordinated hat-based workflow.
## Background
Ralph implements the Ralph Wiggum technique: give an AI agent a task, loop it until it's done. The orchestrator is deliberately thin — it trusts the agent to do the work and enforces quality through backpressure (tests, lint, typecheck must pass).
There are two modes:
| Mode | What It Does | Best For |
|------|-------------|----------|
| **Traditional (Simple Prompt)** | Single loop — agent iterates until LOOP_COMPLETE | Quick tasks, single-concern work, anything one agent can handle in a straight line |
| **Hat-Based** | Specialised personas coordinate through typed events | Complex workflows, multi-step processes, tasks needing distinct planning/building/reviewing phases |
## Core Tenets (Apply to Both Modes)
These six tenets guide every Ralph setup. Reference them when making decisions:
1. **Fresh Context Is Reliability** — Each iteration clears context. The prompt must be self-contained enough to re-read, re-plan, and re-execute every cycle.
2. **Backpressure Over Prescription** — Don't prescribe HOW to do the work. Create gates that reject bad work (tests pass, lint clean, types check).
3. **The Plan Is Disposable** — Regeneration costs one planning loop. Cheap. Don't over-invest in preserving plans.
4. **Disk Is State, Git Is Memory** — Files are the handoff mechanism between iterations. Git provides checkpointing and rollback.
5. **Steer With Signals, Not Scripts** — Add signs (success criteria, quality gates), not step-by-step scripts.
6. **Let Ralph Ralph** — Sit ON the loop, not IN it. The orchestrator coordinates; the agent does the work.
## Workflow
### Phase 1: Interview the User
Before generating anything, you need to understand the task. Ask targeted questions to fill in these blanks:
**Essential information:**
- What is the task? (Be specific — "build an API" is too vague; "build a REST API for user management with Express.js and TypeScript" is good)
- What does "done" look like? (Measurable success criteria — tests pass, endpoints respond, specific files exist)
- What language/framework/tools are involved?
- Does the project already exist, or is this greenfield?
- Are there existing tests, linting, or type-checking set up?
**Information that helps you decide the mode:**
- How many distinct phases or concerns does this task have? (1-2 = simple prompt; 3+ = consider hats)
- Does the task need planning before building? (If yes, hat-based is likely better)
- Does the task need a review/QA step separate from building? (If yes, hat-based)
- Is there a spec or design document to follow? (Spec-driven development suits hats well)
- How complex is the codebase? (Large existing codebase with multiple modules = hat-based)
**Don't over-interview.** If the user gives you a clear, well-scoped task, you may have enough after 1-2 questions. If the task is vague, probe until you can write a crisp PROMPT.md.
### Phase 2: Decide the Mode
Use this decision framework:
**Choose Simple Prompt when:**
- The task is a single concern (add a feature, fix a bug, write a script)
- One agent can handle it start to finish without distinct phases
- The success criteria are straightforward (tests pass, script runs)
- The user explicitly wants something quick and simple
- The task can be fully described in a PROMPT.md under ~50 lines
**Choose Hat-Based when:**
- The task has 3+ distinct phases (plan → build → test → review)
- Different phases need different "mindsets" (architect vs implementer vs reviewer)
- The task involves spec-driven development (spec → implement → verify)
- There's a TDD workflow (write tests → implement → verify)
- The task is large enough that a single prompt would be overwhelming
- Multiple files/modules need coordinated changes
- The user explicitly asks for hats or a structured workflow
**When in doubt:** Start with Simple Prompt. You can always add hats later. Simpler is more robust.
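The decision framework above can be sketched as a small function. This is an illustration only, not part of Ralph: the `Interview` fields and the `choose_mode` name are hypothetical, and the rules simply mirror the guidance in this phase (an explicit user preference wins, any strong signal suggests hats, and doubt resolves to the simple prompt).

```python
from dataclasses import dataclass


@dataclass
class Interview:
    """Answers gathered in Phase 1 (hypothetical field names)."""
    distinct_phases: int = 1
    needs_planning: bool = False
    needs_separate_review: bool = False
    spec_or_tdd: bool = False
    user_requested_hats: bool = False
    user_wants_simple: bool = False


def choose_mode(answers: Interview) -> str:
    """Return "simple" or "hat-based" following the Phase 2 guidance."""
    # An explicit user preference always wins.
    if answers.user_requested_hats:
        return "hat-based"
    if answers.user_wants_simple:
        return "simple"
    signals = [
        answers.distinct_phases >= 3,
        answers.needs_planning,
        answers.needs_separate_review,
        answers.spec_or_tdd,
    ]
    # With no signal pointing to hats, start simple.
    return "hat-based" if any(signals) else "simple"


print(choose_mode(Interview(distinct_phases=1)))
print(choose_mode(Interview(distinct_phases=4, needs_planning=True)))
```

Treating every signal as equally decisive is a simplification; in practice the choice is a judgment call made with the user.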
### Phase 3: Generate the Files
Generate the appropriate files into the user's project directory. Always explain what you're creating and why.
Read the appropriate reference file before generating:
- For Simple Prompt: `references/simple-prompt-reference.md`
- For Hat-Based: `references/hat-based-reference.md`
#### Files to Generate
**Both modes:**
- `ralph.yml` — Main configuration
- `PROMPT.md` — The task definition
**Hat-Based mode additionally:**
- `hats.yml` — Hat definitions with triggers, publishes, and instructions
### Phase 4: Review with the User
After generating the files, walk the user through what you created:
- Summarise the task as you understood it
- Explain the mode choice and why
- Highlight the success criteria / completion promise
- For hat-based: explain the event flow between hats
- Ask if anything needs adjusting before they run it
Then tell them how to run it:
```bash
# Simple prompt
ralph run
# Hat-based
ralph run --config hats.yml
# With iteration limit
ralph run --max-iterations 50
```
## Writing Good Prompts (PROMPT.md)
The PROMPT.md is the most important file. It must be:
**Self-contained:** Every iteration starts fresh. The prompt must contain everything the agent needs to understand the task, check progress, and continue.
**Outcome-focused:** Define WHAT, not HOW. Let the agent figure out the approach.
**Measurable:** Include concrete success criteria the agent can verify:
- "All tests pass" (not "write good tests")
- "The /users endpoint returns 200 with valid JSON" (not "make the API work")
- "TypeScript compiles with zero errors" (not "fix the types")
**Structured but not prescriptive:** Use sections like Task, Requirements, Success Criteria, Constraints. Don't write step-by-step instructions.
### Prompt Template (Simple)
```markdown
# Task: [Clear, specific title]
[2-3 sentence description of what needs to be built/done]
## Requirements
- [Specific requirement 1]
- [Specific requirement 2]
- [Specific requirement 3]
## Success Criteria
All of the following must be true:
- [ ] [Measurable criterion 1]
- [ ] [Measurable criterion 2]
- [ ] [Measurable criterion 3]
## Constraints
- [Technology constraints]
- [Style/convention constraints]
- [Performance constraints if any]
## Status
Track your progress here. Mark items complete as you go.
When all success criteria are met, print LOOP_COMPLETE.
```
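Because the template expresses success criteria as markdown checkboxes, progress can be checked mechanically. The sketch below is an assumption about how one might do that; `criteria_status` is a hypothetical helper, not something Ralph provides.

```python
import re

# Matches "- [ ] text" and "- [x] text" checkbox items.
CHECKBOX = re.compile(r"^\s*-\s*\[( |x|X)\]\s+(.*)$")


def criteria_status(prompt_text: str) -> tuple[list[str], list[str]]:
    """Split checkbox items into (done, pending) lists."""
    done, pending = [], []
    for line in prompt_text.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue
        target = done if match.group(1).lower() == "x" else pending
        target.append(match.group(2).strip())
    return done, pending


sample = """\
## Success Criteria
- [x] All tests pass
- [ ] TypeScript compiles with zero errors
"""
done, pending = criteria_status(sample)
print(f"{len(done)} done, {len(pending)} pending: {pending}")
```

A criterion that cannot be phrased as a checkbox the agent can tick after verifying it is usually a sign that it is not yet measurable.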
## Designing Hat Systems
When creating hats, follow these principles:
**Each hat should have a single responsibility.** Don't create a hat that plans AND builds.
**Events flow forward.** The event chain should be a clear pipeline: task.start → plan.ready → build.done → review.complete → task.done.
**Instructions should be specific to the hat's role.** The planner hat gets planning instructions, the builder gets building instructions.
**Keep it minimal.** 2-4 hats is typical. More than 5 is usually overengineered.
### Common Hat Patterns
**Plan → Build (2 hats):**
Good for tasks that need architectural thinking before coding.
**Plan → Build → Review (3 hats):**
Good for tasks that need quality assurance.
**Spec → Implement → Verify (3 hats):**
Good for spec-driven development.
**Test → Implement → Verify (3 hats):**
Good for TDD workflows.
See `references/hat-based-reference.md` for full configuration examples.
## Backpressure Configuration
Backpressure gates reject incomplete work. Common gates:
```yaml
backpressure:
  gates:
    - name: "tests"
      command: "npm test"
      on_fail: "retry"
    - name: "lint"
      command: "npm run lint"
      on_fail: "retry"
    - name: "typecheck"
      command: "npx tsc --noEmit"
      on_fail: "retry"
```
Only add gates for tools that exist in the project. If there are no tests yet, don't add a test gate (unless the task IS to create tests).
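Conceptually, a gate is a command whose exit code decides whether the work is accepted. The following is a minimal sketch of that idea, not Ralph's implementation: `Gate` and `run_gates` are hypothetical names, and the demo uses stand-in commands so it runs without a Node project.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    command: list[str]


def run_gates(gates: list[Gate]) -> list[str]:
    """Run each gate and return the names of the gates that failed."""
    failed = []
    for gate in gates:
        result = subprocess.run(gate.command, capture_output=True, text=True)
        # A non-zero exit code means the gate rejects the current work.
        if result.returncode != 0:
            failed.append(gate.name)
    return failed


# Stand-in commands; a real project would use its own,
# for example ["npm", "test"] or ["npx", "tsc", "--noEmit"].
gates = [
    Gate("tests", [sys.executable, "-c", "raise SystemExit(0)"]),
    Gate("lint", [sys.executable, "-c", "raise SystemExit(1)"]),
]
print(run_gates(gates))
```

An empty result means every gate passed; any failed gate would send the agent into another iteration.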
## Cost and Safety
Always configure iteration limits. Remind the user:
- Default max iterations: 100
- Default max runtime: 4 hours
- A 50-iteration cycle on a large codebase can cost $50-100+ in API credits
- Recommend starting with `--max-iterations 30` for new setups and increasing if needed
- Git checkpointing is on by default — the user can always roll back
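The cost figure above implies roughly $1-2 per iteration on a large codebase ($50-100 spread over 50 iterations). The sketch below extrapolates linearly from that single data point, which is an assumption; real costs depend on the backend, the codebase, and how much each iteration reads. `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(iterations: int,
                  low_per_iter: float = 1.0,
                  high_per_iter: float = 2.0) -> tuple[float, float]:
    """Rough (low, high) cost range in dollars for a run."""
    return iterations * low_per_iter, iterations * high_per_iter


for n in (30, 50, 100):
    low, high = estimate_cost(n)
    print(f"{n} iterations: ${low:.0f}-${high:.0f}")
```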
@@ -0,0 +1,335 @@
# Hat-Based Reference
## Overview
Hat-based mode uses specialised personas ("hats") that coordinate through typed events. Each hat triggers on specific events and publishes new events when done, creating a pipeline of distinct phases.
Use this when the task genuinely benefits from separating concerns — e.g., planning separately from building, or reviewing separately from implementing.
## hats.yml Structure
```yaml
cli:
  backend: "claude"
event_loop:
  starting_event: "task.start"          # First event that kicks off the pipeline
  completion_promise: "LOOP_COMPLETE"   # String that signals completion
  max_iterations: 100                   # Safety limit
hats:
  hat_name:
    name: "Human-Readable Name"
    triggers: ["event.that.activates.this.hat"]
    publishes: ["event.this.hat.emits.when.done"]
    instructions: |
      Detailed instructions for what this hat should do.
      Must be self-contained: the hat gets fresh context each time.
      Should reference PROMPT.md for the overall task.
      Should specify what "done" means for this hat.
```
### Key Rules
- **triggers**: List of events that activate this hat. A hat runs when ANY of its trigger events fire.
- **publishes**: List of events this hat emits when it completes its work.
- **instructions**: The prompt for this hat. Must be specific to the hat's role.
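- Events should flow forward through the pipeline. A feedback loop (for example, a reviewer sending work back to the planner) is acceptable only if the cycle has a path to LOOP_COMPLETE.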
- The last hat in the pipeline should print LOOP_COMPLETE when the overall task is done.
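The trigger and publish rules can be pictured as simple event routing. The sketch below is a conceptual model, not how Ralph is implemented: `hats_for` and `trace` are hypothetical, and `trace` follows only the first event each hat publishes, whereas in a real run the hat decides which event to emit.

```python
from collections import deque

HATS = {
    "planner": {"triggers": ["task.start", "review.changes_requested"],
                "publishes": ["plan.ready"]},
    "builder": {"triggers": ["plan.ready"],
                "publishes": ["build.done"]},
    "reviewer": {"triggers": ["build.done"],
                 "publishes": ["review.approved", "review.changes_requested"]},
}


def hats_for(event: str, hats: dict) -> list[str]:
    """A hat runs when ANY of its trigger events fires."""
    return [name for name, hat in hats.items() if event in hat["triggers"]]


def trace(starting_event: str, hats: dict, max_steps: int = 10) -> list[tuple[str, str]]:
    """Follow the first published event of each hat to show the happy path."""
    steps = []
    queue = deque([starting_event])
    while queue and len(steps) < max_steps:
        event = queue.popleft()
        for name in hats_for(event, hats):
            steps.append((event, name))
            queue.append(hats[name]["publishes"][0])
    return steps


for event, hat in trace("task.start", HATS):
    print(f"{event} -> {hat}")
```

The `max_steps` bound plays the same role as an iteration limit: it stops the trace if a configuration routes events in a cycle.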
## Common Patterns
### Pattern 1: Plan → Build (2 Hats)
Best for tasks that need architectural thinking before coding.
```yaml
cli:
  backend: "claude"
event_loop:
  starting_event: "task.start"
  completion_promise: "LOOP_COMPLETE"
hats:
  planner:
    name: "Planner"
    triggers: ["task.start"]
    publishes: ["plan.ready"]
    instructions: |
      You are the Planner. Read PROMPT.md to understand the task.
      Your job:
      1. Analyse the requirements and existing codebase
      2. Create a clear implementation plan in .ralph/plan.md
      3. Break the work into concrete steps with file-level detail
      4. Identify any risks or unknowns
      Write the plan to .ralph/plan.md then emit plan.ready.
      Do NOT write any code. Planning only.
  builder:
    name: "Builder"
    triggers: ["plan.ready"]
    publishes: ["task.done"]
    instructions: |
      You are the Builder. Read PROMPT.md for the task and .ralph/plan.md
      for the implementation plan.
      Your job:
      1. Follow the plan step by step
      2. Write clean, tested code
      3. Run tests after each significant change
      4. Update .ralph/plan.md to mark completed steps
      When all success criteria from PROMPT.md are met and all tests pass,
      print LOOP_COMPLETE.
```
### Pattern 2: Plan → Build → Review (3 Hats)
Adds a review phase for quality assurance.
```yaml
cli:
  backend: "claude"
event_loop:
  starting_event: "task.start"
  completion_promise: "LOOP_COMPLETE"
hats:
  planner:
    name: "Planner"
    triggers: ["task.start", "review.changes_requested"]
    publishes: ["plan.ready"]
    instructions: |
      You are the Planner. Read PROMPT.md to understand the task.
      If triggered by review.changes_requested, read .ralph/review.md
      for feedback and update the plan accordingly.
      Create or update .ralph/plan.md with a clear implementation plan.
      Emit plan.ready when done. Do NOT write code.
  builder:
    name: "Builder"
    triggers: ["plan.ready"]
    publishes: ["build.done"]
    instructions: |
      You are the Builder. Read PROMPT.md and .ralph/plan.md.
      Implement the plan. Write tests. Run them.
      When implementation is complete, emit build.done.
      Do NOT assess overall quality; that's the Reviewer's job.
  reviewer:
    name: "Reviewer"
    triggers: ["build.done"]
    publishes: ["review.approved", "review.changes_requested"]
    instructions: |
      You are the Reviewer. Read PROMPT.md for requirements.
      Review the current state of the codebase against the success criteria:
      1. Do all tests pass?
      2. Are all requirements met?
      3. Is the code clean and following project conventions?
      4. Are there edge cases not covered?
      If everything passes, write your review to .ralph/review.md
      and print LOOP_COMPLETE.
      If changes are needed, write specific feedback to .ralph/review.md
      and emit review.changes_requested.
```
### Pattern 3: Spec → Implement → Verify (3 Hats)
For spec-driven development — good when working from a design document.
```yaml
cli:
  backend: "claude"
event_loop:
  starting_event: "task.start"
  completion_promise: "LOOP_COMPLETE"
hats:
  spec_writer:
    name: "Spec Writer"
    triggers: ["task.start", "verify.gaps_found"]
    publishes: ["spec.ready"]
    instructions: |
      You are the Spec Writer. Read PROMPT.md for the high-level task.
      If triggered by verify.gaps_found, read .ralph/verification.md
      for gaps and update the spec to address them.
      Write a detailed technical specification to .ralph/spec.md:
      - API contracts (endpoints, request/response shapes)
      - Data models
      - Error handling behaviour
      - Test scenarios
      Emit spec.ready when done. Do NOT write implementation code.
  implementer:
    name: "Implementer"
    triggers: ["spec.ready"]
    publishes: ["implementation.done"]
    instructions: |
      You are the Implementer. Read PROMPT.md for the overall task and
      .ralph/spec.md for the specification.
      Implement exactly what the spec describes. Write tests that verify
      each specification point. Run tests after each change.
      Emit implementation.done when the spec is fully implemented.
  verifier:
    name: "Verifier"
    triggers: ["implementation.done"]
    publishes: ["verify.passed", "verify.gaps_found"]
    instructions: |
      You are the Verifier. Read .ralph/spec.md and PROMPT.md.
      Verify that the implementation matches the spec:
      1. Run all tests; they must pass
      2. Check each spec point against the code
      3. Verify success criteria from PROMPT.md
      If everything checks out, print LOOP_COMPLETE.
      If there are gaps, write them to .ralph/verification.md
      and emit verify.gaps_found.
```
### Pattern 4: TDD — Test → Implement → Verify (3 Hats)
For test-driven development workflows.
```yaml
cli:
  backend: "claude"
event_loop:
  starting_event: "task.start"
  completion_promise: "LOOP_COMPLETE"
hats:
  test_writer:
    name: "Test Writer"
    triggers: ["task.start", "verify.tests_needed"]
    publishes: ["tests.ready"]
    instructions: |
      You are the Test Writer. Read PROMPT.md for requirements.
      Write failing tests FIRST that describe the desired behaviour.
      Tests should be comprehensive and cover edge cases.
      If triggered by verify.tests_needed, read .ralph/verification.md
      for the specific test gaps to fill.
      Write tests, verify they fail (red phase), then emit tests.ready.
      Do NOT write implementation code.
  implementer:
    name: "Implementer"
    triggers: ["tests.ready"]
    publishes: ["implementation.done"]
    instructions: |
      You are the Implementer. Your goal is to make the tests pass.
      Read PROMPT.md for the overall task, then read the test files
      to understand what behaviour is expected.
      Write the minimum code to make all tests pass (green phase).
      Run tests after each change. When all tests pass,
      emit implementation.done.
  verifier:
    name: "Verifier"
    triggers: ["implementation.done"]
    publishes: ["verify.passed", "verify.tests_needed"]
    instructions: |
      You are the Verifier. Read PROMPT.md for the full requirements.
      Check:
      1. All tests pass
      2. Test coverage is adequate for the requirements
      3. All success criteria from PROMPT.md are met
      4. Code is clean (refactor phase if needed)
      If complete, print LOOP_COMPLETE.
      If more tests are needed, write gaps to .ralph/verification.md
      and emit verify.tests_needed.
```
## Backpressure with Hats
Backpressure gates can be applied globally or per-hat:
```yaml
# Global backpressure: applies to all hats
backpressure:
  gates:
    - name: "tests"
      command: "npm test"
      on_fail: "retry"
    - name: "lint"
      command: "npm run lint"
      on_fail: "retry"
# Per-hat backpressure
hats:
  builder:
    triggers: ["plan.ready"]
    publishes: ["build.done"]
    backpressure:
      gates:
        - name: "typecheck"
          command: "npx tsc --noEmit"
          on_fail: "retry"
    instructions: |
      ...
```
## Memories
Hats can use persistent memories stored in `.ralph/agent/memories.md`. These survive across iterations and sessions:
```yaml
hats:
  builder:
    memory:
      path: ".ralph/agent/memories.md"
      scope: "hat"   # or "global" to share across hats
```
Memories are useful for capturing lessons learned, recording decisions, and avoiding repeated mistakes.
## Running Hat-Based Workflows
```bash
# Run with hats config
ralph run --config hats.yml
# With iteration limit
ralph run --config hats.yml --max-iterations 50
# Resume interrupted session
ralph run --config hats.yml --continue
```
## Anti-Patterns
**Too many hats.** If you have more than 5, you're probably overengineering. Each hat adds coordination overhead.
**Circular event chains without an exit.** Every cycle must have a path to LOOP_COMPLETE. If planner → builder → reviewer → planner, the reviewer must sometimes emit completion instead of always cycling back.
**Hats that duplicate work.** If the builder is also doing planning, your planner hat is wasted.
**Overly prescriptive hat instructions.** The instructions should say WHAT to achieve, not HOW. Let the agent figure out the approach.
**Missing the PROMPT.md reference.** Hat instructions should always tell the agent to read PROMPT.md for the overall task context. Without it, hats lose sight of the bigger picture.
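Some of these anti-patterns can be caught mechanically before a run. The sketch below assumes the hats have already been loaded into a plain dictionary keyed by hat name; `lint_hats` is a hypothetical helper, not a Ralph command, and it covers only the checks that reduce to simple string or count tests.

```python
def lint_hats(hats: dict, completion_promise: str = "LOOP_COMPLETE") -> list[str]:
    """Return warnings for anti-patterns that can be checked mechanically."""
    warnings = []
    # Too many hats.
    if len(hats) > 5:
        warnings.append(f"{len(hats)} hats defined; more than 5 is usually overengineered")
    # Missing the PROMPT.md reference.
    for name, hat in hats.items():
        if "PROMPT.md" not in hat.get("instructions", ""):
            warnings.append(f"{name}: instructions do not mention PROMPT.md")
    # No exit: no hat is ever told to print the completion promise.
    if not any(completion_promise in hat.get("instructions", "") for hat in hats.values()):
        warnings.append(f"no hat ever prints {completion_promise}; the loop has no exit")
    return warnings


hats = {
    "planner": {"instructions": "Read PROMPT.md. Write .ralph/plan.md."},
    "builder": {"instructions": "Implement the plan, then print LOOP_COMPLETE."},
}
for warning in lint_hats(hats):
    print(warning)
```

Duplicated work and overly prescriptive instructions still need a human read; a string check cannot judge those.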
@@ -0,0 +1,167 @@
# Simple Prompt Reference
## Overview
Traditional mode is Ralph at its simplest: a single agent loops against a PROMPT.md until it outputs LOOP_COMPLETE or hits the iteration limit. No hats, no events — just a loop.
This is the right choice for most tasks. Don't reach for hats unless you genuinely need distinct phases with different mindsets.
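The loop described above can be sketched in a few lines. This is a conceptual model under stated assumptions, not Ralph's code: `run_loop` is hypothetical, and the agent is represented as a plain function that receives the prompt text and returns its output.

```python
from typing import Callable


def run_loop(agent: Callable[[str], str],
             prompt: str,
             completion_promise: str = "LOOP_COMPLETE",
             max_iterations: int = 50) -> tuple[bool, int]:
    """Call the agent with the same prompt until it prints the promise."""
    for iteration in range(1, max_iterations + 1):
        # Fresh context: only the prompt is passed in, never earlier output.
        output = agent(prompt)
        if completion_promise in output:
            return True, iteration
    return False, max_iterations


# Stand-in agent that "finishes" on its third call.
calls = {"n": 0}


def fake_agent(prompt: str) -> str:
    calls["n"] += 1
    return "LOOP_COMPLETE" if calls["n"] >= 3 else "still working"


print(run_loop(fake_agent, "contents of PROMPT.md"))
```

Because nothing carries over between calls except what the agent writes to disk, the prompt has to be complete enough to restart from on every iteration.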
## ralph.yml Configuration
```yaml
cli:
  backend: "claude"   # or: kiro, gemini, codex, amp, copilot, opencode
event_loop:
  completion_promise: "LOOP_COMPLETE"
  max_iterations: 50   # Start conservative, increase if needed
```
### Backend Options
| Backend | CLI Tool | Notes |
|---------|----------|-------|
| claude | Claude Code | Recommended. Best reasoning, large context window |
| kiro | Kiro | AWS-integrated |
| gemini | Gemini CLI | Cost-effective |
| codex | Codex | OpenAI agent |
| amp | Amp | Sourcegraph agent |
| copilot | Copilot CLI | GitHub integrated |
| opencode | OpenCode | Open source |
## PROMPT.md Examples
### Example 1: Build a Feature
```markdown
# Task: Add User Authentication to Express API
Add JWT-based authentication to the existing Express.js API.
## Requirements
- POST /auth/login accepts email + password, returns JWT
- POST /auth/register creates a new user account
- Middleware protects all /users/* routes
- Tokens expire after 24 hours
- Passwords are hashed with bcrypt
## Success Criteria
All of the following must be true:
- [ ] POST /auth/register creates a user and returns 201
- [ ] POST /auth/login returns a valid JWT for correct credentials
- [ ] POST /auth/login returns 401 for incorrect credentials
- [ ] Protected routes return 401 without a valid token
- [ ] Protected routes work normally with a valid token
- [ ] All existing tests still pass
- [ ] New tests cover all auth endpoints
- [ ] TypeScript compiles with zero errors
## Constraints
- Use jsonwebtoken for JWT handling
- Use bcrypt for password hashing
- Follow existing code patterns in src/
- Do not modify existing endpoint behaviour
## Status
Track progress here. When all success criteria are met, print LOOP_COMPLETE.
```
### Example 2: Fix a Bug
```markdown
# Task: Fix Race Condition in WebSocket Handler
The WebSocket message handler has a race condition where concurrent connections
can corrupt shared state. Messages are being delivered to wrong clients.
## Current Behaviour
When 2+ clients send messages simultaneously, responses sometimes go to the
wrong client. See issue #247 for reproduction steps.
## Expected Behaviour
Each client receives only their own responses, regardless of concurrency.
## Success Criteria
- [ ] Concurrent WebSocket test passes (test/ws-concurrent.test.ts)
- [ ] Existing WebSocket tests still pass
- [ ] No shared mutable state between connection handlers
- [ ] Load test with 50 concurrent connections shows zero cross-talk
## Constraints
- Do not change the public WebSocket API
- Fix must work with the existing Redis pub/sub setup
## Status
Track progress here. When all success criteria are met, print LOOP_COMPLETE.
```
### Example 3: Write a Script
```markdown
# Task: CSV Data Migration Script
Create a Python script that migrates data from the legacy CSV format to the
new database schema.
## Requirements
- Read CSV files from data/legacy/*.csv
- Transform fields according to the mapping in docs/migration-map.md
- Insert into PostgreSQL using the existing SQLAlchemy models
- Handle duplicates by updating existing records
- Log all skipped/failed rows to migration_errors.log
## Success Criteria
- [ ] Script processes all CSV files in data/legacy/
- [ ] All valid rows are inserted or updated in the database
- [ ] Duplicate handling works correctly (update, don't duplicate)
- [ ] Error log captures all skipped rows with reasons
- [ ] Script completes without unhandled exceptions
- [ ] Unit tests cover the transformation logic
## Constraints
- Python 3.11+
- Use existing SQLAlchemy models from src/models/
- Must be idempotent (safe to run multiple times)
## Status
Track progress here. When all success criteria are met, print LOOP_COMPLETE.
```
## Running
```bash
# Basic run
ralph run
# With iteration limit
ralph run --max-iterations 30
# Resume an interrupted session
ralph run --continue
# Quiet mode (no TUI)
ralph run -q
```
## When to Upgrade to Hats
If you find the simple prompt struggling because:
- The agent keeps flip-flopping between planning and coding
- It loses track of the overall architecture while implementing details
- It writes code but never stops to review/test properly
- The task is too large for a single coherent prompt
...then consider switching to hat-based mode. But try simplifying the prompt first — often the issue is a vague prompt, not a need for hats.