portfolio/.codex/skills/skills/ralph-setup/SKILL.md

---
name: ralph-setup
description: Set up autonomous AI development tasks using the Ralph Wiggum technique. Use when the user wants to create a RALPH orchestration — either a simple looping prompt or a multi-hat coordinated workflow. Interviews the user to understand requirements, decides the appropriate mode, and generates all necessary configuration files (ralph.yml, hats.yml, PROMPT.md). Triggers on mentions of "ralph", "autonomous loop", "hat-based", "orchestration", or requests to set up iterative AI agent tasks.
---

# Ralph Setup Skill

Set up autonomous AI development tasks using the Ralph Wiggum technique — either as a simple iterating prompt or a coordinated hat-based workflow.

## Background

Ralph implements the Ralph Wiggum technique: give an AI agent a task, loop it until it's done. The orchestrator is deliberately thin — it trusts the agent to do the work and enforces quality through backpressure (tests, lint, typecheck must pass).

There are two modes:

| Mode | What It Does | Best For |
|------|-------------|----------|
| **Traditional (Simple Prompt)** | Single loop — agent iterates until LOOP_COMPLETE | Quick tasks, single-concern work, anything one agent can handle in a straight line |
| **Hat-Based** | Specialised personas coordinate through typed events | Complex workflows, multi-step processes, tasks needing distinct planning/building/reviewing phases |

## Core Tenets (Apply to Both Modes)

These six tenets guide every RALPH setup. Reference them when making decisions:

1. **Fresh Context Is Reliability** — Each iteration clears context. The prompt must be self-contained enough to re-read, re-plan, and re-execute every cycle.
2. **Backpressure Over Prescription** — Don't prescribe HOW to do the work. Create gates that reject bad work (tests pass, lint clean, types check).
3. **The Plan Is Disposable** — Regeneration costs one planning loop. Cheap. Don't over-invest in preserving plans.
4. **Disk Is State, Git Is Memory** — Files are the handoff mechanism between iterations. Git provides checkpointing and rollback.
5. **Steer With Signals, Not Scripts** — Add signs (success criteria, quality gates), not step-by-step scripts.
6. **Let Ralph Ralph** — Sit ON the loop, not IN it. The orchestrator coordinates; the agent does the work.

## Workflow

### Phase 1: Interview the User

Before generating anything, you need to understand the task. Ask targeted questions to fill in these blanks:

**Essential information:**
- What is the task? (Be specific — "build an API" is too vague; "build a REST API for user management with Express.js and TypeScript" is good)
- What does "done" look like? (Measurable success criteria — tests pass, endpoints respond, specific files exist)
- What language/framework/tools are involved?
- Does the project already exist, or is this greenfield?
- Are there existing tests, linting, or type-checking set up?

**Information that helps you decide the mode:**
- How many distinct phases or concerns does this task have? (1-2 = simple prompt; 3+ = consider hats)
- Does the task need planning before building? (If yes, hat-based is likely better)
- Does the task need a review/QA step separate from building? (If yes, hat-based)
- Is there a spec or design document to follow? (Spec-driven development suits hats well)
- How complex is the codebase? (Large existing codebase with multiple modules = hat-based)

**Don't over-interview.** If the user gives you a clear, well-scoped task, you may have enough after 1-2 questions. If the task is vague, probe until you can write a crisp PROMPT.md.

### Phase 2: Decide the Mode

Use this decision framework:

**Choose Simple Prompt when:**
- The task is a single concern (add a feature, fix a bug, write a script)
- One agent can handle it start to finish without distinct phases
- The success criteria are straightforward (tests pass, script runs)
- The user explicitly wants something quick and simple
- The task can be fully described in a PROMPT.md under ~50 lines

**Choose Hat-Based when:**
- The task has 3+ distinct phases (plan → build → test → review)
- Different phases need different "mindsets" (architect vs implementer vs reviewer)
- The task involves spec-driven development (spec → implement → verify)
- There's a TDD workflow (write tests → implement → verify)
- The task is large enough that a single prompt would be overwhelming
- Multiple files/modules need coordinated changes
- The user explicitly asks for hats or a structured workflow

**When in doubt:** Start with Simple Prompt. You can always add hats later. Simpler is more robust.

### Phase 3: Generate the Files

Generate the appropriate files into the user's project directory. Always explain what you're creating and why.

Read the appropriate reference file before generating:
- For Simple Prompt: `references/simple-prompt-reference.md`
- For Hat-Based: `references/hat-based-reference.md`

#### Files to Generate

**Both modes:**
- `ralph.yml` — Main configuration
- `PROMPT.md` — The task definition

**Hat-Based mode additionally:**
- `hats.yml` — Hat definitions with triggers, publishes, and instructions

### Phase 4: Review with the User

After generating the files, walk the user through what you created:
- Summarise the task as you understood it
- Explain the mode choice and why
- Highlight the success criteria / completion promise
- For hat-based: explain the event flow between hats
- Ask if anything needs adjusting before they run it

Then tell them how to run it:
```bash
# Simple prompt
ralph run

# Hat-based
ralph run --config hats.yml

# With iteration limit
ralph run --max-iterations 50
```

## Writing Good Prompts (PROMPT.md)

The PROMPT.md is the most important file. It must be:

**Self-contained:** Every iteration starts fresh. The prompt must contain everything the agent needs to understand the task, check progress, and continue.

**Outcome-focused:** Define WHAT, not HOW. Let the agent figure out the approach.

**Measurable:** Include concrete success criteria the agent can verify:
- "All tests pass" (not "write good tests")
- "The /users endpoint returns 200 with valid JSON" (not "make the API work")
- "TypeScript compiles with zero errors" (not "fix the types")

**Structured but not prescriptive:** Use sections like Task, Requirements, Success Criteria, Constraints. Don't write step-by-step instructions.

### Prompt Template (Simple)

```markdown
# Task: [Clear, specific title]

[2-3 sentence description of what needs to be built/done]

## Requirements

- [Specific requirement 1]
- [Specific requirement 2]
- [Specific requirement 3]

## Success Criteria

All of the following must be true:
- [ ] [Measurable criterion 1]
- [ ] [Measurable criterion 2]
- [ ] [Measurable criterion 3]

## Constraints

- [Technology constraints]
- [Style/convention constraints]
- [Performance constraints if any]

## Status

Track your progress here. Mark items complete as you go.
When all success criteria are met, print LOOP_COMPLETE.
```

## Designing Hat Systems

When creating hats, follow these principles:

**Each hat should have a single responsibility.** Don't create a hat that plans AND builds.

**Events flow forward.** The event chain should be a clear pipeline: work.start → plan.ready → build.done → review (changes requested OR LOOP_COMPLETE).

**Terminal hats should end, not publish success.** For the final validation/review hat, success should be `LOOP_COMPLETE` (no success event like `review.approved`), and only rework/failure events should be published.

**Instructions should be specific to the hat's role.** The planner hat gets planning instructions, the builder gets building instructions.

**Keep it minimal.** 2-4 hats is typical. More than 5 is usually overengineered.

### Common Hat Patterns

**Plan → Build (2 hats):**
Good for tasks that need architectural thinking before coding.

**Plan → Build → Review (3 hats):**
Good for tasks that need quality assurance.

**Spec → Implement → Verify (3 hats):**
Good for spec-driven development.

**Test → Implement → Verify (3 hats):**
Good for TDD workflows.

See `references/hat-based-reference.md` for full configuration examples.

## Backpressure Configuration

Backpressure gates reject incomplete work. Common gates:

```yaml
backpressure:
  gates:
    - name: "tests"
      command: "npm test"
      on_fail: "retry"
    - name: "lint"
      command: "npm run lint"
      on_fail: "retry"
    - name: "typecheck"
      command: "npx tsc --noEmit"
      on_fail: "retry"
```

Only add gates for tools that exist in the project. If there are no tests yet, don't add a test gate (unless the task IS to create tests).

### No-Skip Safety Rules

When configuring backpressure and completion logic, preserve quality standards:

- Never treat a circuit breaker as an automatic pass.
- Never skip required checks that are configured in the repository.
- Always require an explicit review outcome before completion (`LOOP_COMPLETE` or concrete changes requested).
- If tests exist in the project and are part of quality gates, they must run and pass before completion.
- If a gate is not configured in the repo, mark it `not-configured` explicitly rather than fabricating retries.

### Loop Circuit Breaker and Escalation

To prevent infinite review/backpressure churn, include a circuit breaker policy in generated prompts/hats:

- Detect repeated identical evidence cycles (same blocker class and materially identical build evidence) across 2-3 consecutive iterations.
- If repetition threshold is reached, stop retrying the same recovery path.
- Escalate instead of auto-completing:
  - record the blocker and evidence in `.ralph/review.md`
  - assign an owner and target finish date
  - set status to require human decision/clarification
- Resume the loop only after the blocker criteria are clarified or configuration is corrected.

### Operational Hygiene Between Runs

Treat runtime coordination state as loop-scoped:

- Do not carry stale "recovery" tasks into a new objective unless explicitly intended.
- Avoid creating new meta/recovery tasks when all implementation tasks are already closed and no new actionable finding exists.
- Keep artifacts (`.ralph/plan.md`, `.ralph/review.md`, event logs) for auditability, but ensure open task queues reflect only current-loop actionable work.
- Prefer one clear escalation handoff over repeated coordination retries with identical payloads.

## Cost and Safety

Always configure iteration limits. Remind the user:
- Default max iterations: 100
- Default max runtime: 4 hours
- A 50-iteration cycle on a large codebase can cost $50-100+ in API credits
- Recommend starting with `--max-iterations 30` for new setups and increasing if needed
- Git checkpointing is on by default — the user can always roll back