portfolio/.codex/skills/skills/ralph-setup/SKILL.md
2026-02-16 13:23:04 +00:00


---
name: ralph-setup
description: Set up autonomous AI development tasks using the Ralph Wiggum technique. Use when the user wants to create a RALPH orchestration, either a simple looping prompt or a multi-hat coordinated workflow. Interviews the user to understand requirements, decides the appropriate mode, and generates all necessary configuration files (ralph.yml, hats.yml, PROMPT.md). Triggers on mentions of "ralph", "autonomous loop", "hat-based", "orchestration", or requests to set up iterative AI agent tasks.
---

Ralph Setup Skill

Set up autonomous AI development tasks using the Ralph Wiggum technique — either as a simple iterating prompt or a coordinated hat-based workflow.

Background

Ralph implements the Ralph Wiggum technique: give an AI agent a task, loop it until it's done. The orchestrator is deliberately thin — it trusts the agent to do the work and enforces quality through backpressure (tests, lint, typecheck must pass).
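The following minimal Python sketch makes the control flow concrete. It is not Ralph's actual implementation. It assumes two hypothetical callables:

- `run_agent` sends the prompt to an agent and returns its output.
- `gates_pass` stands in for the backpressure checks.

The prompt is re-read from disk on every iteration, matching the fresh-context approach.

```python
from pathlib import Path
from typing import Callable

COMPLETION_SIGNAL = "LOOP_COMPLETE"


def ralph_loop(
    prompt_path: Path,
    run_agent: Callable[[str], str],   # hypothetical: prompt in, agent output out
    gates_pass: Callable[[], bool],    # hypothetical: True if tests/lint/types pass
    max_iterations: int = 100,
) -> int:
    """Loop until the agent signals completion AND the gates pass.

    Returns the number of iterations used; raises RuntimeError if the
    iteration limit is reached first.
    """
    for iteration in range(1, max_iterations + 1):
        # Fresh context: rebuild the input from disk each time instead of
        # carrying conversation history between iterations.
        prompt = prompt_path.read_text()
        output = run_agent(prompt)
        # Backpressure: a claim of completion only counts if the gates agree.
        if COMPLETION_SIGNAL in output and gates_pass():
            return iteration
    raise RuntimeError(f"no completion within {max_iterations} iterations")
```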

There are two modes:

| Mode | What It Does | Best For |
| --- | --- | --- |
| Traditional (Simple Prompt) | Single loop: the agent iterates until LOOP_COMPLETE | Quick tasks, single-concern work, anything one agent can handle in a straight line |
| Hat-Based | Specialised personas coordinate through typed events | Complex workflows, multi-step processes, tasks needing distinct planning/building/reviewing phases |

Core Tenets (Apply to Both Modes)

These six tenets guide every RALPH setup. Reference them when making decisions:

  1. Fresh Context Is Reliability — Each iteration clears context. The prompt must be self-contained enough to re-read, re-plan, and re-execute every cycle.
  2. Backpressure Over Prescription — Don't prescribe HOW to do the work. Create gates that reject bad work (tests pass, lint clean, types check).
  3. The Plan Is Disposable — Regeneration costs one planning loop. Cheap. Don't over-invest in preserving plans.
  4. Disk Is State, Git Is Memory — Files are the handoff mechanism between iterations. Git provides checkpointing and rollback.
  5. Steer With Signals, Not Scripts — Add signs (success criteria, quality gates), not step-by-step scripts.
  6. Let Ralph Ralph — Sit ON the loop, not IN it. The orchestrator coordinates; the agent does the work.

Workflow

Phase 1: Interview the User

Before generating anything, you need to understand the task. Ask targeted questions to fill in these blanks:

Essential information:

  • What is the task? (Be specific — "build an API" is too vague; "build a REST API for user management with Express.js and TypeScript" is good)
  • What does "done" look like? (Measurable success criteria — tests pass, endpoints respond, specific files exist)
  • What language/framework/tools are involved?
  • Does the project already exist, or is this greenfield?
  • Are there existing tests, linting, or type-checking set up?

Information that helps you decide the mode:

  • How many distinct phases or concerns does this task have? (1-2 = simple prompt; 3+ = consider hats)
  • Does the task need planning before building? (If yes, hat-based is likely better)
  • Does the task need a review/QA step separate from building? (If yes, hat-based)
  • Is there a spec or design document to follow? (Spec-driven development suits hats well)
  • How complex is the codebase? (Large existing codebase with multiple modules = hat-based)

Don't over-interview. If the user gives you a clear, well-scoped task, you may have enough after 1-2 questions. If the task is vague, probe until you can write a crisp PROMPT.md.

Phase 2: Decide the Mode

Use this decision framework:

Choose Simple Prompt when:

  • The task is a single concern (add a feature, fix a bug, write a script)
  • One agent can handle it start to finish without distinct phases
  • The success criteria are straightforward (tests pass, script runs)
  • The user explicitly wants something quick and simple
  • The task can be fully described in a PROMPT.md under ~50 lines

Choose Hat-Based when:

  • The task has 3+ distinct phases (plan → build → test → review)
  • Different phases need different "mindsets" (architect vs implementer vs reviewer)
  • The task involves spec-driven development (spec → implement → verify)
  • There's a TDD workflow (write tests → implement → verify)
  • The task is large enough that a single prompt would be overwhelming
  • Multiple files/modules need coordinated changes
  • The user explicitly asks for hats or a structured workflow

When in doubt: Start with Simple Prompt. You can always add hats later. Simpler is more robust.
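The decision framework can be expressed as a small Python function. This is a sketch only. `TaskProfile` and `choose_mode` are hypothetical names, and the weighting is one reasonable reading of the lists above, not a fixed rule:

- Any single hat signal is treated as sufficient.
- An explicit user preference always wins.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # Hypothetical fields mirroring the interview questions above.
    distinct_phases: int
    needs_planning: bool = False
    needs_separate_review: bool = False
    spec_driven: bool = False
    tdd: bool = False
    multi_module_changes: bool = False
    user_wants_hats: bool = False
    user_wants_simple: bool = False


def choose_mode(task: TaskProfile) -> str:
    """Return "hat-based" or "simple" using the heuristics above."""
    # An explicit request from the user overrides the heuristics.
    if task.user_wants_hats:
        return "hat-based"
    if task.user_wants_simple:
        return "simple"
    hat_signals = [
        task.distinct_phases >= 3,
        task.needs_planning,
        task.needs_separate_review,
        task.spec_driven,
        task.tdd,
        task.multi_module_changes,
    ]
    # With no hat signal present, default to the simpler, more robust mode.
    return "hat-based" if any(hat_signals) else "simple"
```

Because the function falls through to "simple" when nothing points toward hats, it encodes the "when in doubt" default.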

Phase 3: Generate the Files

Generate the appropriate files into the user's project directory. Always explain what you're creating and why.

Read the appropriate reference file before generating:

  • For Simple Prompt: references/simple-prompt-reference.md
  • For Hat-Based: references/hat-based-reference.md

Files to Generate

Both modes:

  • ralph.yml — Main configuration
  • PROMPT.md — The task definition

Hat-Based mode additionally:

  • hats.yml — Hat definitions with triggers, publishes, and instructions

Phase 4: Review with the User

After generating the files, walk the user through what you created:

  • Summarise the task as you understood it
  • Explain the mode choice and why
  • Highlight the success criteria / completion promise
  • For hat-based: explain the event flow between hats
  • Ask if anything needs adjusting before they run it

Then tell them how to run it:

```bash
# Simple prompt
ralph run

# Hat-based
ralph run --config hats.yml

# With iteration limit
ralph run --max-iterations 50
```

Writing Good Prompts (PROMPT.md)

The PROMPT.md is the most important file. It must be:

Self-contained: Every iteration starts fresh. The prompt must contain everything the agent needs to understand the task, check progress, and continue.

Outcome-focused: Define WHAT, not HOW. Let the agent figure out the approach.

Measurable: Include concrete success criteria the agent can verify:

  • "All tests pass" (not "write good tests")
  • "The /users endpoint returns 200 with valid JSON" (not "make the API work")
  • "TypeScript compiles with zero errors" (not "fix the types")

Structured but not prescriptive: Use sections like Task, Requirements, Success Criteria, Constraints. Don't write step-by-step instructions.

Prompt Template (Simple)

```markdown
# Task: [Clear, specific title]

[2-3 sentence description of what needs to be built/done]

## Requirements

- [Specific requirement 1]
- [Specific requirement 2]
- [Specific requirement 3]

## Success Criteria

All of the following must be true:
- [ ] [Measurable criterion 1]
- [ ] [Measurable criterion 2]
- [ ] [Measurable criterion 3]

## Constraints

- [Technology constraints]
- [Style/convention constraints]
- [Performance constraints if any]

## Status

Track your progress here. Mark items complete as you go.
When all success criteria are met, print LOOP_COMPLETE.
```
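If you want to check the template's Success Criteria mechanically, a helper like the one below could list the unchecked boxes. It is a hypothetical Python sketch. It assumes two things:

- The heading is exactly `## Success Criteria`, as in the template.
- Checkboxes use the GitHub-style `- [ ]` / `- [x]` form.

The loop itself still depends on the agent printing LOOP_COMPLETE.

```python
import re

# Matches task-list items such as "- [ ] foo" or "- [x] foo".
CHECKBOX = re.compile(r"^\s*-\s\[( |x|X)\]\s+(.*)$")


def unmet_criteria(prompt_text: str) -> list[str]:
    """Return the unchecked items under the '## Success Criteria' heading."""
    unmet = []
    in_section = False
    for line in prompt_text.splitlines():
        if line.startswith("## "):
            # Each level-2 heading starts a new section; track whether it is ours.
            in_section = line.strip() == "## Success Criteria"
            continue
        match = CHECKBOX.match(line)
        if in_section and match and match.group(1) == " ":
            unmet.append(match.group(2).strip())
    return unmet
```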

Designing Hat Systems

When creating hats, follow these principles:

Each hat should have a single responsibility. Don't create a hat that plans AND builds.

Events flow forward. The event chain should be a clear pipeline: work.start → plan.ready → build.done → review (changes requested OR LOOP_COMPLETE).

Terminal hats should end, not publish success. For the final validation/review hat, success means emitting LOOP_COMPLETE rather than publishing a success event such as review.approved; the hat should publish only rework or failure events.

Instructions should be specific to the hat's role. The planner hat gets planning instructions, the builder gets building instructions.

Keep it minimal. 2-4 hats is typical. More than 5 is usually overengineered.

Common Hat Patterns

Plan → Build (2 hats): Good for tasks that need architectural thinking before coding.

Plan → Build → Review (3 hats): Good for tasks that need quality assurance.

Spec → Implement → Verify (3 hats): Good for spec-driven development.

Test → Implement → Verify (3 hats): Good for TDD workflows.

See references/hat-based-reference.md for full configuration examples.
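The following Python sketch shows how the Plan → Build → Review pattern routes work, including the rework path, by mapping each event to the hat it activates. Only work.start, plan.ready, and build.done come from this document. `review.changes_requested` and the `ROUTES` table are illustrative assumptions, not Ralph's hats.yml format.

```python
# Hypothetical routing table: event name -> hat that the event triggers.
ROUTES = {
    "work.start": "planner",
    "plan.ready": "builder",
    "review.changes_requested": "builder",  # rework goes back to the builder
    "build.done": "reviewer",
}


def trace(events: list[str]) -> list[str]:
    """Map each published event to the hat it activates; stop at LOOP_COMPLETE."""
    path = []
    for event in events:
        if event == "LOOP_COMPLETE":
            path.append("done")
            break
        path.append(ROUTES[event])  # KeyError means no hat handles the event
    return path
```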

Backpressure Configuration

Backpressure gates reject incomplete work. Common gates:

```yaml
backpressure:
  gates:
    - name: "tests"
      command: "npm test"
      on_fail: "retry"
    - name: "lint"
      command: "npm run lint"
      on_fail: "retry"
    - name: "typecheck"
      command: "npx tsc --noEmit"
      on_fail: "retry"
```

Only add gates for tools that exist in the project. If there are no tests yet, don't add a test gate (unless the task IS to create tests).
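A gate is essentially a shell command whose exit code decides pass or fail. The Python sketch below mirrors the `name` and `command` fields from the YAML above and reports which gates failed. How a failure is handled (the `on_fail: "retry"` policy) is left to the orchestrator, and none of this is how Ralph itself executes gates.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    command: str


def failing_gates(gates: list[Gate]) -> list[str]:
    """Run each configured gate and return the names of gates that exited non-zero."""
    failed = []
    for gate in gates:
        # shell=True so commands like "npm run lint" run exactly as written in config.
        result = subprocess.run(gate.command, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(gate.name)
    return failed
```

Passing only the gates that actually exist in the project keeps this consistent with the rule of not adding gates for missing tools.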

No-Skip Safety Rules

When configuring backpressure and completion logic, preserve quality standards:

  • Never treat a circuit breaker as an automatic pass.
  • Never skip required checks that are configured in the repository.
  • Always require an explicit review outcome before completion (LOOP_COMPLETE or concrete changes requested).
  • If tests exist in the project and are part of quality gates, they must run and pass before completion.
  • If a gate is not configured in the repo, mark it not-configured explicitly rather than fabricating retries.
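The rules above amount to a single completion check. In this hypothetical Python sketch:

- A not-configured gate is recorded as its own status rather than as a pass.
- A tripped circuit breaker always blocks completion.
- Only an explicit LOOP_COMPLETE review outcome lets the loop end.

The names `GateStatus` and `may_complete` are illustrative.

```python
from enum import Enum


class GateStatus(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_CONFIGURED = "not-configured"  # recorded explicitly, never faked as a pass


def may_complete(gates: dict[str, GateStatus], review_outcome: str, breaker_tripped: bool) -> bool:
    """Allow completion only under the no-skip rules."""
    if breaker_tripped:
        # A tripped circuit breaker means escalate, never an automatic pass.
        return False
    if review_outcome != "LOOP_COMPLETE":
        # Changes requested, or no explicit outcome at all, blocks completion.
        return False
    # Any configured gate that ran and failed blocks completion. NOT_CONFIGURED
    # gates do not block, but they keep their own label instead of counting as passes.
    return all(status is not GateStatus.FAILED for status in gates.values())
```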

Loop Circuit Breaker and Escalation

To prevent infinite review/backpressure churn, include a circuit breaker policy in generated prompts/hats:

  • Detect repeated identical evidence cycles (same blocker class and materially identical build evidence) across 2-3 consecutive iterations.
  • If the repetition threshold is reached, stop retrying the same recovery path.
  • Escalate instead of auto-completing:
    • record the blocker and evidence in .ralph/review.md
    • assign an owner and target finish date
    • set status to require human decision/clarification
  • Resume the loop only after the blocker criteria are clarified or configuration is corrected.
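One way to implement the repetition check is to fingerprint each iteration's blocker class and evidence, then trip when the last N fingerprints are identical. This Python sketch assumes that normalising whitespace is a good enough proxy for "materially identical" evidence; real setups may need more, such as stripping timestamps. When `record` returns True, the caller should escalate through .ralph/review.md instead of retrying.

```python
import hashlib
from collections import deque


class CircuitBreaker:
    """Trips when the same blocker/evidence pair repeats on consecutive iterations."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        # Only the most recent `threshold` fingerprints are kept.
        self.recent: deque = deque(maxlen=threshold)

    @staticmethod
    def fingerprint(blocker_class: str, evidence: str) -> str:
        # Collapse whitespace so cosmetically different logs compare equal.
        normalised = " ".join(evidence.split())
        return hashlib.sha256(f"{blocker_class}\n{normalised}".encode()).hexdigest()

    def record(self, blocker_class: str, evidence: str) -> bool:
        """Record one iteration's outcome; return True if the breaker has tripped."""
        self.recent.append(self.fingerprint(blocker_class, evidence))
        return len(self.recent) == self.threshold and len(set(self.recent)) == 1
```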

Operational Hygiene Between Runs

Treat runtime coordination state as loop-scoped:

  • Do not carry stale "recovery" tasks into a new objective unless explicitly intended.
  • Avoid creating new meta/recovery tasks when all implementation tasks are already closed and no new actionable finding exists.
  • Keep artifacts (.ralph/plan.md, .ralph/review.md, event logs) for auditability, but ensure open task queues reflect only current-loop actionable work.
  • Prefer one clear escalation handoff over repeated coordination retries with identical payloads.
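Keeping the open queue loop-scoped can be as simple as filtering by a loop identifier. This hypothetical Python sketch assumes each task records the loop that created it and carries an explicit `carry_over` flag. All tasks stay on disk for auditing, but only open work from the current loop, plus deliberate carry-overs, is treated as actionable.

```python
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    loop_id: str
    status: str = "open"      # "open" or "closed"
    carry_over: bool = False  # set only when a task is meant to survive into the next loop


def open_queue_for(loop_id: str, tasks: list[Task]) -> list[Task]:
    """Return the open tasks that belong in the current loop's actionable queue.

    Closed tasks and stale tasks from earlier loops are not deleted; they are
    simply excluded from the queue.
    """
    return [t for t in tasks if t.status == "open" and (t.loop_id == loop_id or t.carry_over)]
```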

Cost and Safety

Always configure iteration limits. Remind the user:

  • Default max iterations: 100
  • Default max runtime: 4 hours
  • A 50-iteration cycle on a large codebase can cost $50-100+ in API credits
  • Recommend starting with --max-iterations 30 for new setups and increasing if needed
  • Git checkpointing is on by default — the user can always roll back
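The limits and cost figures above can be turned into a quick budget check. This Python sketch uses the stated defaults (100 iterations, 4 hours). It derives roughly $1-2 per iteration from the "$50-100+ per 50 iterations" figure. That rate is an extrapolation for large codebases, not a quote, and real costs depend on the model and context size.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_iterations: int = 100                # default stated above
    max_runtime_seconds: float = 4 * 3600    # default stated above: 4 hours

    def exceeded(self, iterations: int, elapsed_seconds: float) -> bool:
        """True once either the iteration cap or the runtime cap is reached."""
        return iterations >= self.max_iterations or elapsed_seconds >= self.max_runtime_seconds


def cost_range(iterations: int, low_per_iter: float = 1.0, high_per_iter: float = 2.0) -> tuple[float, float]:
    # $50-100 per 50 iterations works out to about $1-2 per iteration (a rough lower bound).
    return iterations * low_per_iter, iterations * high_per_iter
```

For example, `cost_range(30)` gives (30.0, 60.0). Under the same assumption, the suggested 30-iteration starting point would cost on the order of $30-60 or more.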