admin/portfolio

Fork 0

Files

T

admin 4dfb1607c1 Updated chart

2026-02-16 13:23:04 +00:00

11 KiB

Raw Blame History

name, description

name	description
ralph-setup	Set up autonomous AI development tasks using the Ralph Wiggum technique. Use when the user wants to create a RALPH orchestration — either a simple looping prompt or a multi-hat coordinated workflow. Interviews the user to understand requirements, decides the appropriate mode, and generates all necessary configuration files (ralph.yml, hats.yml, PROMPT.md). Triggers on mentions of "ralph", "autonomous loop", "hat-based", "orchestration", or requests to set up iterative AI agent tasks.

name

description

ralph-setup

Set up autonomous AI development tasks using the Ralph Wiggum technique. Use when the user wants to create a RALPH orchestration — either a simple looping prompt or a multi-hat coordinated workflow. Interviews the user to understand requirements, decides the appropriate mode, and generates all necessary configuration files (ralph.yml, hats.yml, PROMPT.md). Triggers on mentions of "ralph", "autonomous loop", "hat-based", "orchestration", or requests to set up iterative AI agent tasks.

Ralph Setup Skill

Set up autonomous AI development tasks using the Ralph Wiggum technique — either as a simple iterating prompt or a coordinated hat-based workflow.

Background

Ralph implements the Ralph Wiggum technique: give an AI agent a task, loop it until it's done. The orchestrator is deliberately thin — it trusts the agent to do the work and enforces quality through backpressure (tests, lint, typecheck must pass).

There are two modes:

Mode	What It Does	Best For
Traditional (Simple Prompt)	Single loop — agent iterates until LOOP_COMPLETE	Quick tasks, single-concern work, anything one agent can handle in a straight line
Hat-Based	Specialised personas coordinate through typed events	Complex workflows, multi-step processes, tasks needing distinct planning/building/reviewing phases

Core Tenets (Apply to Both Modes)

These six tenets guide every RALPH setup. Reference them when making decisions:

Fresh Context Is Reliability — Each iteration clears context. The prompt must be self-contained enough to re-read, re-plan, and re-execute every cycle.
Backpressure Over Prescription — Don't prescribe HOW to do the work. Create gates that reject bad work (tests pass, lint clean, types check).
The Plan Is Disposable — Regeneration costs one planning loop. Cheap. Don't over-invest in preserving plans.
Disk Is State, Git Is Memory — Files are the handoff mechanism between iterations. Git provides checkpointing and rollback.
Steer With Signals, Not Scripts — Add signs (success criteria, quality gates), not step-by-step scripts.
Let Ralph Ralph — Sit ON the loop, not IN it. The orchestrator coordinates; the agent does the work.

Workflow

Phase 1: Interview the User

Before generating anything, you need to understand the task. Ask targeted questions to fill in these blanks:

Essential information:

What is the task? (Be specific — "build an API" is too vague; "build a REST API for user management with Express.js and TypeScript" is good)
What does "done" look like? (Measurable success criteria — tests pass, endpoints respond, specific files exist)
What language/framework/tools are involved?
Does the project already exist, or is this greenfield?
Are there existing tests, linting, or type-checking set up?

Information that helps you decide the mode:

How many distinct phases or concerns does this task have? (1-2 = simple prompt; 3+ = consider hats)
Does the task need planning before building? (If yes, hat-based is likely better)
Does the task need a review/QA step separate from building? (If yes, hat-based)
Is there a spec or design document to follow? (Spec-driven development suits hats well)
How complex is the codebase? (Large existing codebase with multiple modules = hat-based)

Don't over-interview. If the user gives you a clear, well-scoped task, you may have enough after 1-2 questions. If the task is vague, probe until you can write a crisp PROMPT.md.

Phase 2: Decide the Mode

Use this decision framework:

Choose Simple Prompt when:

The task is a single concern (add a feature, fix a bug, write a script)
One agent can handle it start to finish without distinct phases
The success criteria are straightforward (tests pass, script runs)
The user explicitly wants something quick and simple
The task can be fully described in a PROMPT.md under ~50 lines

Choose Hat-Based when:

The task has 3+ distinct phases (plan → build → test → review)
Different phases need different "mindsets" (architect vs implementer vs reviewer)
The task involves spec-driven development (spec → implement → verify)
There's a TDD workflow (write tests → implement → verify)
The task is large enough that a single prompt would be overwhelming
Multiple files/modules need coordinated changes
The user explicitly asks for hats or a structured workflow

When in doubt: Start with Simple Prompt. You can always add hats later. Simpler is more robust.

Phase 3: Generate the Files

Generate the appropriate files into the user's project directory. Always explain what you're creating and why.

Read the appropriate reference file before generating:

For Simple Prompt: references/simple-prompt-reference.md
For Hat-Based: references/hat-based-reference.md

Files to Generate

Both modes:

ralph.yml — Main configuration
PROMPT.md — The task definition

Hat-Based mode additionally:

hats.yml — Hat definitions with triggers, publishes, and instructions

Phase 4: Review with the User

After generating the files, walk the user through what you created:

Summarise the task as you understood it
Explain the mode choice and why
Highlight the success criteria / completion promise
For hat-based: explain the event flow between hats
Ask if anything needs adjusting before they run it

Then tell them how to run it:

# Simple prompt
ralph run

# Hat-based
ralph run --config hats.yml

# With iteration limit
ralph run --max-iterations 50

Writing Good Prompts (PROMPT.md)

The PROMPT.md is the most important file. It must be:

Self-contained: Every iteration starts fresh. The prompt must contain everything the agent needs to understand the task, check progress, and continue.

Outcome-focused: Define WHAT, not HOW. Let the agent figure out the approach.

Measurable: Include concrete success criteria the agent can verify:

"All tests pass" (not "write good tests")
"The /users endpoint returns 200 with valid JSON" (not "make the API work")
"TypeScript compiles with zero errors" (not "fix the types")

Structured but not prescriptive: Use sections like Task, Requirements, Success Criteria, Constraints. Don't write step-by-step instructions.

Prompt Template (Simple)

# Task: [Clear, specific title]

[2-3 sentence description of what needs to be built/done]

## Requirements

- [Specific requirement 1]
- [Specific requirement 2]
- [Specific requirement 3]

## Success Criteria

All of the following must be true:
- [ ] [Measurable criterion 1]
- [ ] [Measurable criterion 2]
- [ ] [Measurable criterion 3]

## Constraints

- [Technology constraints]
- [Style/convention constraints]
- [Performance constraints if any]

## Status

Track your progress here. Mark items complete as you go.
When all success criteria are met, print LOOP_COMPLETE.

Designing Hat Systems

When creating hats, follow these principles:

Each hat should have a single responsibility. Don't create a hat that plans AND builds.

Events flow forward. The event chain should be a clear pipeline: work.start → plan.ready → build.done → review (changes requested OR LOOP_COMPLETE).

Terminal hats should end, not publish success. For the final validation/review hat, success should be LOOP_COMPLETE (no success event like review.approved), and only rework/failure events should be published.

Instructions should be specific to the hat's role. The planner hat gets planning instructions, the builder gets building instructions.

Keep it minimal. 2-4 hats is typical. More than 5 is usually overengineered.

Common Hat Patterns

Plan → Build (2 hats): Good for tasks that need architectural thinking before coding.

Plan → Build → Review (3 hats): Good for tasks that need quality assurance.

Spec → Implement → Verify (3 hats): Good for spec-driven development.

Test → Implement → Verify (3 hats): Good for TDD workflows.

See references/hat-based-reference.md for full configuration examples.

Backpressure Configuration

Backpressure gates reject incomplete work. Common gates:

backpressure:
  gates:
    - name: "tests"
      command: "npm test"
      on_fail: "retry"
    - name: "lint"
      command: "npm run lint"
      on_fail: "retry"
    - name: "typecheck"
      command: "npx tsc --noEmit"
      on_fail: "retry"

Only add gates for tools that exist in the project. If there are no tests yet, don't add a test gate (unless the task IS to create tests).

No-Skip Safety Rules

When configuring backpressure and completion logic, preserve quality standards:

Never treat a circuit breaker as an automatic pass.
Never skip required checks that are configured in the repository.
Always require an explicit review outcome before completion (LOOP_COMPLETE or concrete changes requested).
If tests exist in the project and are part of quality gates, they must run and pass before completion.
If a gate is not configured in the repo, mark it not-configured explicitly rather than fabricating retries.

Loop Circuit Breaker and Escalation

To prevent infinite review/backpressure churn, include a circuit breaker policy in generated prompts/hats:

Detect repeated identical evidence cycles (same blocker class and materially identical build evidence) across 2-3 consecutive iterations.
If repetition threshold is reached, stop retrying the same recovery path.
Escalate instead of auto-completing:
- record the blocker and evidence in .ralph/review.md
- assign an owner and target finish date
- set status to require human decision/clarification
Resume the loop only after the blocker criteria are clarified or configuration is corrected.

Operational Hygiene Between Runs

Treat runtime coordination state as loop-scoped:

Do not carry stale "recovery" tasks into a new objective unless explicitly intended.
Avoid creating new meta/recovery tasks when all implementation tasks are already closed and no new actionable finding exists.
Keep artifacts (.ralph/plan.md, .ralph/review.md, event logs) for auditability, but ensure open task queues reflect only current-loop actionable work.
Prefer one clear escalation handoff over repeated coordination retries with identical payloads.

Cost and Safety

Always configure iteration limits. Remind the user:

Default max iterations: 100
Default max runtime: 4 hours
A 50-iteration cycle on a large codebase can cost $50-100+ in API credits
Recommend starting with --max-iterations 30 for new setups and increasing if needed
Git checkpointing is on by default — the user can always roll back

11 KiB Raw Blame History