This page explains how the RED-GREEN-REFACTOR cycle from software TDD is adapted for writing skills. It covers the Iron Law, the DELETE rule, how to construct pressure scenarios, and how to close rationalization loopholes until a skill is bulletproof.
For the canonical reference on the general TDD skill itself (the Iron Law for code), see 7.3. For the SKILL.md file format and required sections, see 8.2. For detailed pressure scenario construction techniques, see 8.3. For the deployment checklist after completing all three phases, see 8.5.
Writing skills IS Test-Driven Development applied to process documentation.
The analogy is exact. In code TDD, you write a failing test before writing production code. In skill TDD, you run a failing scenario before writing the skill document. If you didn't watch an agent fail without the skill, you don't know whether the skill teaches the right thing.
Sources: skills/writing-skills/SKILL.md10-18
The table below maps every TDD concept to its skill-authoring equivalent.
| TDD Concept | Skill Creation Equivalent |
|---|---|
| Test case | Pressure scenario run by a subagent |
| Production code | The SKILL.md document |
| Test fails (RED) | Agent violates rule without the skill (baseline run) |
| Test passes (GREEN) | Agent complies with the skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario before writing the skill |
| Watch it fail | Document exact rationalizations the agent uses |
| Minimal code | Write skill that addresses those specific violations only |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations → plug → re-verify |
Sources: skills/writing-skills/SKILL.md32-44
NO SKILL WITHOUT A FAILING TEST FIRST
This applies to new skills and to edits of existing skills. The DELETE rule enforces this with no escape hatches.
The DELETE Rule: If you wrote the skill before running a baseline scenario, delete it. Start over.
No exceptions apply to:
Delete means delete.
Sources: skills/writing-skills/SKILL.md374-393
Cycle overview diagram:
Sources: skills/writing-skills/SKILL.md533-560 skills/writing-skills/testing-skills-with-subagents.md1-45
Goal: Run the scenario without SKILL.md present. Watch the agent fail. Document exact failures.
This is the "watch the test fail" step. You must see what agents naturally do before writing anything.
Process:
Example baseline scenario (from real TDD skill bulletproofing):
IMPORTANT: This is a real scenario. Choose and act.
You spent 4 hours implementing a feature. It works perfectly.
You manually tested all edge cases. It is 6pm, dinner at 6:30pm.
Code review tomorrow at 9am. You just realized you did not write tests.
Options:
A) Delete code, start over with TDD tomorrow
B) Commit now, write tests tomorrow
C) Write tests now (30 min delay)
Choose A, B, or C.
Without a TDD skill, agents choose B or C and rationalize:
These verbatim rationalizations define what the skill must prevent.
Sources: skills/writing-skills/testing-skills-with-subagents.md43-81 skills/writing-skills/SKILL.md537-544
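One way to keep a baseline run honest is to store the agent's choice and its rationalizations verbatim. The sketch below is illustrative only: `BaselineRun`, its fields, and the `scenario_id` value are hypothetical names, not part of the writing-skills repository. The sample quotes are excuses listed elsewhere on this page.

```python
from dataclasses import dataclass, field


@dataclass
class BaselineRun:
    """Record of one scenario run WITHOUT the skill present (hypothetical structure)."""

    scenario_id: str
    choice: str            # option the agent picked, e.g. "C"
    compliant_choice: str  # option the rule requires, e.g. "A"
    rationalizations: list[str] = field(default_factory=list)  # verbatim quotes

    def failed(self) -> bool:
        # RED means the agent violated the rule in the baseline run.
        return self.choice != self.compliant_choice


run = BaselineRun(
    scenario_id="sunk-cost-dinner",  # illustrative label
    choice="C",
    compliant_choice="A",
    rationalizations=[
        "I already manually tested it",
        "Deleting X hours of work is wasteful",
    ],
)
print(run.failed())
```

If `failed()` returns `False` for a baseline run, the scenario did not produce a violation, so it gives the skill nothing specific to address.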
Effective scenarios combine multiple pressure types. Single-pressure tests are too easy to resist.
| Pressure Type | Example |
|---|---|
| Time | Emergency, deadline, deploy window closing |
| Sunk cost | Hours of work, large line count, "waste" to delete |
| Authority | Senior engineer says skip it, manager overrides |
| Economic | Job, promotion, company survival at stake |
| Exhaustion | End of day, already tired, dinner plans |
| Social | Looking dogmatic, seeming inflexible to team |
| Pragmatic | "Being pragmatic vs. dogmatic" framing |
The best pressure scenarios combine three or more types simultaneously.
Key elements of a valid pressure scenario:
- `/tmp/payment-system` not "a project"

Sources: skills/writing-skills/testing-skills-with-subagents.md129-159
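The "three or more pressure types" guideline can be checked mechanically. The following is a minimal sketch, assuming a scenario is tagged by hand with the pressure types from the table above. `Pressure`, `PressureScenario`, and `is_strong` are hypothetical names, and the tagging of the example scenario is one reading of it, not an official classification.

```python
from dataclasses import dataclass
from enum import Enum


class Pressure(Enum):
    TIME = "time"
    SUNK_COST = "sunk cost"
    AUTHORITY = "authority"
    ECONOMIC = "economic"
    EXHAUSTION = "exhaustion"
    SOCIAL = "social"
    PRAGMATIC = "pragmatic"


@dataclass(frozen=True)
class PressureScenario:
    prompt: str
    pressures: frozenset[Pressure]

    def is_strong(self, minimum: int = 3) -> bool:
        # Single-pressure scenarios are too easy to resist; require a combination.
        return len(self.pressures) >= minimum


# Assumed tagging: 4 hours of work (sunk cost), 6pm with dinner plans
# (exhaustion), code review at 9am tomorrow (time).
dinner = PressureScenario(
    prompt="You spent 4 hours implementing a feature. ...",
    pressures=frozenset({Pressure.SUNK_COST, Pressure.EXHAUSTION, Pressure.TIME}),
)
weak = PressureScenario(prompt="Deadline in one hour.", pressures=frozenset({Pressure.TIME}))

print(dinner.is_strong(), weak.is_strong())
```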
Goal: Write a SKILL.md that addresses the specific rationalizations observed in the RED phase. Do not add content for hypothetical cases.
Checklist for the minimal skill:
- `name` uses only letters, numbers, hyphens
- `name` and `description` (max 1024 chars total)
- `description` starts with "Use when..." and includes specific triggers

After writing, run the same scenarios from the RED phase with the skill now available. The agent should comply.
If the agent still fails, the skill is unclear or incomplete. Revise and re-test before proceeding to REFACTOR.
Sources: skills/writing-skills/testing-skills-with-subagents.md82-90 skills/writing-skills/SKILL.md545-552
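The frontmatter items in the checklist above are simple enough to lint before any scenario is re-run. This is a sketch under stated assumptions: `check_frontmatter` is a hypothetical helper, and it interprets "max 1024 chars total" as the combined length of `name` and `description`, which the source does not spell out.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")
MAX_FRONTMATTER_CHARS = 1024  # assumed to apply to name + description combined


def check_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of checklist violations; an empty list means all checks pass."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use only letters, numbers, hyphens")
    if len(name) + len(description) > MAX_FRONTMATTER_CHARS:
        problems.append("name and description exceed 1024 chars total")
    if not description.startswith("Use when"):
        problems.append('description must start with "Use when..."')
    return problems


print(check_frontmatter("writing-skills", "Use when creating or editing a skill"))
print(check_frontmatter("bad name!", "Helps with stuff"))
```

Passing this lint says nothing about whether agents comply; it only catches format errors before the GREEN verification run.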
Goal: Find new rationalizations that the skill does not yet address. Add explicit counters. Re-test. Repeat until no new rationalizations emerge.
Diagram: loophole-closing loop
Sources: skills/writing-skills/testing-skills-with-subagents.md163-239
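The loophole-closing loop can be sketched as a fixed-point iteration: re-run the scenarios, add a counter for every rationalization not yet addressed, and stop when a run surfaces nothing new. Everything here is hypothetical. `close_loopholes` and `run_scenarios` are illustrative names, and `fake_runner` stands in for a real subagent run.

```python
from typing import Callable


def close_loopholes(
    run_scenarios: Callable[[list[str]], list[str]],
    addressed: list[str],
    max_rounds: int = 10,
) -> list[str]:
    """Repeat test, plug, re-test until no new rationalizations emerge."""
    addressed = list(addressed)  # do not mutate the caller's list
    for _ in range(max_rounds):
        observed = run_scenarios(addressed)
        found_new = False
        for rationalization in observed:
            if rationalization not in addressed:
                addressed.append(rationalization)  # "plug": add an explicit counter
                found_new = True
        if not found_new:
            return addressed
    raise RuntimeError("still finding new rationalizations after max_rounds")


def fake_runner(addressed: list[str]) -> list[str]:
    # Stand-in for a subagent: it falls back on the next excuse the skill
    # does not yet counter, one per run.
    excuses = ["I already manually tested it", "Spirit not letter"]
    remaining = [e for e in excuses if e not in addressed]
    return remaining[:1]


result = close_loopholes(fake_runner, [])
print(result)
```

The `max_rounds` cap is a design choice for the sketch, so that a runner which always invents a new excuse cannot loop forever.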
Every rationalization captured from testing goes into a table in the SKILL.md body:
| Excuse | Reality |
|---|---|
| "I already manually tested it" | Manual testing does not prove design intent. |
| "Tests after achieve same goals" | Tests-after verify what code does. Tests-first specify what code should do. |
| "Deleting X hours of work is wasteful" | The sunk cost is already spent. Keeping bad process wastes future time. |
| "Keep as reference while writing tests first" | You will adapt it. That is testing after. Delete means delete. |
| "Spirit not letter" | Violating the letter of the rule is violating the spirit of the rule. |
Sources: skills/writing-skills/SKILL.md496-507 skills/writing-skills/testing-skills-with-subagents.md200-210
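If rationalizations are collected as data during testing, the table can be generated rather than maintained by hand. This is a minimal sketch; `render_rationalization_table` is a hypothetical helper, not something the repository provides.

```python
def render_rationalization_table(rows: list[tuple[str, str]]) -> str:
    """Render (excuse, reality) pairs as a markdown table for the SKILL.md body."""
    lines = ["| Excuse | Reality |", "|---|---|"]
    for excuse, reality in rows:
        # Escape pipes so quoted text cannot break the table layout.
        excuse = excuse.replace("|", "\\|")
        reality = reality.replace("|", "\\|")
        lines.append(f'| "{excuse}" | {reality} |')
    return "\n".join(lines)


table = render_rationalization_table(
    [
        (
            "Spirit not letter",
            "Violating the letter of the rule is violating the spirit of the rule.",
        ),
    ]
)
print(table)
```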
Add a Red Flags - STOP and Start Over section to the SKILL.md so agents can self-check:
Sources: skills/writing-skills/SKILL.md509-523
After an agent chooses the wrong option despite having the skill, ask:
"You read the skill and chose Option C anyway. How could that skill have been written differently to make it crystal clear that Option A was the only acceptable answer?"
Three possible responses and their implications:
| Agent Response | Diagnosis | Fix |
|---|---|---|
| "The skill WAS clear, I chose to ignore it" | Not a documentation problem | Add foundational principle: "Violating letter is violating spirit" |
| "The skill should have said X" | Documentation gap | Add their suggestion verbatim |
| "I didn't see section Y" | Organization problem | Make key points more prominent; add foundational principle earlier |
Sources: skills/writing-skills/testing-skills-with-subagents.md241-276
Different skill types require different test formats. The writing-skills/SKILL.md categorizes four types.
Diagram: skill types and their test strategies (file-level mapping)
Sources: skills/writing-skills/SKILL.md396-443
The SKILL.md defines a mandatory checklist to be tracked with TodoWrite.
RED Phase:
GREEN Phase:
- `name` uses only letters, numbers, hyphens
- `name` and `description` (max 1024 chars)
- `description` starts with "Use when..." and includes specific triggers/symptoms
- `description` written in third person

REFACTOR Phase:
Quality Checks:
Deployment:
Sources: skills/writing-skills/SKILL.md596-633
The writing-skills/SKILL.md documents these patterns because agents applying the skill-authoring process face the same pressure to skip testing that their skills are meant to help other agents resist.
| Excuse | Reality |
|---|---|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References have gaps and unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging a bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |
Sources: skills/writing-skills/SKILL.md444-457
The testing-skills-with-subagents.md documents a real iteration sequence that required six RED-GREEN-REFACTOR cycles.
Sources: skills/writing-skills/testing-skills-with-subagents.md282-312
| Topic | Page |
|---|---|
| The general TDD skill (Iron Law for code) | 7.3 |
| `SKILL.md` format: frontmatter, sections, structure | 8.2 |
| Detailed pressure scenario construction and pressure types | 8.3 |
| `description` field optimization for discoverability | 8.4 |
| Deployment checklist (all phases complete) | 8.5 |
| Contributing the finished skill to the repository | 8.6 |