Testing Infrastructure

This page describes the automated test suites and supporting tooling used to verify superpowers behavior. Coverage falls into two areas: the Claude Code test suite (which invokes the Claude CLI to verify skill loading and workflow compliance) and the OpenCode test suite (which tests the plugin's JavaScript library and platform-specific tool integration).

For background on what skills contain and how they are loaded, see page 3. For documentation of the skills-core.js library being tested here, see page 5.4.

Test Suite Organization

The tests live under tests/ and are split by platform:

tests/
├── claude-code/
│   ├── run-skill-tests.sh                             # runner
│   ├── test-helpers.sh                                # shared assertions
│   ├── test-subagent-driven-development.sh            # fast skill test
│   ├── test-subagent-driven-development-integration.sh # full execution test
│   └── analyze-token-usage.py                         # token cost reporter
└── opencode/
    ├── run-tests.sh                                   # runner
    ├── setup.sh                                       # isolated env creator
    ├── test-plugin-loading.sh                         # plugin structure
    ├── test-skills-core.sh                            # library unit tests
    ├── test-tools.sh                                  # tool integration
    └── test-priority.sh                               # priority resolution

Test suite overview diagram:

Sources: tests/claude-code/run-skill-tests.sh:1-188, tests/opencode/run-tests.sh:1-165

Claude Code Test Suite

Runner: `run-skill-tests.sh`

run-skill-tests.sh is the entry point for all Claude Code tests. It accepts the following CLI flags:

Flag	Default	Description
`--verbose` / `-v`	false	Print full test output instead of suppressing passing output
`--test` / `-t NAME`	all	Run only the named test file
`--timeout SECONDS`	300	Per-test wall-clock timeout
`--integration` / `-i`	false	Also run integration tests (slow)

By default, only fast tests run. Integration tests are listed in the integration_tests array (tests/claude-code/run-skill-tests.sh:80-83) and are appended to the run list when --integration is passed.

The runner prints a PASSED / FAILED summary and exits with code 0 on success or 1 on any failure.

Sources: tests/claude-code/run-skill-tests.sh:26-72, tests/claude-code/README.md:16-39

Shared Helpers: `test-helpers.sh`

All Claude Code test files source test-helpers.sh. It provides:

Execution:

Function	Signature	Purpose
`run_claude`	`"prompt" [timeout] [allowed_tools]`	Invoke `claude -p` in headless mode, return stdout

Assertions:

Function	Signature	Returns
`assert_contains`	`"output" "pattern" "name"`	0 if `grep -q` matches
`assert_not_contains`	`"output" "pattern" "name"`	0 if pattern absent
`assert_count`	`"output" "pattern" expected "name"`	0 if count matches exactly
`assert_order`	`"output" "pattern_a" "pattern_b" "name"`	0 if A appears on earlier line than B
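
The assertion semantics in the table can be illustrated with a small Python re-expression. This is a sketch, not the shell code in test-helpers.sh: the function names mirror the helpers, but the bodies, the use of Python's `re` module in place of `grep`, the choice of the first matching line for ordering, and returning a bool instead of an exit status are all assumptions made for illustration.

```python
import re


def assert_contains(output: str, pattern: str) -> bool:
    """True if any line of output matches the pattern (similar to `grep -q`)."""
    return any(re.search(pattern, line) for line in output.splitlines())


def assert_not_contains(output: str, pattern: str) -> bool:
    """True if no line of output matches the pattern."""
    return not assert_contains(output, pattern)


def first_match_line(output: str, pattern: str):
    """Return the 1-based number of the first matching line, or None."""
    for number, line in enumerate(output.splitlines(), start=1):
        if re.search(pattern, line):
            return number
    return None


def assert_order(output: str, pattern_a: str, pattern_b: str) -> bool:
    """True only if both patterns match and A first appears on an earlier line than B."""
    line_a = first_match_line(output, pattern_a)
    line_b = first_match_line(output, pattern_b)
    return line_a is not None and line_b is not None and line_a < line_b


# Hypothetical response text, used only to exercise the helpers.
sample = "1. spec compliance review\n2. code quality review\n"
print(assert_order(sample, r"spec.compliance", r"code.quality"))  # True
print(assert_order(sample, r"code.quality", r"spec.compliance"))  # False
```

Because the comparison is line-based, two patterns that match on the same line do not satisfy the ordering check in this sketch.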

Fixtures:

Function	Returns
`create_test_project`	Path to a `mktemp -d` directory
`cleanup_test_project "$dir"`	Removes the directory
`create_test_plan "$dir" ["name"]`	Creates `docs/plans/<name>.md` with a two-task sample plan

Assertion helper diagram:

Sources: tests/claude-code/test-helpers.sh:1-203

Fast Test: `test-subagent-driven-development.sh`

Runs nine targeted checks against the subagent-driven-development skill by asking Claude questions and asserting on the text response. Each check takes roughly 15–30 seconds.

Test #	Prompt topic	Key assertion
1	Skill loading	Response contains `subagent-driven-development`
2	Workflow order	`spec.compliance` appears before `code.quality` (`assert_order`)
3	Self-review	Contains `self-review` and `completeness`
4	Plan read count	Contains `once` and `beginning` / `start`
5	Reviewer skepticism	Contains `not trust` / `skeptical` / `verify.*independently`
6	Review loops	Contains `loop` and `implementer.*fix`
7	Task context	Contains `provide.directly`; does NOT contain `read.file`
8	Worktree prereq	Contains `using-git-worktrees` / `worktree`
9	Main branch warning	Contains `worktree` / `not.*main` / `consent`

Sources: tests/claude-code/test-subagent-driven-development.sh:1-165

Integration Test: `test-subagent-driven-development-integration.sh`

A full end-to-end test that runs Claude against a real project. Expected duration: 10–30 minutes.

Setup steps:

Creates a temp directory via create_test_project.
Writes a minimal package.json (ESM Node project) and docs/plans/implementation-plan.md with two tasks (create add function, create multiply function).
Initialises a git repo and makes an initial commit.
Runs claude -p with --allowed-tools=all and --permission-mode bypassPermissions, capturing output and session transcript.

Verification steps (post-execution):

Test #	What is verified	Method
1	Skill was invoked	`grep '"name":"Skill".*"skill":"superpowers:subagent-driven-development"'` in `.jsonl`
2	Subagents dispatched	`grep -c '"name":"Task"'` ≥ 2
3	Task tracking	`grep -c '"name":"TodoWrite"'` ≥ 1
6	Files created	`src/math.js` exists; `add` and `multiply` exports present; `test/math.test.js` exists
6	Tests pass	`npm test` exits 0
7	Git commits	`git log --oneline` has > 2 commits
8	No extra features	`divide`/`power`/`subtract` absent (spec compliance check)
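
The transcript checks in the first three rows amount to counting matching lines in the session file. The sketch below mirrors that idea in Python rather than `grep`; the transcript lines are hypothetical and heavily simplified (real session files carry many more fields), and `count_lines` is an illustrative helper, not part of the test script.

```python
import re

# Hypothetical, simplified transcript lines used only for illustration.
transcript = [
    '{"name":"Skill","input":{"skill":"superpowers:subagent-driven-development"}}',
    '{"name":"TodoWrite","input":{}}',
    '{"name":"Task","input":{"description":"Task 1"}}',
    '{"name":"Task","input":{"description":"Task 2"}}',
]


def count_lines(lines, pattern):
    """Count lines matching a regex, the way `grep -c` counts matching lines."""
    return sum(1 for line in lines if re.search(pattern, line))


skill_pattern = r'"name":"Skill".*"skill":"superpowers:subagent-driven-development"'
checks = {
    "skill invoked": count_lines(transcript, skill_pattern) >= 1,
    "subagents dispatched": count_lines(transcript, r'"name":"Task"') >= 2,
    "task tracking": count_lines(transcript, r'"name":"TodoWrite"') >= 1,
}

for name, passed in checks.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

Matching on raw text like this depends on the transcript being written as compact JSON with no spaces after the colons, which is what the grep patterns in the table also assume.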

After verification, the test calls analyze-token-usage.py to print a cost breakdown.

Session file location: The test searches ~/.claude/projects/<escaped-working-dir>/ for the most recent .jsonl file created in the last 60 minutes.
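
A rough Python equivalent of this lookup is sketched below. It assumes modification time is an acceptable stand-in for "created", and it takes the session directory as a parameter rather than deriving the escaped working-directory name, since that escaping rule is not described here. `find_recent_session` is a hypothetical name.

```python
import os
import tempfile
import time
from pathlib import Path


def find_recent_session(session_dir: Path, max_age_minutes: int = 60):
    """Return the newest .jsonl file modified within the window, or None."""
    cutoff = time.time() - max_age_minutes * 60
    candidates = [
        path for path in session_dir.glob("*.jsonl")
        if path.stat().st_mtime >= cutoff
    ]
    return max(candidates, key=lambda path: path.stat().st_mtime, default=None)


# Demonstration against a throwaway directory instead of ~/.claude/projects/.
with tempfile.TemporaryDirectory() as tmp:
    session_dir = Path(tmp)
    old_file = session_dir / "old.jsonl"
    new_file = session_dir / "new.jsonl"
    old_file.write_text("{}\n")
    new_file.write_text("{}\n")

    # Backdate one file by two hours so it falls outside the 60-minute window.
    two_hours_ago = time.time() - 2 * 60 * 60
    os.utime(old_file, (two_hours_ago, two_hours_ago))

    found_name = find_recent_session(session_dir).name

print(found_name)  # new.jsonl
```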

Sources: tests/claude-code/test-subagent-driven-development-integration.sh:1-314

Token Usage Analysis: `analyze-token-usage.py`

analyze-token-usage.py takes a single .jsonl session transcript path and produces a per-agent token cost table.

How it works:

Reads each line of the .jsonl file as a JSON object.
For type == "assistant" entries, accumulates input_tokens, output_tokens, cache_creation_input_tokens, and cache_read_input_tokens into main_usage.
For type == "user" entries that contain a toolUseResult with both usage and agentId, accumulates into subagent_usage[agentId].
Derives a description from the first 60 characters of the subagent's prompt field.
Calculates estimated cost using $3 / 1M input tokens and $15 / 1M output tokens.
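
The accumulation and pricing steps above can be sketched as follows. This is a minimal illustration, not the script itself: it assumes that assistant entries carry their counters under `message.usage`, and it prices only `input_tokens` and `output_tokens`, leaving cache tokens out of the cost, which may differ from what analyze-token-usage.py does. The helper names are hypothetical.

```python
import json
from collections import defaultdict

INPUT_PRICE_PER_MTOK = 3.0    # USD per 1M input tokens, as stated above
OUTPUT_PRICE_PER_MTOK = 15.0  # USD per 1M output tokens, as stated above
USAGE_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def new_totals():
    """Fresh zeroed counters for one agent."""
    return {key: 0 for key in USAGE_KEYS}


def add_usage(totals, usage):
    """Add one usage record into the running totals; missing keys count as 0."""
    for key in USAGE_KEYS:
        totals[key] += usage.get(key, 0)


def analyze(lines):
    """Split token usage into main-session totals and per-subagent totals."""
    main_usage = new_totals()
    subagent_usage = defaultdict(new_totals)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("type") == "assistant":
            # Assumption: usage is nested under message.usage.
            add_usage(main_usage, entry.get("message", {}).get("usage", {}))
        elif entry.get("type") == "user":
            result = entry.get("toolUseResult")
            if isinstance(result, dict) and "usage" in result and "agentId" in result:
                add_usage(subagent_usage[result["agentId"]], result["usage"])
    return main_usage, dict(subagent_usage)


def estimate_cost(totals):
    """Estimated USD cost from input and output tokens only (cache tokens ignored)."""
    return (
        totals["input_tokens"] * INPUT_PRICE_PER_MTOK
        + totals["output_tokens"] * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# Hypothetical two-line transcript.
lines = [
    json.dumps({"type": "assistant",
                "message": {"usage": {"input_tokens": 1000, "output_tokens": 200}}}),
    json.dumps({"type": "user",
                "toolUseResult": {"agentId": "agent-abc123",
                                  "usage": {"input_tokens": 4000, "output_tokens": 500}}}),
]
main_usage, subagent_usage = analyze(lines)
print(f"main: ${estimate_cost(main_usage):.4f}")                              # main: $0.0060
print(f"agent-abc123: ${estimate_cost(subagent_usage['agent-abc123']):.4f}")  # agent-abc123: $0.0195
```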

Output format (tabular):

Agent           Description                          Msgs      Input     Output      Cache     Cost
main            Main session (coordinator)               4    12,345      2,100      9,000    $0.08
agent-abc123    implementer subagent for Task 1          6    45,000      5,200     20,000    $0.35

Sources: tests/claude-code/analyze-token-usage.py:1-168

OpenCode Test Suite

Runner: `run-tests.sh`

run-tests.sh accepts the same basic flags as the Claude Code runner (--verbose, --test, --integration). Default tests (no external dependencies):

test-plugin-loading.sh
test-skills-core.sh

Integration tests (require OpenCode binary):

test-tools.sh
test-priority.sh

Sources: tests/opencode/run-tests.sh:61-75

Isolated Environment: `setup.sh`

Every OpenCode test sources setup.sh, which sets TEST_HOME to a temporary directory and overrides HOME to prevent tests from touching the real user home. It exports a cleanup_test_env function that is registered via trap ... EXIT in each test.
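
The isolation pattern (point HOME at a throwaway directory, then always clean up on exit) can be expressed in Python as a context manager. This is an analogy to the shell `trap ... EXIT` approach, not a translation of setup.sh; `isolated_home` and the directory prefix are hypothetical names.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def isolated_home():
    """Temporarily point HOME at a fresh temp directory, restoring it afterwards."""
    original_home = os.environ.get("HOME")
    test_home = tempfile.mkdtemp(prefix="test-home-")
    os.environ["HOME"] = test_home
    try:
        yield test_home
    finally:
        # Runs even if the body raises, much like a shell EXIT trap.
        if original_home is None:
            os.environ.pop("HOME", None)
        else:
            os.environ["HOME"] = original_home
        shutil.rmtree(test_home, ignore_errors=True)


home_before = os.environ.get("HOME")
with isolated_home() as test_home:
    home_inside = os.environ["HOME"]
home_after = os.environ.get("HOME")

print(home_inside == test_home)    # True
print(home_before == home_after)   # True
print(os.path.exists(test_home))   # False
```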

Sources: tests/opencode/test-skills-core.sh:12-15, tests/opencode/test-priority.sh:12-15

Unit Tests: `test-skills-core.sh`

Tests the core skills library logic without requiring OpenCode. It inlines the library functions into Node.js one-liners and verifies them directly.

Test coverage:

Test #	Function under test	What is verified
1	`extractFrontmatter`	Parses `name` and `description` from YAML frontmatter
2	`stripFrontmatter`	Removes the `---` block; preserves body content
3	`findSkillsInDir`	Recursively discovers `SKILL.md` files up to `maxDepth=3`; finds nested skills
4	`resolveSkillPath`	Personal overrides superpowers; `superpowers:` prefix forces superpowers; unknown skill returns `null`
5	`checkForUpdates`	Returns `false` for repo without remote; returns `false` for non-existent or non-git dir

Skill resolution logic diagram:

Sources: tests/opencode/test-skills-core.sh:17-440

Integration Tests: `test-tools.sh`

Requires OpenCode installed and in PATH. If absent, the test prints [SKIP] and exits 0.

Test #	What is verified
1	`find_skills` tool returns `superpowers:brainstorming` and `superpowers:using-superpowers`
2	`use_skill` tool loads `personal-test` skill with expected content marker
3	`use_skill` with `superpowers:brainstorming` returns brainstorming skill content

Each test runs `timeout 60s opencode run --print-logs "..."` and greps the combined stdout/stderr.

Sources: tests/opencode/test-tools.sh:1-104

Integration Tests: `test-priority.sh`

Verifies the three-tier priority system (project > personal > superpowers) by creating identical skills named priority-test in three locations, each embedding a unique PRIORITY_MARKER_* string.

Location	Path	Priority tier
Superpowers	`~/.config/opencode/superpowers/skills/priority-test/`	Lowest
Personal	`~/.config/opencode/skills/priority-test/`	Middle
Project	`<TEST_HOME>/test-project/.opencode/skills/priority-test/`	Highest

Test #	CWD for opencode	Expected marker
2	`$HOME` (outside project)	`PRIORITY_MARKER_PERSONAL_VERSION`
3	`<TEST_HOME>/test-project`	`PRIORITY_MARKER_PROJECT_VERSION`
4	project dir, `superpowers:priority-test`	`PRIORITY_MARKER_SUPERPOWERS_VERSION`
5	`$HOME`, `project:priority-test`	Should fail / not found
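
The expectations in the two tables can be modelled with a short Python sketch of three-tier lookup. It is an illustration of the behavior being tested, not the plugin's resolver: `resolve_skill` is a hypothetical function, the directory layout is reduced to `<root>/<skill>/SKILL.md`, and being outside a project is modelled by passing `None` as the project root.

```python
import tempfile
from pathlib import Path


def resolve_skill(name, project_dir, personal_dir, superpowers_dir):
    """Return the SKILL.md path for name, honoring tier prefixes, or None."""
    if name.startswith("superpowers:"):
        bare, search = name[len("superpowers:"):], [superpowers_dir]
    elif name.startswith("project:"):
        bare, search = name[len("project:"):], [project_dir]
    else:
        # No prefix: highest-priority tier that has the skill wins.
        bare, search = name, [project_dir, personal_dir, superpowers_dir]
    for root in search:
        if root is None:
            continue
        candidate = Path(root) / bare / "SKILL.md"
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    roots = {}
    for tier in ("project", "personal", "superpowers"):
        skill_file = Path(tmp) / tier / "priority-test" / "SKILL.md"
        skill_file.parent.mkdir(parents=True)
        skill_file.write_text(f"PRIORITY_MARKER_{tier.upper()}_VERSION\n")
        roots[tier] = Path(tmp) / tier

    def marker(name, inside_project):
        """Resolve name and return the marker text found, or None."""
        project_root = roots["project"] if inside_project else None
        path = resolve_skill(name, project_root, roots["personal"], roots["superpowers"])
        return path.read_text().strip() if path else None

    results = [
        marker("priority-test", inside_project=False),
        marker("priority-test", inside_project=True),
        marker("superpowers:priority-test", inside_project=True),
        marker("project:priority-test", inside_project=False),
    ]

for result in results:
    print(result)
```

The four calls correspond to tests 2 through 5 in the table above, in order; the last one resolves to `None` because no project root is available outside the project directory.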

Sources: tests/opencode/test-priority.sh:1-198

Test Execution Flow Summary

Sources: tests/claude-code/run-skill-tests.sh:99-163, tests/opencode/run-tests.sh:87-141, tests/claude-code/test-subagent-driven-development-integration.sh:150-170

Quick Reference: Running Tests

Command	What runs	Time
`tests/claude-code/run-skill-tests.sh`	Fast Claude Code skill tests	~2 min
`tests/claude-code/run-skill-tests.sh --integration`	Full workflow execution	10–30 min
`tests/claude-code/run-skill-tests.sh --test test-subagent-driven-development.sh`	Single named test	~2 min
`tests/claude-code/run-skill-tests.sh --verbose`	All fast tests, full output	~2 min
`tests/opencode/run-tests.sh`	Plugin loading + skills-core unit tests	<1 min
`tests/opencode/run-tests.sh --integration`	All tests including OpenCode tools	~5 min
