Files
pm-skills/pm-ai-shipping/commands/derive-tests.md
T
Pawel Huryn 8202bdd7f1 Release v2.0.0: add pm-ai-shipping plugin, red-team execution skill, refresh README
New
- pm-ai-shipping (9th plugin) — AI Shipping Kit: document a vibe-coded app, audit
  security/performance against intended behavior, map test coverage, and compile a
  reviewer-ready shipping packet (2 skills, 5 commands).
- pm-execution: strategy-red-team skill + /red-team-prd command (now 16 skills, 11 commands).

Changed
- Bump all versions 1.0.1 -> 2.0.0 (marketplace.json + all 9 plugin.json) in lockstep.
- README: new plugins.png hero + examples.png in "How It Works"; counts updated to
  9 plugins / 68 skills / 42 commands across tagline, install block, and per-plugin sections.
- CLAUDE.md: 9-plugin structure, plugin table, and version note updated.

Validator: 9 plugins, 68 skills, 42 commands, 110 components, 0 warnings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 18:49:54 +02:00

7.5 KiB

description, argument-hint
description argument-hint
Turn documented intent into a test-coverage map — inventory the tests that exist today, derive use-case cases from the system docs, separate existing coverage from proposed tests and unverified gaps, mark each unit / guarded-live / manual, and recommend a green-before-merge CI gate <repo path or area; defaults to the whole repository>

/derive-tests -- Turn Intent Into Tests

The docs say what the system should do. An audit finds where the code doesn't. Tests are what stop that gap from reopening after the next AI edit. This command reads the documented intent, turns each load-bearing rule into a concrete test case, sorts them into what to automate, what needs a guarded live run, and what stays manual — then recommends the CI gate that keeps main honest.

This produces a coverage map (tests.md) and concrete test cases, not a finished suite — you or the next agent implement the deterministic ones.

Invocation

/derive-tests
/derive-tests the checkout flow
/derive-tests supabase/functions

Prerequisite: documented intent

Tests are derived from the docs, so the docs come first. If /documentation/*.md is missing or thin, run /document-app (and /derive-tests reads flows.md, permissions.md, and automation.md most heavily). You cannot map coverage to rules you never wrote down — where intent is absent, say so rather than inventing rules to test.

The workflow

1. Read the intent — and the tests that already exist

Read the applicable system docs (architecture, flows, permissions, variables, and any of emails, cron, seo, automation that exist). Apply the shipping-artifacts skill for what each doc should contain, and the intended-vs-implemented skill for the discipline of treating docs as claims to verify, not proof.

Then inventory the existing test suite — the test files, what they actually assert, and what runs in CI today. The map you produce must distinguish coverage that exists now from coverage you're proposing; skipping this step yields a falsely-green map that claims rules are pinned when nothing checks them. If there are no tests, say so plainly — that is itself a finding.

2. Extract the rules worth testing

Pull out the load-bearing, deterministic rules — the ones whose violation crosses a trust, data, money, tenant, or privacy boundary:

  • authorization allow and deny cases (especially the boundary crossings in flows.md and the matrix in permissions.md),
  • input validation and output encoding at each sink,
  • idempotency of jobs and dedup keys,
  • fail-closed defaults (error / timeout / cache-miss / flag paths that must deny, not allow),
  • side-effect conditions (exactly when an email sends, a write commits, a paid action fires),
  • public-data-only constraints on public or bot routes,
  • the output-contract and tool-surface limits of any agent in automation.md.

Skip cosmetic behavior. A rule earns a test when getting it wrong harms someone other than the actor.

3. Build the coverage map

One row per use case: rule → expected behavior (incl. the negative case) → evidence source (doc + code) → test type → status (existing / proposed / none). The status column is what keeps the map honest — mark a rule existing only when a test in the repo actually asserts it today.

Test types:

  • unit — pure and deterministic, no external services.
  • integration (deterministic) — exercises real wiring against a local or in-memory dependency (test DB, mocked provider) and runs the same way every time.
  • guarded live — needs a real external DB, email provider, LLM, or third party. Runs only behind an explicit flag, never in the default CI run.
  • manual — UI/visual or judgment calls. A reviewer checklist item, not an automated test.

What CI must require: the deterministic local set — unit plus deterministic integration tests, the ones that pass or fail the same way on every run with no live dependencies. Prefer unit where the decision logic can be isolated; reach for integration when the rule lives in the wiring (middleware, RLS, auth guards) and only a real-but-local dependency can exercise it. Guarded-live and manual rows never gate the default run.

When a rule can only be exercised live, you can extract its decision into a pure helper so the logic is unit-testable — but only as a complement, not a replacement for testing the real enforcement. The unit test proves the helper's logic; it does not prove the framework actually calls it. Wiring and policy enforcement (route middleware, DB row-level security, auth guards, provider config) still needs an integration or guarded-live check, or the helper becomes a policy shadow that passes while the real path is unprotected.

4. Propose the tests

For each rule you can pin with a deterministic automated test (unit or integration), write the case: name, arrange/act/assert intent, and the negative case it must reject. Group cases by the doc or flow they defend. Prefer the smallest test that pins the rule — one clear assertion per boundary beats a sprawling integration test that fails for ten reasons.

5. Recommend the CI gate

Recommend — don't silently install — a CI setup matched to the repo's stack and existing tooling:

  • run the deterministic local set on every pull request (unit + any integration test that runs without live services),
  • keep guarded-live tests opt-in (manual or scheduled, never blocking),
  • gate merges to main on green via a required status check + branch protection.

Output the workflow file and the branch-protection setting as a clearly-labeled suggestion for the user to approve, not an applied change.

6. Report coverage and gaps

Write tests.md in three clearly separated sections:

  • Existing coverage — rules a test in the repo pins today (from the Step 1 inventory).
  • Proposed tests — the cases you're recommending but that don't exist yet, by type.
  • Gaps — documented rules with no verification at all, ranked by what crossing them exposes.

The gaps are the backlog, and they are exactly where the next AI edit can silently break a boundary. Be honest that proposed ≠ existing: a rule isn't covered until a test actually asserts it.

Output

Test Coverage: [scope]

| Use case | Rule (doc) | Expected behavior (+ deny case) | Evidence | Type | Status |
|----------|-----------|---------------------------------|----------|------|--------|
[status: existing / proposed / none]

### Existing coverage
[tests already in the repo, each tied to the rule it pins]

### Proposed tests
[grouped by flow/doc — name · assert · negative case · type]

### Recommended CI gate
[workflow snippet for the detected stack + "green-before-merge" branch-protection note]

### Gaps — documented but unverified
[rules with no test yet, ranked by what crossing them exposes]

Optionally write the coverage map to /documentation/tests.md and the full report to /reports/test_plan_{timestamp}.md.

Notes

  • This is the verification half of "documented == implemented": the audits find today's gap, these tests stop it from reopening tomorrow.
  • Don't fabricate rules to manufacture coverage. If the docs are silent, the gap is in the docs — fix /document-app first.
  • Don't wire external services into the default CI run; flaky live tests erode the green-before-merge gate until people start ignoring it.
  • Covers test derivation only. For the gap audit itself use /security-audit-static; for the full document → audit → test → packet sequence use /ship-check.