refactor: documentation for workflows: update Planning & Building, Reviewing & Shipping, and Testing & Debugging sections to enhance clarity and structure.

duthaho
2026-05-07 16:57:35 +07:00
parent 44a3a2835d
commit 52e2cd6b4b
147 changed files with 4269 additions and 20215 deletions
+3 -3
@@ -9,7 +9,7 @@ export default defineConfig({
integrations: [
starlight({
title: 'Claude Kit',
description: 'The development-workflow plugin for Claude Code. 35 skills organized around a 6-phase workflow (Think → Review → Build → Ship → Maintain → Setup), 24 agents, 7 modes. Free forever.',
description: 'A verification-first engineering toolkit for Claude Code. 15 skills across a 5-phase spine (Investigate → Design → Implement → Verify → Ship), 8 specialist agents, 5 output styles. Free forever.',
social: [
{ icon: 'github', label: 'GitHub', href: 'https://github.com/duthaho/claudekit' }
],
@@ -60,7 +60,7 @@ export default defineConfig({
items: [
{ label: 'Skills', slug: 'reference/skills' },
{ label: 'Agents', slug: 'reference/agents' },
{ label: 'Modes', slug: 'reference/modes' },
{ label: 'Output Styles', slug: 'reference/output-styles' },
{ label: 'MCP Servers', slug: 'reference/mcp-servers' },
],
},
@@ -68,7 +68,7 @@ export default defineConfig({
label: 'Customization',
items: [
{ label: 'Creating Skills', slug: 'customization/creating-skills' },
{ label: 'Creating Agents & Modes', slug: 'customization/creating-agents-and-modes' },
{ label: 'Creating Agents & Output Styles', slug: 'customization/creating-agents-and-modes' },
],
},
],
+31 -46
@@ -1,63 +1,48 @@
<svg width="400" height="300" viewBox="0 0 400 300" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="The six-phase Claude Kit workflow: Think, Review, Build, Ship, Maintain, Setup">
<svg width="400" height="300" viewBox="0 0 400 300" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="The five-phase Claude Kit workflow: Investigate, Design, Implement, Verify, Ship">
<defs>
<radialGradient id="glow-d" cx="50%" cy="45%" r="55%">
<radialGradient id="glow-d" cx="50%" cy="50%" r="55%">
<stop offset="0%" stop-color="#fbbf24" stop-opacity="0.16"/>
<stop offset="100%" stop-color="#fbbf24" stop-opacity="0"/>
</radialGradient>
</defs>
<!-- Ambient wash -->
<ellipse cx="200" cy="140" rx="200" ry="130" fill="url(#glow-d)"/>
<ellipse cx="200" cy="150" rx="200" ry="120" fill="url(#glow-d)"/>
<!-- Phase labels (row 1) -->
<g font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="11" font-weight="500" letter-spacing="0.08em" fill="#a8a29e" text-anchor="middle">
<text x="70" y="80">THINK</text>
<text x="200" y="80">REVIEW</text>
<text x="330" y="80">BUILD</text>
</g>
<!-- Phase numbers (row 1) -->
<!-- Phase numbers (above labels) -->
<g font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="9" font-weight="500" fill="#57534e" text-anchor="middle">
<text x="70" y="64">01</text>
<text x="200" y="64">02</text>
<text x="330" y="64">03</text>
<text x="50" y="100">01</text>
<text x="125" y="100">02</text>
<text x="200" y="100">03</text>
<text x="275" y="100">04</text>
<text x="350" y="100">05</text>
</g>
<!-- Row 1 connector line -->
<line x1="82" y1="110" x2="188" y2="110" stroke="#fbbf24" stroke-width="1.5" stroke-dasharray="2 3"/>
<line x1="212" y1="110" x2="318" y2="110" stroke="#fbbf24" stroke-width="1.5" stroke-dasharray="2 3"/>
<!-- Row 1 nodes -->
<circle cx="70" cy="110" r="7" fill="#0a0f0a" stroke="#fbbf24" stroke-width="2"/>
<circle cx="200" cy="110" r="7" fill="#0a0f0a" stroke="#fbbf24" stroke-width="2"/>
<circle cx="330" cy="110" r="7" fill="#fbbf24"/>
<!-- Elbow: Build (3) -> Ship (4) -->
<path d="M 337 110 Q 360 110 360 150 Q 360 190 337 190" fill="none" stroke="#fbbf24" stroke-width="1.5" stroke-dasharray="2 3"/>
<!-- Row 2 connector line -->
<line x1="318" y1="190" x2="212" y2="190" stroke="#fbbf24" stroke-width="1.5" stroke-dasharray="2 3"/>
<line x1="188" y1="190" x2="82" y2="190" stroke="#fbbf24" stroke-width="1.5" stroke-dasharray="2 3"/>
<!-- Row 2 nodes -->
<circle cx="330" cy="190" r="7" fill="#0a0f0a" stroke="#fbbf24" stroke-width="2"/>
<circle cx="200" cy="190" r="7" fill="#0a0f0a" stroke="#fbbf24" stroke-width="2"/>
<circle cx="70" cy="190" r="7" fill="#0a0f0a" stroke="#fbbf24" stroke-width="2"/>
<!-- Phase labels (row 2) -->
<!-- Phase labels (above nodes) -->
<g font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="11" font-weight="500" letter-spacing="0.08em" fill="#a8a29e" text-anchor="middle">
<text x="330" y="222">SHIP</text>
<text x="200" y="222">MAINTAIN</text>
<text x="70" y="222">SETUP</text>
<text x="50" y="120">INVESTIGATE</text>
<text x="125" y="120">DESIGN</text>
<text x="200" y="120">IMPLEMENT</text>
<text x="275" y="120">VERIFY</text>
<text x="350" y="120">SHIP</text>
</g>
<!-- Phase numbers (row 2) -->
<g font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="9" font-weight="500" fill="#57534e" text-anchor="middle">
<text x="330" y="238">04</text>
<text x="200" y="238">05</text>
<text x="70" y="238">06</text>
</g>
<!-- Connector lines between nodes (dashed) -->
<line x1="60" y1="150" x2="115" y2="150" stroke="#fbbf24" stroke-width="1.5" stroke-dasharray="2 3"/>
<line x1="135" y1="150" x2="190" y2="150" stroke="#fbbf24" stroke-width="1.5" stroke-dasharray="2 3"/>
<line x1="210" y1="150" x2="265" y2="150" stroke="#fbbf24" stroke-width="1.5" stroke-dasharray="2 3"/>
<line x1="285" y1="150" x2="340" y2="150" stroke="#fbbf24" stroke-width="1.5" stroke-dasharray="2 3"/>
<!-- Nodes (Verify is filled — load-bearing phase for verification-first identity) -->
<circle cx="50" cy="150" r="7" fill="#0a0f0a" stroke="#fbbf24" stroke-width="2"/>
<circle cx="125" cy="150" r="7" fill="#0a0f0a" stroke="#fbbf24" stroke-width="2"/>
<circle cx="200" cy="150" r="7" fill="#0a0f0a" stroke="#fbbf24" stroke-width="2"/>
<circle cx="275" cy="150" r="7" fill="#fbbf24"/>
<circle cx="350" cy="150" r="7" fill="#0a0f0a" stroke="#fbbf24" stroke-width="2"/>
<!-- Sub-tagline under spine -->
<text x="200" y="195" font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="10" font-weight="400" letter-spacing="0.08em" fill="#a8a29e" text-anchor="middle">VERIFICATION-FIRST ENGINEERING TOOLKIT</text>
<!-- Footer inscription -->
<text x="200" y="270" font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="10" font-weight="400" letter-spacing="0.12em" fill="#57534e" text-anchor="middle">35 SKILLS · 24 AGENTS · 7 MODES</text>
<text x="200" y="245" font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="10" font-weight="400" letter-spacing="0.12em" fill="#57534e" text-anchor="middle">15 SKILLS · 8 AGENTS · 5 OUTPUT STYLES</text>
</svg>

+31 -46
@@ -1,63 +1,48 @@
<svg width="400" height="300" viewBox="0 0 400 300" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="The six-phase Claude Kit workflow: Think, Review, Build, Ship, Maintain, Setup">
<svg width="400" height="300" viewBox="0 0 400 300" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="The five-phase Claude Kit workflow: Investigate, Design, Implement, Verify, Ship">
<defs>
<radialGradient id="glow-l" cx="50%" cy="45%" r="55%">
<radialGradient id="glow-l" cx="50%" cy="50%" r="55%">
<stop offset="0%" stop-color="#d97706" stop-opacity="0.10"/>
<stop offset="100%" stop-color="#d97706" stop-opacity="0"/>
</radialGradient>
</defs>
<!-- Ambient wash -->
<ellipse cx="200" cy="140" rx="200" ry="130" fill="url(#glow-l)"/>
<ellipse cx="200" cy="150" rx="200" ry="120" fill="url(#glow-l)"/>
<!-- Phase labels (row 1, above nodes) -->
<g font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="11" font-weight="500" letter-spacing="0.08em" fill="#57534e" text-anchor="middle">
<text x="70" y="80">THINK</text>
<text x="200" y="80">REVIEW</text>
<text x="330" y="80">BUILD</text>
</g>
<!-- Phase numbers (row 1) -->
<!-- Phase numbers (above labels) -->
<g font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="9" font-weight="500" fill="#a8a29e" text-anchor="middle">
<text x="70" y="64">01</text>
<text x="200" y="64">02</text>
<text x="330" y="64">03</text>
<text x="50" y="100">01</text>
<text x="125" y="100">02</text>
<text x="200" y="100">03</text>
<text x="275" y="100">04</text>
<text x="350" y="100">05</text>
</g>
<!-- Row 1 connector line -->
<line x1="82" y1="110" x2="188" y2="110" stroke="#d97706" stroke-width="1.5" stroke-dasharray="2 3"/>
<line x1="212" y1="110" x2="318" y2="110" stroke="#d97706" stroke-width="1.5" stroke-dasharray="2 3"/>
<!-- Row 1 nodes -->
<circle cx="70" cy="110" r="7" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<circle cx="200" cy="110" r="7" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<circle cx="330" cy="110" r="7" fill="#d97706"/>
<!-- Elbow: Build (3) -> Ship (4) -->
<path d="M 337 110 Q 360 110 360 150 Q 360 190 337 190" fill="none" stroke="#d97706" stroke-width="1.5" stroke-dasharray="2 3"/>
<!-- Row 2 connector line -->
<line x1="318" y1="190" x2="212" y2="190" stroke="#d97706" stroke-width="1.5" stroke-dasharray="2 3"/>
<line x1="188" y1="190" x2="82" y2="190" stroke="#d97706" stroke-width="1.5" stroke-dasharray="2 3"/>
<!-- Row 2 nodes -->
<circle cx="330" cy="190" r="7" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<circle cx="200" cy="190" r="7" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<circle cx="70" cy="190" r="7" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<!-- Phase labels (row 2, below nodes) -->
<!-- Phase labels (above nodes) -->
<g font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="11" font-weight="500" letter-spacing="0.08em" fill="#57534e" text-anchor="middle">
<text x="330" y="222">SHIP</text>
<text x="200" y="222">MAINTAIN</text>
<text x="70" y="222">SETUP</text>
<text x="50" y="120">INVESTIGATE</text>
<text x="125" y="120">DESIGN</text>
<text x="200" y="120">IMPLEMENT</text>
<text x="275" y="120">VERIFY</text>
<text x="350" y="120">SHIP</text>
</g>
<!-- Phase numbers (row 2) -->
<g font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="9" font-weight="500" fill="#a8a29e" text-anchor="middle">
<text x="330" y="238">04</text>
<text x="200" y="238">05</text>
<text x="70" y="238">06</text>
</g>
<!-- Connector lines between nodes (dashed) -->
<line x1="60" y1="150" x2="115" y2="150" stroke="#d97706" stroke-width="1.5" stroke-dasharray="2 3"/>
<line x1="135" y1="150" x2="190" y2="150" stroke="#d97706" stroke-width="1.5" stroke-dasharray="2 3"/>
<line x1="210" y1="150" x2="265" y2="150" stroke="#d97706" stroke-width="1.5" stroke-dasharray="2 3"/>
<line x1="285" y1="150" x2="340" y2="150" stroke="#d97706" stroke-width="1.5" stroke-dasharray="2 3"/>
<!-- Nodes (Verify is filled — load-bearing phase for verification-first identity) -->
<circle cx="50" cy="150" r="7" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<circle cx="125" cy="150" r="7" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<circle cx="200" cy="150" r="7" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<circle cx="275" cy="150" r="7" fill="#d97706"/>
<circle cx="350" cy="150" r="7" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<!-- Sub-tagline under spine -->
<text x="200" y="195" font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="10" font-weight="400" letter-spacing="0.08em" fill="#78716c" text-anchor="middle">VERIFICATION-FIRST ENGINEERING TOOLKIT</text>
<!-- Footer inscription -->
<text x="200" y="270" font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="10" font-weight="400" letter-spacing="0.12em" fill="#a8a29e" text-anchor="middle">35 SKILLS · 24 AGENTS · 7 MODES</text>
<text x="200" y="245" font-family="'IBM Plex Mono', ui-monospace, monospace" font-size="10" font-weight="400" letter-spacing="0.12em" fill="#a8a29e" text-anchor="middle">15 SKILLS · 8 AGENTS · 5 OUTPUT STYLES</text>
</svg>

@@ -1,11 +1,11 @@
---
title: Creating Agents & Modes
description: How to create custom agents and behavioral modes for Claude Kit.
title: Creating Agents & Output Styles
description: How to create custom agents and output styles for Claude Kit.
---
# Creating Agents & Modes
# Creating Agents & Output Styles
Beyond skills, you can create specialized agents for focused tasks and behavioral modes for different work contexts.
Beyond skills, you can create specialized agents for focused tasks and output styles for different work contexts.
---
@@ -97,116 +97,119 @@ Return a safety report:
---
## Creating Modes
## Creating Output Styles
Modes change Claude's communication style, output format, and problem-solving approach for the duration of a session.
[Output styles](https://docs.claude.com/en/docs/claude-code/output-styles) are Claude Code's native mechanism for changing communication style, output format, and problem-solving posture for an entire session. Claude Kit ships 5 (see the [Output Styles Reference](/reference/output-styles/)); custom ones live alongside.
### Mode Structure
### Where to put them
After running `/claudekit:init`, built-in modes are installed to `.claude/modes/`. You can add custom modes alongside them:
Three locations, in override order (most specific wins):
```
.claude/modes/
├── brainstorm.md # Installed by /claudekit:init
├── implementation.md # Installed by /claudekit:init
└── my-custom-mode.md # Your custom mode
.claude/output-styles/ # Project-specific (checked-in or local)
~/.claude/output-styles/ # Personal (your machine, all projects)
<plugin-root>/output-styles/ # Plugin-shipped (claudekit's 5)
```
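
The override order can be read as a first-match lookup from the most specific location to the least. The sketch below is only an illustration of that rule, assuming the style names in each location have already been collected; `StyleSource`, `StylesBySource`, and `resolveStyleSource` are hypothetical names, not part of Claude Code or Claude Kit.

```typescript
// Hypothetical sketch: these names are illustrative, not a Claude Code API.
type StyleSource = "project" | "personal" | "plugin";

// Most specific first, mirroring the order of the three locations.
const PRECEDENCE: readonly StyleSource[] = ["project", "personal", "plugin"];

// Style names found in each location (for example, by listing each directory).
type StylesBySource = Record<StyleSource, readonly string[]>;

// Return the location that supplies `styleName`, or undefined if none does.
function resolveStyleSource(
  found: StylesBySource,
  styleName: string
): StyleSource | undefined {
  for (const source of PRECEDENCE) {
    if (found[source].indexOf(styleName) !== -1) {
      return source;
    }
  }
  return undefined;
}

// Assumed sample data: the same style name exists in two locations.
const sampleStyles: StylesBySource = {
  project: ["Pair Programming"],
  personal: ["Pair Programming", "Compliance"],
  plugin: ["Brainstorm", "Review"],
};

// The project copy shadows the personal one.
const winner = resolveStyleSource(sampleStyles, "Pair Programming"); // "project"
```

Keeping the precedence in one ordered array means a change in lookup order touches a single line.
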
### Mode File Format
### File format
```markdown
---
name: my-mode
description: One-line description of this mode's behavior.
name: My Style
description: A short description shown in the /config picker.
keep-coding-instructions: true
---
# My Mode
# My Style
## Communication Style
[How Claude should communicate in this mode]
## Output Format
[What outputs should look like]
## Problem-Solving Approach
[How Claude should approach tasks]
## When to Use
[Best scenarios for this mode]
[behavioral instructions — written as a system-prompt overlay]
```
### Example: Custom Mode
### Frontmatter fields
| Field | Required | Description |
|-------|----------|-------------|
| `name` | No (inherits from filename) | Display name in `/config` |
| `description` | Yes | One-line description shown in the picker |
| `keep-coding-instructions` | No (default `false`) | If `true`, preserves Claude's default coding/testing/verification instructions and adds yours on top. If `false`, your content fully replaces them. |
For engineering workflows, default to `keep-coding-instructions: true`. Use `false` only for non-engineering contexts (writing, analysis).
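
The defaults in the table can be sketched as a small normalization step. This is a hedged illustration rather than how Claude Code parses the file: the `RawFrontmatter` and `ResolvedStyle` types and the `resolveStyle` function are hypothetical, and stripping the `.md` extension when the name falls back to the filename is an assumption.

```typescript
// Hypothetical types for illustration; Claude Code does not expose these.
interface RawFrontmatter {
  name?: string;
  description?: string;
  "keep-coding-instructions"?: boolean;
}

interface ResolvedStyle {
  name: string;
  description: string;
  keepCodingInstructions: boolean;
}

// Apply the table's defaults: `name` falls back to the filename,
// `keep-coding-instructions` falls back to false, `description` is required.
function resolveStyle(fileName: string, fm: RawFrontmatter): ResolvedStyle {
  if (fm.description === undefined || fm.description.trim() === "") {
    throw new Error(fileName + ': "description" is required');
  }
  const keep = fm["keep-coding-instructions"];
  return {
    // Assumption: the fallback name is the filename without its .md extension.
    name: fm.name !== undefined ? fm.name : fileName.replace(/\.md$/, ""),
    description: fm.description,
    keepCodingInstructions: keep !== undefined ? keep : false,
  };
}

// A file that sets only the required field picks up both defaults.
const minimal = resolveStyle("my-style.md", { description: "Demo style" });
```
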
### Example: pair-programming style
```markdown
---
name: pair-programming
description: Interactive pair programming mode with frequent check-ins.
name: Pair Programming
description: Interactive pair programming with frequent check-ins, small chunks, and discussion before decisions.
keep-coding-instructions: true
---
# Pair Programming Mode
# Pair Programming
## Communication Style
- Think out loud — explain reasoning as you code
- Ask before making non-obvious decisions
- Suggest alternatives when multiple approaches exist
- Keep explanations conversational, not formal
You are pair-programming with the user. They want to be involved in decisions, not handed a finished implementation.
## Output Format
- Show code in small chunks (10-20 lines)
- Pause after each chunk for feedback
- Use comments to explain "why", not "what"
## Posture
## Problem-Solving Approach
- Start with the simplest approach
- Refactor only when the user agrees
- Test each change before moving on
- Never make large changes without discussion
- Think out loud. Explain reasoning as you code.
- Ask before non-obvious choices. Don't decide the file structure or pattern unilaterally.
- Show code in 10-20 line chunks. Pause for feedback after each chunk.
- Suggest 1-2 alternatives when multiple approaches exist.
## When to Use
- Learning a new codebase together
- Complex features where design decisions need discussion
- Mentoring or teaching scenarios
## Output format
For each chunk:
1. Brief explanation of what you're about to add (1 sentence).
2. The chunk (10-20 lines).
3. "Continue?" or a clarifying question.
## What you DON'T do
- Don't ship 200 lines without checking in.
- Don't refactor adjacent code "while you're there."
- Don't pick a library or pattern the user hasn't seen before without discussing it first.
```
### Example: Compliance Mode
### Example: compliance style
```markdown
---
name: compliance
description: Strict compliance mode for regulated industries.
name: Compliance
description: Strict compliance posture — formal language, audit trails, security-first.
keep-coding-instructions: true
---
# Compliance Mode
# Compliance
## Communication Style
- Formal, precise language
- Reference specific regulations when relevant
- Flag compliance risks proactively
You are working in a regulated environment. Every decision is documented; every shortcut is flagged.
## Output Format
- Include audit trail comments in code
- Document all security decisions
- Generate compliance checklists
## Posture
## Problem-Solving Approach
- Security and compliance over convenience
- Prefer established patterns over novel solutions
- Require explicit approval for any data handling changes
- Formal, precise language. No idioms.
- Reference specific regulations or controls when relevant (HIPAA, PCI-DSS, SOC 2, etc.).
- Flag compliance risks proactively, even if not asked.
- Require explicit approval for any change that touches PII, audit logs, or access controls.
## Output format
- Include audit trail comments in code (`// COMPLIANCE: <reason>`).
- Document security decisions inline.
- For changes touching regulated data paths, generate a one-line compliance note in the PR description.
```
## Activating Custom Modes
## Activating custom output styles
Once created, switch to your mode naturally:
Switch via `/config` (the style appears in the picker once the file exists in any of the three locations) or by setting `outputStyle` directly in `.claude/settings.local.json`:
```
"switch to pair-programming mode"
"use compliance mode"
```json
{
"outputStyle": "Pair Programming"
}
```
Or reference the mode-switching skill keywords.
The choice persists across sessions until changed.
## Related Pages
- [Agents Reference](/reference/agents/) — All 24 built-in agents
- [Modes Reference](/reference/modes/) — All 7 built-in modes
- [Agents Reference](/reference/agents/) — The 8 built-in agents
- [Output Styles Reference](/reference/output-styles/) — The 5 built-in output styles
- [Creating Skills](/customization/creating-skills/) — Custom skill creation
@@ -153,7 +153,7 @@ description: Use when deploying to Fly.io or configuring Fly.io
- Setting up Fly.io machines or volumes
## When NOT to Use
- Deploying to other platforms (use devops skill instead)
- Deploying to other platforms (this skill is Fly.io-specific)
---
@@ -125,5 +125,5 @@ You can customize agent behavior in your CLAUDE.md:
## Next Steps
- [Workflows](/workflows/planning-and-building/) — See how skills work together
- [Skills Reference](/reference/skills/) — Browse all 35 skills
- [Skills Reference](/reference/skills/) — Browse all 15 skills
- [Creating Skills](/customization/creating-skills/) — Build your own
@@ -26,7 +26,7 @@ Claude Kit installs as a Claude Code plugin via a marketplace. Setup takes under
/plugin install claudekit
```
That's it — all 35 skills and 24 agents are now available. Skills auto-trigger based on context; the 13 spine skills can also be typed as `/claudekit:<skill-name>`, and agents can be dispatched as `claudekit:<agent-name>`.
That's it — all 15 skills and 8 agents are now available. Skills auto-trigger based on context; every skill can also be typed as `/claudekit:<skill-name>`, and agents can be dispatched as `claudekit:<agent-name>`.
### Step 3: Configure Your Project (Optional)
@@ -67,16 +67,16 @@ After installing, skills trigger automatically based on your conversation:
```
You: "I need to add user authentication to our app"
→ triggers: claudekit:brainstorming, claudekit:writing-plans
→ triggers: claudekit:shape-spec, claudekit:write-plan
You: "There's a TypeError in the UserService"
→ triggers: claudekit:systematic-debugging
→ triggers: claudekit:investigate-root-cause
```
You can also invoke skills manually:
```
/claudekit:brainstorming
/claudekit:shape-spec
/claudekit:init
```
@@ -1,66 +1,77 @@
---
title: Introduction
description: Learn what Claude Kit is and how it accelerates your development workflow.
description: A verification-first engineering toolkit for Claude Code. Built for senior ICs and tech leads.
---
# Introduction to Claude Kit
Claude Kit is an open-source Claude Code plugin that transforms Claude Code into a production-ready AI development team. It provides auto-triggered skills, specialized agents, and an interactive setup wizard that accelerates your development workflow.
Claude Kit is a Claude Code plugin that adds a **verification-first engineering workflow** — every claim has evidence, every step has a checkpoint, every skill has a Rationalizations table that names the excuses an engineer makes to skip discipline. Built for senior ICs and tech leads who already know how to ship and want a workflow that keeps the bar high without ceremony.
## What is Claude Kit?
Claude Kit is a Claude Code plugin you install via a marketplace:
A Claude Code plugin you install via the marketplace:
- **35 Skills** — Organized around a 6-phase development workflow. 13 user-invocable spine skills (typed as `/claudekit:<name>`) plus 22 supporting skills that auto-trigger by context
- **24 Agents** — Specialized subagents for focused tasks (code review, security audit, database design, plan review, etc.)
- **7 Modes** — Behavioral configurations installed via `/claudekit:init`
- **Setup Wizard** — `/claudekit:init` scaffolds rules, modes, hooks, and MCP servers into your project
- **15 Skills** — A 5-phase spine (Investigate → Design → Implement → Verify → Ship) plus 1 setup skill. All user-invocable as `/claudekit:<name>`. Each skill has 8 required sections including a Rationalizations table and Evidence Requirements.
- **8 Agents** — Specialist subagents, one dispatcher each. No agent-bloat.
- **5 Output Styles** — Native Claude Code output styles shipped with the plugin (Brainstorm, Deep Research, Implementation, Review, Token Efficient). Switch via `/config`.
- **Setup Wizard** — `/claudekit:init` scaffolds rules, modes, hooks, and MCP servers into your project.
Skills activate automatically based on keywords in your conversation. No commands to memorize — just describe what you want to do.
Skills activate automatically based on keywords in your conversation, or invoke directly by name.
## Why Claude Kit?
### The Problem with Raw Claude Code
### The problem with raw Claude Code workflows
| Problem | Symptom |
|---------|---------|
| **Context Spirals** | Token budgets run out, Claude loses track of what it was doing |
| **Inconsistent Output** | Quality varies wildly between sessions |
| **No Structure** | Every session starts from scratch |
| **Missing Expertise** | Claude doesn't know your team's patterns and standards |
| **Self-reported "done"** | "Tests pass — trust me" claims that don't hold up |
| **Symptom patches** | Bugs fixed at the line where the error appeared, not at the cause |
| **Silent skip-it discipline** | Steps elided when the engineer thinks they "see the problem" |
| **Vague plans** | "Implement the X" tasks that hide three sub-decisions nobody made |
### How Claude Kit Helps
### What Claude Kit adds
1. **Auto-Triggered Skills** — Say "fix this bug" and systematic-debugging activates. Say "plan this" and brainstorming kicks in.
2. **Specialized Agents** — Dispatch focused subagents for code review, testing, security audits, and more.
3. **Consistent Quality** — Built-in TDD enforcement, verification before completion, and code review workflows.
4. **Full Customization** — Add your own skills, agents, and modes.
1. **Rationalizations tables** — Every skill names the excuses someone makes to skip a step ("I see the problem, let me just patch it") with rebuttals. The skill refuses to be skipped silently.
2. **Evidence Requirements** — Every checkpoint produces an artifact you could paste into a code review. "It seems right" is failure.
3. **Pre-completion gates** — `verification-gate` runs before any "done" claim. Tests run. Negative path checked. Non-IDE environment exercised. Original ask cross-checked.
4. **Plan-review pipeline** — Two parallel reviewers (architecture + experience) score 5 sub-dimensions each, consolidate into one fix gate. Catches structural issues before code.
5. **No founder voice** — No "ambitious vision," no "10x outcomes," no "delight." Engineering analogies, real file paths, real commands.
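
One way to picture the pre-completion gate is as a check that each of the four items has a non-empty artifact attached. The sketch below is a rough model under that assumption, not the `verification-gate` skill itself; the check identifiers, the `Evidence` shape, and the `verificationGate` function are all hypothetical.

```typescript
// Hypothetical model of the four checks; not the skill's real implementation.
type CheckName =
  | "tests-run"
  | "negative-path"
  | "non-ide-environment"
  | "original-ask";

const REQUIRED_CHECKS: readonly CheckName[] = [
  "tests-run",
  "negative-path",
  "non-ide-environment",
  "original-ask",
];

// Evidence is pasted text per check (command output, a log excerpt, a diff).
type Evidence = Partial<Record<CheckName, string>>;

interface GateResult {
  passed: boolean;
  missing: CheckName[];
}

// A "done" claim passes only if every check carries a non-empty artifact.
function verificationGate(evidence: Evidence): GateResult {
  const missing: CheckName[] = [];
  for (const check of REQUIRED_CHECKS) {
    const artifact = evidence[check];
    if (artifact === undefined || artifact.trim() === "") {
      missing.push(check);
    }
  }
  return { passed: missing.length === 0, missing: missing };
}

// Assumed sample: tests were run, but nothing else was evidenced.
const partial = verificationGate({ "tests-run": "12 passed, 0 failed" });
```

Returning the list of missing checks, rather than a bare boolean, keeps the failure itself pasteable into a review.
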
## How Skills Work
## How skills work
Skills are the core of Claude Kit. They trigger automatically based on keywords:
Skills trigger automatically based on keywords, or you can invoke them directly:
```
You: "I need to add user authentication to our app"
triggers: brainstorming, writing-plans
You: "Why is this endpoint returning 500s?"
triggers: investigate-root-cause
You: "There's a TypeError in the UserService"
triggers: systematic-debugging, root-cause-tracing
You: "How does the auth flow work?"
triggers: map-codebase
You: "Let's write tests for the API endpoints"
triggers: testing, test-driven-development
You: "Plan the migration to PostgreSQL"
triggers: shape-spec, then write-plan, then plan-review
You: "Is this PR ready to merge?"
→ triggers: verification-gate, then code-review-loop
```
No slash commands needed — Claude reads your intent and activates the right skills.
Or invoke directly: `/claudekit:investigate-root-cause`, `/claudekit:plan-review`, `/claudekit:verification-gate`.
## Who is Claude Kit For?
## Who is Claude Kit for?
- **Solo developers** who want to ship faster
- **Small teams (1-3 developers)** working on multi-stack projects
- **Anyone using Claude Code** who wants more structure and consistency
- **Senior ICs** who want a workflow that respects how they already think — not founder-flavored coaching, not "magical AI" framing.
- **Tech leads** running plan reviews, code reviews, and engineering rigor across teams. Plan-review is the headline workflow.
- **Anyone using Claude Code** who's tired of self-reported "done" claims and wants a discipline that produces evidence.
## Next Steps
## What Claude Kit isn't for
1. [Install Claude Kit](/getting-started/installation/) — Install the plugin
2. [Configuration](/getting-started/configuration/) — Run `/claudekit:init` to customize
3. [Skills Reference](/reference/skills/) — Browse all 35 skills
- Pure exploratory work where the goal is learning, not shipping.
- One-line typo fixes that don't need a workflow.
- Strategy / scope / "is this worth building" questions — that's a different lane.
## Next steps
1. [Install Claude Kit](/getting-started/installation/) — Install the plugin from the marketplace.
2. [Configuration](/getting-started/configuration/) — Run `/claudekit:init` to scaffold rules, modes, hooks, and MCP servers.
3. [Skills Reference](/reference/skills/) — Browse the 15 skills.
4. [Agents Reference](/reference/agents/) — Browse the 8 specialist agents.
+28 -24
@@ -1,9 +1,9 @@
---
title: Claude Kit
description: The development-workflow plugin for Claude Code. 35 skills across a 6-phase workflow, 24 agents, 7 modes — install as a plugin and go. Free forever.
description: A verification-first engineering toolkit for Claude Code. 15 skills, 8 agents, 5 output styles — every claim has evidence. For senior ICs and tech leads.
template: splash
hero:
tagline: A development-workflow plugin for Claude Code. 35 skills across a 6-phase spine — Think, Review, Build, Ship, Maintain, Setup — plus 24 agents and 7 modes. Free, open source.
tagline: A verification-first engineering toolkit for Claude Code. 15 skills across a 5-phase spine — Investigate, Design, Implement, Verify, Ship — plus 8 specialist agents and 5 output styles. Built for senior ICs and tech leads.
image:
dark: ../../assets/hero-dark.svg
light: ../../assets/hero-light.svg
@@ -20,23 +20,27 @@ hero:
import { Card, CardGrid, LinkCard } from '@astrojs/starlight/components';
## What makes claudekit different
Every skill has a **Rationalizations table** — the excuses an engineer makes to skip a step ("I see the problem, let me just patch it") with rebuttals. Every checkpoint has **Evidence Requirements** — a specific artifact you could paste into a code review. **Pre-completion gates** refuse "tests pass — trust me" claims. No founder voice; engineering-only rigor.
## Four layers, one plugin
<CardGrid>
<LinkCard
title="35 Skills"
description="A 6-phase workflow spine. 13 user-invocable spine skills, plus 22 that auto-trigger by context."
title="15 Skills"
description="A 5-phase spine. All user-invocable. Each skill has rationalizations + evidence + red flags."
href="/reference/skills/"
/>
<LinkCard
title="24 Agents"
description="Specialized subagents for code review, testing, database design, security audits, plan review, and more."
title="8 Agents"
description="One specialist per job. Planner, architect, experience-reviewer, investigator, tester, code-reviewer, security-auditor, scout."
href="/reference/agents/"
/>
<LinkCard
title="7 Modes"
description="Behavioral configurations — brainstorm, implementation, review, deep-research, and more."
href="/reference/modes/"
title="5 Output Styles"
description="Native Claude Code output styles — Brainstorm, Deep Research, Implementation, Review, Token Efficient. Switch via /config."
href="/reference/output-styles/"
/>
<LinkCard
title="MCP Servers"
@@ -58,29 +62,29 @@ import { Card, CardGrid, LinkCard } from '@astrojs/starlight/components';
/claudekit:init
```
Skills trigger automatically based on what you're doing. Ask Claude to brainstorm a feature, write a plan, debug an error, or review code — the right skills activate without any commands.
Skills trigger automatically based on what you're doing. Ask Claude to shape a spec, write a plan, investigate a bug, or review code — the right skills activate without any commands.
## Why Claude Kit
Raw Claude Code is powerful but brittle. Long sessions spiral, output quality drifts between runs, every session starts from scratch, and Claude has no built-in sense of your team's patterns.
Raw Claude Code is powerful but brittle. Self-reported "done" claims don't hold up. Bugs get patched at the line where the error appeared, not at the cause. Plans say "implement the X" and hide three sub-decisions nobody made.
| Problem | What Claude Kit adds |
| Problem | What Claude Kit v4 adds |
|---------|----------------------|
| Context spirals | Fresh subagents per task; structured 6-phase flow |
| Inconsistent output | TDD enforcement + verification-before-completion gates |
| No structure | Skills that auto-trigger on intent, not slash commands |
| Missing expertise | 24 specialized agents; project-level rules and modes |
| Self-reported "done" claims | `verification-gate` — pre-completion check that requires pasted evidence |
| Symptom-fixed bugs | `investigate-root-cause` — 4-phase, no fix without a written hypothesis |
| Vague plans | `write-plan` — file paths, exact test commands, falsifiable acceptance per task |
| Skip-it discipline | Rationalizations tables in every skill — the excuses get named with rebuttals |
## The 6-phase workflow
## The 5-phase workflow
| Phase | What happens | Example skills |
| Phase | What happens | Skills |
|-------|-------------|----------------|
| **Think** | Clarify the problem, pick an approach | `brainstorming`, `writing-plans` |
| **Review** | Pressure-test the plan before coding | `autoplan`, `plan-eng-review`, `plan-ceo-review` |
| **Build** | Execute with TDD, dispatch subagents, verify | `feature-workflow`, `test-driven-development` |
| **Ship** | Commit, review, PR, changelog | `finishing-a-development-branch`, `git-workflows` |
| **Maintain** | Debug, refactor, trace root causes | `systematic-debugging`, `root-cause-tracing` |
| **Setup** | Install rules, modes, hooks, MCP | `init` |
| **Investigate** | Surface every fact about the system, with file:line citations | `investigate-root-cause`, `map-codebase`, `audit-dependencies` |
| **Design** | Spec → plan → reviewed before implementation | `shape-spec`, `write-plan`, `plan-review`, `plan-review-architecture`, `plan-review-experience` |
| **Implement** | Red-green-refactor; vertical slices behind feature flags | `test-first`, `incremental-shipping` |
| **Verify** | Mandatory pre-completion gate; active-debug paper trail | `verification-gate`, `evidence-driven-debugging` |
| **Ship** | Reviewable PRs with verification evidence; atomic releases | `code-review-loop`, `release-and-changelog` |
| **Setup** *(off-spine)* | One-time scaffolding wizard for project config | `init` |
[Explore the workflows →](/workflows/planning-and-building/)
+35 -102
@@ -1,129 +1,62 @@
---
title: Agents Reference
description: All 24 specialized subagents in Claude Kit.
description: The 8 specialist agents in Claude Kit — each with a single dispatcher and a narrow job.
---
# Agents Reference
Agents are specialized subagents that Claude can dispatch for focused tasks. Each agent has access to specific tools and expertise, making it more effective than a general-purpose prompt for its domain.
Claude Kit ships **8 specialist agents.** Each agent has a single dispatcher (the skill that calls it) and a narrow job. No agent-bloat; no orphans.
## How Agents Work
Agents are bundled with the Claude Kit plugin. When Claude dispatches a subagent, it starts a fresh context focused entirely on the task at hand:
When a skill needs deeper, focused work, it dispatches a specialist agent. The agent starts in a fresh context, does that one job, and returns a structured result to the main conversation.
```
You: "Review this code for security issues"
Claude dispatches → security-auditor agent
Focused security review
Returns findings with severity ratings
→ /claudekit:code-review-loop dispatches
claudekit:security-auditor (sensitive path detected)
Focused OWASP-aligned review
→ Returns findings with severity + OWASP category
```
Agents run independently and return results to the main conversation. They can be dispatched in parallel for independent tasks.
Agents can be dispatched in parallel — `plan-review` runs `architect` and `experience-reviewer` simultaneously.
---
## Planning & Research
## The 8 agents
| Agent | Description | Use When |
|-------|-------------|----------|
| **planner** | Designs implementation plans, identifies critical files, considers trade-offs | Planning complex features or migrations |
| **brainstormer** | Explores solutions, evaluates architectures, debates technical decisions | Evaluating options before implementation |
| **researcher** | Comprehensive research on technologies, libraries, and best practices | Need in-depth comparison or analysis |
## Code Quality
| Agent | Description | Use When |
|-------|-------------|----------|
| **code-reviewer** | Reviews code for quality, security, performance, and maintainability | After implementing features, before PRs |
| **tester** | Runs test suites, analyzes coverage, validates error handling, verifies builds | After code changes, checking coverage |
| **debugger** | Investigates issues, analyzes system behavior, traces root causes | Debugging test failures or production bugs |
## Security
| Agent | Description | Use When |
|-------|-------------|----------|
| **security-auditor** | Security audits, OWASP compliance, code vulnerability review | Before production release, security review |
| **vulnerability-scanner** | Automated dependency scanning for known CVEs | Checking for dependency vulnerabilities |
## Infrastructure & Data
| Agent | Description | Use When |
|-------|-------------|----------|
| **database-admin** | Schema design, migrations, query optimization, data modeling | Database work for PostgreSQL or MongoDB |
| **cicd-manager** | CI/CD pipeline management, deployment automation | Setting up or fixing CI pipelines |
| **pipeline-architect** | Pipeline architecture design and build optimization | Redesigning slow CI/CD pipelines |
## Content & Documentation
| Agent | Description | Use When |
|-------|-------------|----------|
| **docs-manager** | API docs, READMEs, code comments, technical specifications | Documentation needs updating |
| **copywriter** | Marketing copy, release notes, changelogs, product descriptions | User-facing content creation |
| **journal-writer** | Development journals, decision logs, incident documentation | Recording failures or key decisions |
## Design & UI
| Agent | Description | Use When |
|-------|-------------|----------|
| **ui-ux-designer** | Design mockups to code, UI components, responsive/accessible layouts | Building or fixing UI components |
| **api-designer** | RESTful/GraphQL API design, OpenAPI specifications | Designing new APIs |
## Project Management
| Agent | Description | Use When |
|-------|-------------|----------|
| **project-manager** | Progress tracking, roadmaps, task monitoring, status reports | Checking project progress |
| **git-manager** | Stage, commit, push with conventional commits | Git operations |
## Exploration
| Agent | Description | Use When |
|-------|-------------|----------|
| **scout** | Rapidly maps internal codebase — files, patterns, dependencies | Finding code locations, understanding structure |
| **scout-external** | Explores external resources, APIs, open-source projects | Researching external APIs or libraries |
## Plan Review
Dispatched by the `plan-*-review` and `autoplan` skills to score a written implementation plan on 5 dimensions (0-10) with concrete fixes. Read-only — reviewers propose, the skill applies.
| Agent | Description | Use When |
|-------|-------------|----------|
| **ceo-reviewer** | Strategic/scope review — ambition, problem clarity, wedge focus, demand reality, future-fit | Pressure-testing a plan's scope and ambition before implementation |
| **eng-reviewer** | Architecture review — data flow, failure modes, edge cases, test matrix, rollback | Locking in architecture before code is written |
| **design-reviewer** | UX/visual plan review — hierarchy, consistency, states, accessibility, polish vs AI slop | Plans with UI surfaces needing a designer's-eye critique |
| **devex-reviewer** | Developer-experience review — TTHW, ergonomics, error copy, docs, magical moments | Plans shipping APIs, CLIs, SDKs, or docs |
| Agent | Job | Dispatched by |
|-------|-----|---------------|
| **claudekit:planner** | Decompose specs into executable plans (file paths, exact test commands, acceptance criteria, Risks section) | `write-plan` |
| **claudekit:architect** | Score architecture dimension of a written plan: data flow, failure modes, edge cases, test matrix, rollback safety | `plan-review-architecture` (via `plan-review`) |
| **claudekit:experience-reviewer** | Score UX + DX dimension: information hierarchy, state coverage, accessibility, DX ergonomics, AI-slop avoidance | `plan-review-experience` (via `plan-review`) |
| **claudekit:investigator** | Root-cause investigation with evidence chain — never guesses, never patches symptoms | `investigate-root-cause`, `evidence-driven-debugging` |
| **claudekit:tester** | Design and write tests with red-green discipline; pastes runner output as evidence | `test-first` |
| **claudekit:code-reviewer** | Pre-merge structural review of diffs: error handling, edge cases, complexity, naming. Defers sensitive paths to security-auditor | `code-review-loop` |
| **claudekit:security-auditor** | OWASP-aligned review of sensitive paths (auth, payments, crypto, sessions, tokens) | `code-review-loop` (sensitive paths only) |
| **claudekit:scout** | Codebase mapping and dependency audits — produces evidence-cited maps with `<file:line>` references for every claim | `map-codebase`, `audit-dependencies` |
---
## Dispatching Agents
## Custom agents
Claude dispatches agents automatically when appropriate. You can also request it explicitly:
You can add project-specific agents in `.claude/agents/`. They follow the same YAML frontmatter format as bundled agents:
```
"Have the security-auditor review the auth module"
"Ask the database-admin to optimize this query"
"Get the code-reviewer to check my changes"
```yaml
---
name: my-agent
description: "When to dispatch this agent..."
tools: Read, Edit, Bash
memory: project
---
You are a [role] who [does what]. Your output is...
```
### Parallel Dispatch
Agent design rules:
For independent tasks, agents run in parallel:
```
You: "Review security, check test coverage, and audit the database schema"
Claude dispatches simultaneously:
→ security-auditor (auth module)
→ tester (coverage analysis)
→ database-admin (schema review)
```
### Agent vs. Skill
| | Skills | Agents |
|---|--------|--------|
| **How** | Auto-trigger by keywords | Dispatched for focused tasks |
| **Context** | Same conversation | Fresh, isolated context |
| **Best for** | Patterns and methodology | Focused independent work |
| **Parallelism** | Sequential | Can run in parallel |
- **One dispatcher per agent.** No orphans. If you can't name the skill that dispatches the agent, the agent shouldn't exist.
- **Narrow job.** An agent that "helps with everything" helps with nothing.
- **Output format specified.** The skill consumes a known format; the agent produces it.
- **Refusal patterns named.** What the agent won't do is as important as what it will.
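
The frontmatter format and the design rules above can be checked mechanically before an agent file is committed. The following is a minimal sketch, not part of Claude Kit: `parse_frontmatter` and `check_agent` are hypothetical helpers, the parser only understands flat `key: value` pairs rather than full YAML, and treating `name`, `description`, and `tools` as required is an assumption drawn from the example above.

```python
# Assumption: these three keys are treated as required because the example
# frontmatter above shows them; the real plugin may require more or fewer.
REQUIRED_KEYS = ("name", "description", "tools")


def parse_frontmatter(text):
    """Split an agent file into (frontmatter dict, body text).

    Only flat `key: value` lines are understood; this is not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # opening delimiter without a closing one

    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def check_agent(text):
    """Return a list of human-readable problems; an empty list means OK."""
    meta, body = parse_frontmatter(text)
    problems = [
        f"missing frontmatter key: {key}"
        for key in REQUIRED_KEYS
        if not meta.get(key)
    ]
    if not body:
        problems.append("empty system prompt body")
    return problems
```

A check like this could run in a pre-commit hook over `.claude/agents/*.md`. It covers only the mechanical part; whether the job is narrow enough and the dispatcher is named still needs a human reviewer.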
@@ -19,7 +19,7 @@ MCP servers are configured via `/claudekit:init`, which adds them to your projec
### Context7
**Purpose**: Real-time library documentation lookup
**Purpose**: Real-time library documentation lookup.
Fetches current documentation for any library, framework, or API. Use instead of relying on Claude's training data, which may be outdated.
@@ -32,68 +32,72 @@ Claude fetches current Next.js 15 docs via Context7
**Best for**: API syntax, configuration, version migration, library-specific debugging.
**Setup**: Run `/claudekit:init` and select Context7
**Setup**: Run `/claudekit:init` and select Context7.
---
### Sequential Thinking
**Purpose**: Structured step-by-step reasoning
**Purpose**: Structured step-by-step reasoning with explicit thought chains.
Provides a tool for multi-step analysis with explicit thought chains. Used automatically by the sequential-thinking skill for complex problems.
Provides a tool for multi-step analysis where each step has a confidence score and the chain can revise earlier steps as new evidence comes in.
```
Complex debugging scenario:
Step 1: Observe the error → confidence: 0.9
Step 2: Form hypothesis → confidence: 0.7
Step 3: Test hypothesis → confidence: 0.85
Step 4: Verify fix → confidence: 0.95
Investigation:
Step 1: Capture the error → confidence: 0.9
Step 2: Form hypothesis (X causes Y when Z) → confidence: 0.7
Step 3: Test hypothesis with instrumentation → confidence: 0.85
Step 4: Verify the fix doesn't regress → confidence: 0.95
```
**Best for**: Complex debugging, architecture decisions, security analysis.
**Best for**: Complex debugging, architectural trade-off analysis, security review where multiple hypotheses need to be tracked simultaneously.
**Setup**: Run `/claudekit:init` and select Sequential Thinking
**Setup**: Run `/claudekit:init` and select Sequential Thinking.
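
To make the "confidence per step, revisable chain" idea concrete, here is a small data-structure sketch. It is an illustration only and does not reflect the Sequential Thinking server's actual tool schema: `Thought`, `ThoughtChain`, and the `revises` field are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Thought:
    step: int
    text: str
    confidence: float              # 0.0 (guess) to 1.0 (certain)
    revises: Optional[int] = None  # step number this thought supersedes


@dataclass
class ThoughtChain:
    thoughts: List[Thought] = field(default_factory=list)

    def add(self, text, confidence, revises=None):
        """Append a thought, optionally marking an earlier step as revised."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        known_steps = {t.step for t in self.thoughts}
        if revises is not None and revises not in known_steps:
            raise ValueError(f"cannot revise unknown step {revises}")
        thought = Thought(len(self.thoughts) + 1, text, confidence, revises)
        self.thoughts.append(thought)
        return thought

    def current(self):
        """Thoughts that have not been superseded by a later revision."""
        superseded = {t.revises for t in self.thoughts if t.revises is not None}
        return [t for t in self.thoughts if t.step not in superseded]

    def weakest(self):
        """The live thought with the lowest confidence: the next thing to test."""
        return min(self.current(), key=lambda t: t.confidence)


chain = ThoughtChain()
chain.add("Capture the error", 0.9)
chain.add("Form hypothesis (X causes Y when Z)", 0.7)
chain.add("Instrumentation contradicts step 2; new hypothesis", 0.85, revises=2)
```

Keeping superseded steps in the list rather than deleting them preserves the trail of what was believed earlier and why it changed.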
---
### Memory
**Purpose**: Persistent knowledge graph across sessions
**Purpose**: Persistent knowledge graph across sessions.
Stores entities, relationships, and observations that persist across conversations. Claude can recall project decisions, user preferences, and architectural context.
Stores entities, relationships, and observations that persist across conversations. Claude can recall project decisions, user preferences, and architectural context the next time you sit down.
```
Session 1: "We decided to use PostgreSQL RLS for multi-tenancy"
→ Stored as entity + decision observation
Session 2: "What did we decide about multi-tenancy?"
Session 2 (a week later): "What did we decide about multi-tenancy?"
→ Retrieved from memory graph
```
**Best for**: Long-running projects, team knowledge persistence, decision tracking.
**Best for**: Long-running projects, decision tracking, building up codebase knowledge over time.
**Setup**: Run `/claudekit:init` and select Memory.
---
### Filesystem
**Purpose**: Secure file operations with access controls
**Purpose**: Sandboxed file operations with configurable allowed directories.
Provides sandboxed file operations with configurable allowed directories. Useful for projects that need restricted file access.
Useful for projects with strict file access requirements (e.g., when you want Claude restricted to a specific subtree of a monorepo, or when you're operating in a regulated environment with audit-trail requirements).
**Best for**: Projects with strict file access requirements.
**Best for**: Projects with strict file access requirements; regulated codebases.
**Setup**: Run `/claudekit:init` and select Filesystem.
---
### Playwright
**Purpose**: Browser automation for testing
**Purpose**: Browser automation for testing and verification.
Enables Claude to control a browser for E2E testing, visual verification, and web scraping. Works with the playwright skill for end-to-end test workflows.
Enables Claude to control a real browser for E2E testing, visual verification, and runtime UI checks.
```
You: "Test the login flow in the browser"
You: "Verify the login flow works in production"
Claude launches browser via Playwright MCP:
Claude launches a browser via Playwright MCP:
→ Navigate to /login
→ Fill email and password
→ Click submit
@@ -101,7 +105,9 @@ Claude launches browser via Playwright MCP:
→ Take screenshot for evidence
```
**Best for**: E2E testing, visual regression, browser-based verification.
**Best for**: E2E testing, visual regression checks, the non-IDE verification step in `verification-gate`.
**Setup**: Run `/claudekit:init` and select Playwright.
---
@@ -127,11 +133,16 @@ Or install all servers at once:
The wizard automatically detects your platform and configures the correct command format in `.mcp.json`. Restart Claude Code after configuration.
## Skills That Use MCP
---
| MCP Server | Skills That Benefit |
|------------|-------------------|
| Context7 | All framework/library lookups (fetches current docs for any library) |
| Sequential | sequential-thinking, systematic-debugging, brainstorming |
| Memory | session-management, brainstorming (persisting design decisions) |
| Playwright | playwright, verification-before-completion |
## Which skills benefit from each server
| MCP Server | Skills that get the most lift |
|------------|------------------------------|
| Context7 | `audit-dependencies` (verify advisories against current docs), `investigate-root-cause` (confirm framework behavior matches docs), `shape-spec` (research library options before committing), `incremental-shipping` (read changelog before bumping a dep) |
| Sequential Thinking | `investigate-root-cause` (the 4-phase loop benefits from explicit confidence tracking), `plan-review-architecture` (multi-dimensional scoring), `shape-spec` (working through alternatives systematically) |
| Memory | `shape-spec` (recall design decisions across sessions), `map-codebase` (build up codebase knowledge over time), `release-and-changelog` (recall release history) |
| Playwright | `test-first` (E2E test cases for UI flows), `verification-gate` (the non-IDE verification step — exercising the change in a real browser) |
| Filesystem | Project-wide; no specific skill mapping. Use when you need scoped file access. |
MCP servers are optional — claudekit's spine works without them. They add capability where they fit; the skills enforce discipline regardless.
-177
@@ -1,177 +0,0 @@
---
title: Modes Reference
description: All 7 behavioral modes in Claude Kit.
---
# Modes Reference
Modes change how Claude communicates and solves problems. Each mode optimizes behavior for a specific type of task.
## How Modes Work
Switch modes naturally in conversation:
```
"switch to brainstorm mode"
"use implementation mode"
"go into review mode"
```
Modes are installed into your project's `.claude/modes/` via `/claudekit:init`. Each defines communication style, output format, and problem-solving approach.
---
## Available Modes
### Default
The standard balanced mode for general tasks.
- **Communication**: Clear, helpful, balanced detail
- **Output**: Mix of explanation and code
- **Best for**: General development tasks, questions, exploration
---
### Brainstorm
Creative exploration for design and ideation.
- **Communication**: Asks lots of questions, explores alternatives
- **Output**: Options with trade-offs, diagrams, decision matrices
- **Best for**: Feature design, architecture decisions, requirement exploration
**Example**:
```
You: "switch to brainstorm mode"
You: "I need to add search to our product catalog"
Claude asks one question at a time:
"What search complexity do you need?
a) Simple text matching (LIKE queries)
b) Full-text search (PostgreSQL tsvector)
c) Dedicated search engine (Elasticsearch/Meilisearch)"
```
---
### Implementation
Code-focused execution with minimal prose.
- **Communication**: Terse, action-oriented
- **Output**: Mostly code, minimal explanation
- **Best for**: Executing known tasks, coding from clear specs
**Example**:
```
You: "switch to implementation mode"
You: "add a PATCH /api/users/:id endpoint"
Claude writes code immediately with minimal commentary.
```
---
### Review
Critical analysis for code review and quality assurance.
- **Communication**: Critical, thorough, finds issues
- **Output**: Issue lists with severity, suggestions, security flags
- **Best for**: Code review, QA, pre-merge checks
**Example**:
```
You: "switch to review mode"
You: "review the auth middleware"
Claude examines code critically:
"CRITICAL: Token expiry not checked after decode (line 42)
IMPORTANT: Missing rate limiting on login endpoint
MINOR: Inconsistent error response format"
```
---
### Token-Efficient
Compressed output for high-volume work and cost optimization.
- **Communication**: Minimal prose, maximum density
- **Output**: Code-only when possible, compressed explanations
- **Best for**: Long sessions, repetitive tasks, cost-conscious work
- **Savings**: 30-70% token reduction
**Levels**:
| Level | How to Activate | Savings |
|-------|----------------|---------|
| Concise | "be concise" | 30-40% |
| Ultra | "code only" | 60-70% |
| Session | "switch to token-efficient mode" | 30-70% |
---
### Deep Research
Thorough investigation with evidence and citations.
- **Communication**: Detailed analysis, cites sources
- **Output**: Structured reports, evidence-backed conclusions
- **Best for**: Technology evaluation, incident investigation, audits
**Example**:
```
You: "switch to deep research mode"
You: "analyze our authentication flow for security issues"
Claude produces a structured report:
"## Findings
### 1. Session Token Storage (High Risk)
Current: localStorage (vulnerable to XSS)
Recommended: httpOnly cookie
Evidence: OWASP Session Management Cheat Sheet..."
```
---
### Orchestration
Multi-agent coordination for complex parallel work.
- **Communication**: Status-oriented, progress tracking
- **Output**: Agent dispatch summaries, consolidated results
- **Best for**: Large tasks requiring multiple agents working in parallel
**Example**:
```
You: "switch to orchestration mode"
You: "audit the entire API layer"
Claude coordinates multiple agents:
"Dispatching 3 agents in parallel:
→ security-auditor: reviewing auth endpoints
→ code-reviewer: reviewing business logic
→ tester: checking coverage gaps
Results consolidated in ~2 minutes..."
```
---
## Mode Comparison
| Mode | Verbosity | Focus | Output Style |
|------|-----------|-------|-------------|
| Default | Medium | Balanced | Explanation + code |
| Brainstorm | High | Exploration | Questions + options |
| Implementation | Low | Execution | Code-first |
| Review | Medium | Quality | Issue lists |
| Token-Efficient | Minimal | Density | Compressed |
| Deep Research | High | Analysis | Reports |
| Orchestration | Medium | Coordination | Status + results |
## Customizing Modes
After running `/claudekit:init`, mode files are markdown in `.claude/modes/`. You can edit the installed modes or create new ones. See [Creating Agents & Modes](/customization/creating-agents-and-modes/) for details.
@@ -0,0 +1,139 @@
---
title: Output Styles Reference
description: 5 native Claude Code output styles shipped with Claude Kit.
---
# Output Styles Reference
Claude Kit ships 5 [Claude Code output styles](https://docs.claude.com/en/docs/claude-code/output-styles) — system-prompt overlays that change how Claude communicates and reasons for the entire session. Output styles are auto-discovered when the plugin is installed; no `/claudekit:init` step required.
All 5 styles use `keep-coding-instructions: true`, so Claude's default coding/testing/verification discipline still applies underneath. The style adds posture and format on top.
## Switching styles
### Via `/config` (recommended)
```
/config
```
Pick **Output style** from the menu, then choose one of the 5 styles. The choice persists across sessions.
### Via settings file
Edit `.claude/settings.local.json` (project) or `~/.claude/settings.json` (personal):
```json
{
"outputStyle": "Brainstorm"
}
```
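
If the settings file already holds other keys, editing it by hand risks clobbering them. The sketch below shows one way to set only `outputStyle` from a script; `set_output_style` is a hypothetical helper written for this example, and it assumes the file is plain JSON without comments.

```python
import json
from pathlib import Path


def set_output_style(settings_path, style):
    """Set `outputStyle` in a JSON settings file, keeping all other keys."""
    path = Path(settings_path)
    raw = path.read_text(encoding="utf-8") if path.exists() else ""
    settings = json.loads(raw) if raw.strip() else {}
    settings["outputStyle"] = style
    path.parent.mkdir(parents=True, exist_ok=True)  # create .claude/ if needed
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

For example, `set_output_style(".claude/settings.local.json", "Brainstorm")` would apply the project-level setting shown above.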
### Built-in vs claudekit styles
Claude Code has built-in styles (`Default`, `Explanatory`, `Learning`). Claudekit adds 5 more: `Brainstorm`, `Deep Research`, `Implementation`, `Review`, `Token Efficient`. They appear together in the `/config` picker.
---
## The 5 styles
### Brainstorm
Creative exploration mode — divergent thinking, multiple alternatives, structured trade-offs before any code.
- **Posture**: Diverge first, converge second. Surface 2-3 distinct approaches before recommending one.
- **Output format**: Lettered approaches with pros / cons / effort, then a one-line recommendation.
- **Best for**: Feature design, architecture decisions, exploring alternatives.
```
APPROACH A: <name>
Summary: <1 sentence>
Pros: ...
Cons: ...
Effort: <S/M/L/XL>
APPROACH B: <name>
...
RECOMMENDATION: <which one and why>
```
### Deep Research
Thorough investigation mode — completeness over speed, evidence-cited findings, confidence levels named.
- **Posture**: Cite, don't recall. Every claim has a source — `file:line`, doc URL, or command output.
- **Output format**: Structured reports with Question / Method / Findings (with confidence) / Conclusions / Gaps.
- **Best for**: Technology evaluation, incident investigation, security audits, due diligence.
### Implementation
Code-focused execution mode — minimal prose, action-oriented updates, follow established patterns.
- **Posture**: Execute, don't deliberate. The decisions were made upstream.
- **Output format**: Per-file edits with code blocks, then test-run output, then commit.
- **Best for**: Executing approved plans, repetitive tasks, when design is already decided.
```
Creating `src/services/user-service.ts`
[code]
Running tests... ✓ 5 passing
Committing: feat(user): add user service
```
### Review
Critical analysis mode — find issues first, severity-tagged findings, actionable suggestions.
- **Posture**: Find first, fix second. A reviewer's job is to surface issues with concrete `file:line` locations.
- **Output format**: Findings tagged Critical / Important / Minor / Nitpick with file citations.
- **Best for**: Pre-merge code review, security audits, architecture review.
```
### Critical (must fix before merge)
1. **<issue>** — `file:line`
- Problem: ...
- Fix: ...
```
### Token Efficient
Compressed output mode — minimal prose, code-first, no preambles.
- **Posture**: Skip ceremony. No "Sure, I can help" / "Let me explain first" — just do.
- **Output format**: Code blocks with one-line captions; reference docs instead of re-explaining mechanism.
- **Best for**: High-volume sessions, repeated similar tasks, cost-conscious work.
- **Savings**: 40-60% fewer tokens on average compared with default verbosity.
---
## Style comparison
| Style | Verbosity | Focus | Output shape |
|-------|-----------|-------|-------------|
| Brainstorm | High | Exploration | Approach tables + trade-offs |
| Deep Research | High | Analysis | Structured reports with citations |
| Implementation | Low | Execution | Code-first per-file blocks |
| Review | Medium | Quality | Severity-tagged issue lists |
| Token Efficient | Minimal | Density | Code with one-line captions |
## Customizing
Output styles are Markdown files in the `output-styles/` directory at the plugin root. To customize one, copy the file you want to modify into `.claude/output-styles/<name>.md` (project) or `~/.claude/output-styles/<name>.md` (personal). Project styles override personal styles, which override plugin-shipped styles.
Format:
```yaml
---
name: My Custom Style
description: A short description shown in the /config picker
keep-coding-instructions: true
---
# My Custom Style
[behavioral instructions...]
```
Set `keep-coding-instructions: false` if you want to fully replace Claude's default coding discipline (rare; usually leave it `true`).
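
The override order described above (project, then personal, then plugin-shipped) amounts to a first-match search over three directories. The sketch below models that order only; it is not how Claude Code resolves styles internally. `resolve_style` is a hypothetical helper, and it assumes the file name matches the style name.

```python
from pathlib import Path


def resolve_style(name, search_dirs):
    """Return the first `<dir>/<name>.md` that exists, or None.

    `search_dirs` must be listed from highest to lowest precedence.
    """
    for directory in search_dirs:
        candidate = Path(directory) / f"{name}.md"
        if candidate.is_file():
            return candidate
    return None
```

A caller would pass something like `[Path(".claude/output-styles"), Path.home() / ".claude/output-styles", plugin_root / "output-styles"]`, where `plugin_root` is a placeholder for wherever the plugin is installed.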
+49 -98
@@ -1,138 +1,89 @@
---
title: Skills Reference
description: All 35 skills in Claude Kit, organized around the 6-phase development workflow.
description: 16 skills in Claude Kit organized around the 5-phase verification-first workflow.
---
# Skills Reference
Claude Kit is organized around a **6-phase development workflow**. 13 spine skills are user-invocable — typed directly as `/claudekit:<name>` — and 22 supporting skills auto-trigger by context behind the scenes.
Claude Kit is organized around a **5-phase verification-first workflow**: Investigate → Design → Implement → Verify → Ship. All 14 spine skills (plus 2 setup skills) are user-invocable as `/claudekit:<name>`.
Every skill has 8 required sections: Frontmatter, Overview, When to Use, Process, **Rationalizations table**, **Evidence Requirements**, Red Flags, References. The Rationalizations table records, verbatim, the excuses an engineer makes to skip a step, each paired with a rebuttal. The Evidence Requirements section names the artifact each checkpoint must produce.
## How Skills Work
Skills have trigger descriptions with keywords. When your conversation matches, the skill loads automatically:
```
"fix this bug" → systematic-debugging, root-cause-tracing
"plan the feature" → brainstorming, writing-plans
"review my plan" → plan-ceo-review, plan-eng-review
"switch to brainstorm" → mode-switching, brainstorming
"why is this broken?" → investigate-root-cause
"how does X work?" → map-codebase
"plan this feature" → shape-spec, write-plan
"review the plan" → plan-review (dispatches architect + experience reviewer)
"is it done?" → verification-gate
"open a PR" → code-review-loop
"cut a release" → release-and-changelog
```
You can also invoke spine skills directly by typing `/claudekit:<name>`. Project-level skills go in `.claude/skills/`.
You can also invoke any skill directly by typing `/claudekit:<name>`.
---
## 🧠 Think
## 🔍 Investigate
Explore ideas, refine requirements, produce a spec.
Surface every fact about the system before forming a theory. Every claim has a `<file:line>` citation; no memory-based assertions.
| Skill | Description | Triggers On |
|-------|-------------|-------------|
| **brainstorming** | Interactive design — one question at a time. Includes Startup Mode (6 forcing questions) for new product ideas | "brainstorm", "design", "explore", "is this worth building" |
| **writing-plans** | Break a spec into bite-sized tasks with exact code, file paths, and verification commands | "plan", "break down", "task list", "implementation steps" |
| **investigate-root-cause** | 4-phase: gather → hypothesize → test → prove. Mandatory before any fix. | "bug", "error", "broken", "why does this", stack traces |
| **map-codebase** | Methodical evidence-cited exploration. Produces a written map a teammate can read in 3 minutes. | "how does X work", "trace", "find where", "scope of change" |
| **audit-dependencies** | Dependency archaeology — what's actually used vs declared, with import-graph and reachability checks for CVEs. | "deps", "audit", "CVE", "stale package", "do we use" |
## 🔍 Review
## 🎨 Design
Pressure-test a written plan before coding. Each dimension scores 0-10 with a one-sentence rationale and concrete fixes. Selected fixes are written directly into the plan file.
| Skill | Dimensions scored | When to invoke |
|-------|------------------|----------------|
| **autoplan** | All 4 below, parallel fan-out, single consolidated fix gate | Full gauntlet before handoff — "autoplan", "auto review", "run all reviews" |
| **plan-ceo-review** | Ambition, problem clarity, wedge focus, demand reality, future-fit | Scope / strategy pressure-test — "think bigger", "scope review" |
| **plan-eng-review** | Data flow, failure modes, edge cases, test matrix, rollback | Architecture audit — "does this design make sense", "lock in the plan" |
| **plan-design-review** | Hierarchy, visual consistency, state coverage, accessibility, AI-slop avoidance | Plans with UI surfaces — "design critique", "avoid AI slop" |
| **plan-devex-review** | Time to Hello World, ergonomics, error copy, docs structure, magical moments | Plans shipping APIs / CLIs / SDKs — "DX review", "is this SDK ergonomic" |
## 🔨 Build
Implement with discipline — TDD, systematic debugging, and verification gates.
Convert a vague request into a written spec, then a numbered plan, and get that plan through review before implementation begins.
| Skill | Description | Triggers On |
|-------|-------------|-------------|
| **feature-workflow** | End-to-end orchestrator: requirements → plan → review → implement → test → review | "feature", "implement end-to-end" |
| **test-driven-development** | Strict red-green-refactor — no production code without a failing test first | "implement", "add feature", "fix bug", "build" |
| **systematic-debugging** | 4-phase investigation: observe, hypothesize, test, prove | "bug", "error", "broken", stack traces |
| **verification-before-completion** | Mandatory evidence before any completion claim | "done", "fixed", "tests pass" |
| **shape-spec** | One-to-three-page spec with goals, non-goals, constraints, falsifiable acceptance criteria, open questions. Engineering-flavored. | "spec", "what should we build", "design this", "let's add" |
| **write-plan** | Numbered task list with file paths, exact test commands, dependency annotations, acceptance per task, Risks section. | "plan", "break down", "task list", "implementation order" |
| **plan-review** | Orchestrator: dispatches 2 reviewers in parallel, consolidates into one fix gate, applies user-selected fixes. | "review the plan", "is the plan ready", "plan-review" |
| **plan-review-architecture** | Scores 5 sub-dimensions 0-10 (data flow, failure modes, edge cases, test matrix, rollback). | "architecture review", "data flow", "failure modes", "rollback" |
| **plan-review-experience** | Scores 5 sub-dimensions 0-10 (info hierarchy, state coverage, accessibility, DX ergonomics, AI-slop avoidance). | "UX review", "DX review", "API ergonomics", "states", "accessibility" |
## 🎛️ Session
## 🔨 Implement
Ship code with red-green-refactor discipline; vertical slices behind feature flags; refactor with evidence.
| Skill | Description | Triggers On |
|-------|-------------|-------------|
| **mode-switching** | Switch behavioral modes (brainstorm, token-efficient, deep-research, implementation, review) | "mode", "switch to brainstorm" |
| **test-first** | Red-green-refactor with strict evidence requirements. | "implement", "fix bug", "TDD", "write the test first" |
| **incremental-shipping** | Vertical slices behind feature flags plus refactor-with-evidence (test/perf deltas required). | "feature flag", "incremental", "vertical slice", "rollout" |
## ⚙️ Setup
## ✅ Verify
Mandatory pre-completion gate. No "tests pass — trust me." Active debugging keeps a paper trail.
| Skill | Description | Triggers On |
|-------|-------------|-------------|
| **init** | Interactive setup wizard — scaffolds rules, modes, hooks, MCP configs into your project | `/claudekit:init` (user-invocable) |
| **verification-gate** | 6-step pre-completion gate: claim → tests → negative path → non-IDE check → cross-check → sign. | "done", "complete", "ready to merge", "tests pass" |
| **evidence-driven-debugging** | Active-debugging companion to investigate-root-cause: instrument, capture, verdict, clean up. | "debug", "instrument", "log", "trace", "what's happening at runtime" |
## 🚀 Ship
Reviewable PRs with verification evidence pasted; atomic releases with diff-built changelogs.
| Skill | Description | Triggers On |
|-------|-------------|-------------|
| **code-review-loop** | End-to-end review etiquette: requesting and receiving feedback. Dispatches code-reviewer and (on sensitive paths) security-auditor. | "code review", "PR review", "request review", "address comments" |
| **release-and-changelog** | SemVer hygiene plus diff-built changelogs plus atomic release commits plus post-release smoke check. | "release", "version bump", "changelog", "tag", "publish" |
---
## Supporting Skills (auto-trigger, non-user-invocable)
## ⚙️ Setup (off-spine)
These 22 skills activate silently when Claude detects a matching context. You don't invoke them directly — they shape how Claude works within the spine phases above.
Used once for project bootstrap, plus session-level mode switching.
### Execution & Parallelism
| Skill | Description | Triggers On |
|-------|-------------|-------------|
| **init** | Interactive setup wizard — scaffolds rules, hooks, and MCP configs into your project | `/claudekit:init` |
| Skill | Triggers On |
|-------|-------------|
| **executing-plans** | "execute the plan", "run the plan" |
| **subagent-driven-development** | "use subagents", "dispatch agents", parallel task execution |
| **using-git-worktrees** | "worktree", "isolated branch", parallel development |
| **finishing-a-development-branch** | "ship it", "ready to merge", "branch is done" |
| **dispatching-parallel-agents** | 3+ independent failures or tasks |
| **condition-based-waiting** | "wait for", "check status", polling CI pipelines |
### Testing Discipline
| Skill | Triggers On |
|-------|-------------|
| **testing** | pytest, Vitest, Jest — fixtures, mocking, coverage config |
| **playwright** | E2E tests, page objects, visual regression |
| **testing-anti-patterns** | "flaky test", "mock", test review — catches unreliable tests |
### Debug Techniques
| Skill | Triggers On |
|-------|-------------|
| **root-cause-tracing** | Deep bugs where error location differs from bug origin |
| **defense-in-depth** | Data integrity bugs, single-point bypass scenarios |
### Review Etiquette
| Skill | Triggers On |
|-------|-------------|
| **requesting-code-review** | Before PRs, before merging |
| **receiving-code-review** | Review comments, PR feedback |
### Reasoning & Meta
| Skill | Triggers On |
|-------|-------------|
| **sequential-thinking** | Complex decisions needing step-by-step reasoning |
| **writing-concisely** | "be concise", "code only" — 30-70% token savings |
| **writing-skills** | "create a skill", "new skill" |
| **refactoring** | "refactor", "clean up", "simplify" |
### Operations
| Skill | Triggers On |
|-------|-------------|
| **devops** | Docker, GitHub Actions, Cloudflare Workers — CI/CD, deployment |
| **git-workflows** | "commit", "PR", "ship", "changelog" |
| **performance-optimization** | "slow", "optimize", "profiling", N+1 queries, bundle size |
| **session-management** | "checkpoint", "index", "status", context loading |
### Security
| Skill | Triggers On |
|-------|-------------|
| **owasp** | Security review, user input, authentication, CORS, CSP |
---
## Counts
- **Total:** 35 skills
- **Spine (user-invocable):** 13 — brainstorming, writing-plans, autoplan, plan-ceo-review, plan-eng-review, plan-design-review, plan-devex-review, feature-workflow, test-driven-development, systematic-debugging, verification-before-completion, mode-switching, init
- **Supporting (auto-trigger only):** 22
To switch session behavior (Brainstorm, Implementation, Review, etc.), use Claude Code's native [output styles](/reference/output-styles/) instead of a skill — switch via `/config`.
@@ -1,197 +1,140 @@
---
title: Planning & Building
description: How Claude Kit guides you from idea to implementation using brainstorming, planning, and execution skills.
description: How Claude Kit takes you from a vague request to shipped, verified code.
---
# Planning & Building
Claude Kit provides a structured workflow for turning ideas into working code: **Brainstorm > Plan > Review > Execute > Verify**.
The full feature loop: spec → plan → review → implement → verify. Each phase produces an artifact you could paste into a code review.
## The Workflow
## Phase 1: Shape the spec
**Triggers on**: "spec", "what should we build", "design this", "let's add"
`shape-spec` turns a vague request into a written spec a teammate can read in 5 minutes. Goals, non-goals, constraints, falsifiable acceptance criteria, open questions. Engineering-flavored — no founder-mode forcing questions.
```
"I need to add user authentication"
┌─────────────────┐
│ Brainstorming │ Explore requirements, ask questions,
│ │ evaluate approaches, validate design
└────────┬────────┘
┌─────────────────┐
│ Writing Plans │ Break into tasks, exact file paths,
│ │ code samples, verification steps
└────────┬────────┘
┌─────────────────┐
│ Autoplan │ Parallel 4-angle plan review:
│ (optional but │ strategy, architecture, design, DX.
│ recommended) │ Single fix-gate before implementation.
└────────┬────────┘
┌─────────────────┐
│ Executing Plans │ Fresh subagent per task, code review
│ │ between tasks, quality gates
└────────┬────────┘
┌─────────────────┐
│ Verification │ Run tests, verify builds,
│ │ evidence before claims
└─────────────────┘
You: "We need to add idempotency to the charge endpoint"
→ /claudekit:shape-spec
→ asks clarifying questions, one at a time
→ produces docs/claudekit/specs/<basename>-spec.md
```
## Phase 1: Brainstorming
Output is a 1-3 page Markdown spec. The non-goals list is more important than the goals list — non-goals only get pinned down when you write them.
**Triggers on**: "brainstorm", "design", "explore", "what if", "how should we", "trade-offs"
## Phase 2: Write the plan
The brainstorming skill uses a three-phase process:
**Triggers on**: "plan", "break down", "task list", "implementation order"
1. **Understanding** — Asks one question at a time to clarify requirements. Prefers multiple-choice over open-ended questions.
2. **Exploration** — Presents 2-3 approaches with trade-offs, leads with a recommendation.
3. **Design** — Presents the validated design in digestible 200-300 word sections.
### Example
`write-plan` decomposes the spec into a numbered task list. Each task names the file, the change, the test command, the acceptance check, dependency annotations.
```
You: "Let's brainstorm adding real-time notifications"
You: "/claudekit:write-plan"
Claude: "Which delivery mechanism fits your use case best?
a) WebSockets — persistent connection, instant delivery
b) Server-Sent Events — simpler, one-way push
c) Polling — simplest, but higher latency"
You: "b) SSE"
Claude: "## Approach 1: SSE + Redis Pub/Sub (Recommended)
Server pushes events via SSE, Redis coordinates across instances.
- Pros: Simple, works with load balancers, auto-reconnect built in
- Cons: One-way only (client can't push back)
## Approach 2: SSE + PostgreSQL LISTEN/NOTIFY
..."
→ produces docs/claudekit/plans/<basename>-plan.md
```
## Phase 2: Writing Plans
Each task line:
**Triggers on**: "plan", "break down", "implementation steps", "task list"
The writing-plans skill creates detailed implementation plans with:
- Exact file paths for every change
- Complete code samples (not descriptions)
- Verification commands with expected output
- 2-5 minute task granularity
### Plan Structure
```markdown
## Task 1: Create User model with email field
**Files**:
- Create: `src/models/user.ts`
- Test: `src/models/user.test.ts`
**Steps**:
1. Write failing test
2. Verify test fails
3. Implement minimally
4. Verify test passes
5. Commit
```
4. src/handlers/billing/charge.ts — add idempotency-key check before insert.
Test: pytest tests/billing/test_charge.py -k test_idempotency
Acceptance: duplicate request with same key returns the original response, no double charge
Blocked by: 2 (schema migration)
```
## Phase 2.5: Plan Review (Optional but recommended)
Plans without file paths are wishlists; the skill refuses to ship those.
**Triggers on**: "autoplan", "auto review", "review my plan", "think bigger", "does this design make sense", "DX review"
## Phase 3: Plan review
Before jumping into execution, pressure-test the plan from four complementary angles. Each reviewer returns a 0-10 scorecard per dimension and proposes concrete fixes. Fixes are presented in a single multi-select prompt — you pick which ones to apply, and they're written directly into the plan file.
**Triggers on**: "review the plan", "is the plan ready", "plan-review"
`plan-review` orchestrates two parallel reviewers. Each scores 5 sub-dimensions 0-10 and proposes concrete fixes. Findings consolidate into one ranked fix gate.
| Skill | Dimensions scored | When to invoke |
|-------|------------------|----------------|
| `plan-ceo-review` | Ambition, problem clarity, wedge focus, demand reality, future-fit | Plan scope / strategy pressure-test |
| `plan-eng-review` | Data flow, failure modes, edge cases, test matrix, rollback | Architecture audit before coding |
| `plan-design-review` | Hierarchy, visual consistency, states, accessibility, AI-slop avoidance | Plans with UI surfaces |
| `plan-devex-review` | Time to Hello World, ergonomics, error copy, docs structure, magical moments | Plans shipping APIs / CLIs / SDKs |
| `autoplan` | All 4 above, fanned out in parallel, single consolidated fix gate | Full gauntlet before handoff |
| `plan-review-architecture` | Data flow, failure modes, edge cases, test matrix, rollback safety | Architecture audit before coding |
| `plan-review-experience` | Information hierarchy, state coverage, accessibility, DX ergonomics, AI-slop avoidance | Plans with UI or API/CLI surfaces |
| `plan-review` | Both above, dispatched in parallel, consolidated single fix gate | Full review before handoff |
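
The consolidation step can be pictured as a merge followed by a sort. The sketch below is illustrative only: the tuple shape, the `consolidate` name, and the tie-break on task number are assumptions, not the skill's actual implementation. The severity labels are taken from the example fix gate.

```python
# Severity labels mirror the example fix gate; the numeric ranks are an assumption.
SEVERITY_RANK = {"Blocker": 0, "Important": 1, "Nice-to-have": 2}


def consolidate(*reviews):
    """Merge findings from several reviewers into one ranked list.

    Each finding is a (severity, task_number, note) tuple (hypothetical shape).
    Findings sort by severity first, then by task number.
    """
    merged = [finding for review in reviews for finding in review]
    return sorted(merged, key=lambda f: (SEVERITY_RANK[f[0]], f[1]))


architecture = [
    ("Blocker", 4, "define cache miss failure path"),
    ("Blocker", 2, "add rollback procedure (destructive migration)"),
]
experience = [
    ("Important", 7, "define error state + actionable error copy"),
]

ranked = consolidate(architecture, experience)
for severity, task, note in ranked:
    print(f"- [{severity}] Task {task}: {note}")
```

A single sorted list is what lets the user answer one multi-select prompt instead of one prompt per reviewer.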
### Example
```
You: "/claudekit:autoplan"
You: "/claudekit:plan-review"
Claude: [dispatches 4 reviewers in parallel]
dispatches architect + experience-reviewer in parallel
# Autoplan Review: 2026-04-24-feature-x-plan
Overall Scores:
CEO: 6.2/10 (lowest: Wedge focus 4/10)
ENG: 7.8/10 (lowest: Rollback 5/10)
DESIGN: 8.4/10
DEVEX: 5.6/10 (lowest: Time to Hello World 3/10)
## Architecture review
- Data flow: 8/10
- Failure modes: 6/10 — Task 4: cache miss path undefined
- Edge cases: 7/10
- Test matrix: 7/10
- Rollback safety: 5/10 — Task 2: destructive migration without rollback
Critical Issues (worst first):
[DEVEX] Time to Hello World: no quickstart specified
[CEO] Wedge focus: covers 3 personas simultaneously
[ENG] Rollback: no undo path for Phase 2 migration
...
## Experience review
- Information hierarchy: 9/10
- State coverage: 6/10 — Task 7: no error state for failed charge
- Accessibility: 8/10
- DX ergonomics: 5/10 — Task 7: error message is "Internal error"
- AI-slop avoidance: 10/10
### Consolidated fixes (ranked)
- [Blocker] Task 2: add rollback procedure (destructive migration)
- [Blocker] Task 4: define cache miss failure path
- [Important] Task 7: define error state + actionable error copy
- [Nice-to-have] ...
> Which fixes to apply? [multi-select]
```
## Phase 3: Executing Plans
## Phase 4: Implement
**Triggers on**: "execute the plan", "run the plan", "implement the plan"
**Triggers on**: "implement", "build", "add feature", "fix bug"
The executing-plans skill runs each task with:
Each task ships with `test-first` (red-green-refactor) and `incremental-shipping` (vertical slices behind feature flags).
- **Fresh subagent per task** — Prevents context pollution
- **Code review between tasks** — Catches issues early
- **Quality gates** — Critical issues must be fixed before proceeding
- **Test first.** Write the failing test, watch it fail for the right reason, make it pass with the smallest change, refactor with the test as safety net. Paste runner output for each step.
- **Vertical slices.** The smallest version of the change that delivers value, gated by a feature flag. Ship dark; ramp on.
- **Refactor with evidence.** Behavior-preserving changes prove preservation with before/after test deltas (and perf numbers if perf-sensitive).
### Execution Flow
## Phase 5: Verify
```
Task 1 → Implement → Review → Fix issues → ✓
Task 2 → Implement → Review → Fix issues → ✓
Task 3 → Implement → Review → Fix issues → ✓
Final comprehensive review → ✓
```
**Auto-triggers on**: completion claims ("done", "fixed", "tests pass", "ready to merge")
## Phase 4: Verification
`verification-gate` is the load-bearing pre-completion check. Six steps, ~5 minutes:
**Auto-triggers on**: completion claims ("done", "fixed", "tests pass")
1. Restate the claim: `<X> is complete because <Y>` (Y must be evidence, not "the code looks right").
2. Run named tests with full output. Paste it.
3. Run the negative path. Capture what happens on invalid input, missing field, network failure, max-size input.
4. Verify in a non-IDE environment. `curl` from a separate shell, not `npm run dev` in your editor.
5. Cross-check the original ask. Re-read the ticket; map each thing that was asked to where it was addressed.
6. Sign the gate. Add a `## Verification` section to the PR with all of the above.
The verification-before-completion skill requires evidence before any completion claim:
If the runner output isn't pasted, the gate hasn't run.
- Run the actual test suite and read the output
- Verify the build succeeds
- Check that the feature works as intended
## Supporting skills
## Supporting Skills
These activate automatically during planning and building:
These skills activate automatically during planning and building:
| Skill | When It Helps |
| Skill | When it helps |
|-------|---------------|
| `feature-workflow` | End-to-end feature development |
| `sequential-thinking` | Complex decisions needing step-by-step reasoning |
| `subagent-driven-development` | Fresh subagent per task with two-stage review |
| `using-git-worktrees` | Isolated branch work for parallel development |
| `dispatching-parallel-agents` | Launching independent parallel agents |
| `refactoring` | Improving code structure before shipping |
| `map-codebase` | When you need to understand an unfamiliar area before shaping a spec or plan |
| `audit-dependencies` | Before adding a new third-party package, or after a CVE alert |
## Supporting Agents
## Supporting agents
The skills above dispatch these agents:
| Agent | Role |
|-------|------|
| `planner` | Research and create implementation plans |
| `brainstormer` | Explore solutions and evaluate trade-offs |
| `researcher` | Research technologies and best practices |
| `ceo-reviewer` | Strategic/scope pressure test on a written plan |
| `eng-reviewer` | Architecture review on a written plan |
| `design-reviewer` | UX/visual review on a written plan |
| `devex-reviewer` | Developer-experience review on a written plan |
| `planner` | Decompose specs into executable plans |
| `architect` | Score architecture dimension of a plan |
| `experience-reviewer` | Score UX + DX dimension of a plan |
| `tester` | Design and write tests with red-green discipline |
## Related Pages
## Related pages
- [Testing & Debugging](/workflows/testing-and-debugging/) — TDD and debugging workflows
- [Reviewing & Shipping](/workflows/reviewing-and-shipping/) — Code review and git workflows
- [Skills Reference](/reference/skills/) — All 35 skills
- [Testing & Debugging](/workflows/testing-and-debugging/) — `test-first` and root-cause investigation
- [Reviewing & Shipping](/workflows/reviewing-and-shipping/) — code review and release workflows
- [Skills Reference](/reference/skills/) — All 16 skills
@@ -1,147 +1,120 @@
---
title: Reviewing & Shipping
description: How Claude Kit handles code review, git workflows, PR creation, and branch management.
description: How Claude Kit handles code review, atomic releases, and changelog discipline.
---
# Reviewing & Shipping
Claude Kit provides structured workflows for code review, committing, creating PRs, and finishing development branches.
Two workflows: the code-review loop (between author and reviewer) and the release loop (cutting versioned, changelog-backed releases).
## Code Review
## Code review loop
### Requesting Reviews
**Triggers on**: "code review", "PR review", "request review", "address comments"
**Triggers on**: completing features, before PRs, before merging
`code-review-loop` covers both ends of the loop — preparing a reviewable PR and acting on feedback rigorously. Six steps:
The requesting-code-review skill prepares code for review with:
### Step 1: Prepare the PR
- Clear scope of what changed and why
- Areas of concern flagged for reviewers
- Context on architectural decisions
- Title is one verb-led line ("Add idempotency key to charge endpoint", not "Updates").
- Description has these sections: **What** (1-3 sentences), **Why** (spec link, ticket, bug), **How** (design choice if non-obvious), **Verification** (output from `verification-gate`), **Risk + rollback** (if applicable).
- Diff size: if >400 non-trivial lines (excluding tests, generated files, lockfiles), consider splitting. Reviewers won't read a diff that large; they'll skim and approve.
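
One way to check the diff-size guideline is to total the output of `git diff --numstat`, which prints `added<TAB>deleted<TAB>path` per file and `-` for binary files. This is a rough sketch under stated assumptions: the exclusion patterns and the `reviewable_lines` name are hypothetical, and counting added plus deleted lines is only one reading of "non-trivial lines".

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns; adjust to the project's layout.
EXCLUDED = ("tests/*", "*.lock", "*package-lock.json", "*.generated.*")


def reviewable_lines(numstat: str, excluded=EXCLUDED) -> int:
    """Sum added + deleted lines from `git diff --numstat` text,
    skipping binary files and paths matching an excluded pattern."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # not a numstat row
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary file: numstat reports "-" instead of counts
        if any(fnmatch(path, pattern) for pattern in excluded):
            continue
        total += int(added) + int(deleted)
    return total


sample = (
    "120\t30\tsrc/handlers/billing/charge.ts\n"
    "300\t0\ttests/billing/test_charge.py\n"
    "-\t-\tassets/logo.png\n"
    "900\t880\tpackage-lock.json\n"
)
print(reviewable_lines(sample))  # only charge.ts counts toward the total
```

In practice the input would come from something like `git diff --numstat <base>...HEAD`, with the result compared against the 400-line threshold.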
### Receiving Reviews
### Step 2: Dispatch reviewer agents
**Triggers on**: review feedback, PR comments, review rejections
Before human reviewers spend their time, dispatch the agents:
The receiving-code-review skill processes feedback systematically:
- `code-reviewer` — structural findings (data flow, error handling, edge cases, complexity, naming)
- `security-auditor` — for sensitive paths only (auth, payments, crypto, sessions, tokens)
1. **Categorize** — Critical vs. important vs. minor
2. **Prioritize** — Fix critical issues first
3. **Implement** — Address feedback with evidence
4. **Re-request** — Summary of changes made
Address obvious findings yourself. Note in the PR description that automated reviewers ran.
### Review Agents
### Step 3: Receive feedback
| Agent | Focus |
|-------|-------|
| `code-reviewer` | Quality, security, performance, maintainability |
| `security-auditor` | OWASP compliance, vulnerability detection |
Every comment gets one of three responses:
## Git Workflows
- **Agree + apply** — make the change, reply with the commit hash
- **Disagree + explain** — cite evidence (a test, a constraint, a spec decision); ask if the reasoning resolves the concern
- **Need more context** — ask for clarification
**Triggers on**: "commit", "push", "PR", "ship", "changelog"
Never silently dismiss a comment. The reviewer will assume you missed it.
The git-workflows skill enforces:
### Step 4: Apply changes in coherent commits
### Conventional Commits
- One commit per topic, even if multiple comments contributed.
- Commit message names what changed and references the comment thread.
- Don't squash before re-review unless project policy demands it.
### Step 5: Re-request review
Add a single summary comment: what was addressed, what was pushed back on. Re-request through the platform's mechanism.
### Step 6: Close the loop
- CI green on the *most recent* commit (not the branch tip from when review was requested).
- All comment threads resolved. Unresolved disagreement = don't merge yet.
- Merge using the project's standard method.
## Release and changelog
**Triggers on**: "release", "version bump", "changelog", "tag", "publish"
`release-and-changelog` enforces SemVer hygiene plus diff-built changelogs plus atomic release commits.
### SemVer discipline
Classify each change since the last release:
- **Breaking** (incompatible API change, removed feature) → MAJOR bump
- **New feature** (additive, backward-compatible) → MINOR bump
- **Bug fix or internal improvement** → PATCH bump
The bump is the **highest** classification across all changes. One breaking change in a release of 50 fixes is still a MAJOR bump.
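
The "highest classification wins" rule is easy to state in code. The following is a minimal sketch, assuming plain `MAJOR.MINOR.PATCH` version strings with no pre-release or build suffix; the `Change` enum and `next_version` function are illustrative names, not part of the skill.

```python
from enum import IntEnum


class Change(IntEnum):
    """Change classifications, ordered so that max() picks the largest bump."""
    PATCH = 1
    MINOR = 2
    MAJOR = 3


def next_version(current: str, changes: list[Change]) -> str:
    """Return the next version for a plain MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in current.split("."))
    if not changes:
        return current  # nothing to release
    bump = max(changes)
    if bump == Change.MAJOR:
        return f"{major + 1}.0.0"
    if bump == Change.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


# One breaking change among 50 fixes is still a MAJOR bump.
print(next_version("1.2.3", [Change.PATCH] * 50 + [Change.MAJOR]))
```

Lower components reset to zero on a higher bump, which is standard SemVer behavior.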
### Changelog from the diff
Open `CHANGELOG.md`. Add a section: `## [<version>] - <YYYY-MM-DD>`. Subheadings as needed: Added, Changed, Deprecated, Removed, Fixed, Security.
For each change in `git log <last-tag>..HEAD`, write one entry. Each entry:
- Names what changed in user-observable terms (not implementation terms).
- Cites the PR or commit hash.
- Names the consumer impact if non-trivial.
**Reflect the actual diff.** An entry like "Improved performance" that does not name what improved is a finding; rewrite it from the diff.
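
As an illustration of the section layout described above, here is a small sketch that renders one release section from a list of entries. The `Entry` dataclass, `render_release` function, and the sample commit hashes are hypothetical; only the heading format and the subheading names come from the text.

```python
from dataclasses import dataclass
from datetime import date

# Subheadings in the order listed in the text.
SECTION_ORDER = ("Added", "Changed", "Deprecated", "Removed", "Fixed", "Security")


@dataclass
class Entry:
    section: str  # one of SECTION_ORDER
    summary: str  # user-observable description of the change
    ref: str      # PR number or commit hash


def render_release(version: str, released: date, entries: list[Entry]) -> str:
    """Render a `## [<version>] - <YYYY-MM-DD>` changelog section."""
    for entry in entries:
        if entry.section not in SECTION_ORDER:
            raise ValueError(f"unknown section: {entry.section}")
    lines = [f"## [{version}] - {released.isoformat()}"]
    for section in SECTION_ORDER:
        matching = [e for e in entries if e.section == section]
        if not matching:
            continue  # omit empty subheadings
        lines.append(f"### {section}")
        lines.extend(f"- {e.summary} ({e.ref})" for e in matching)
    return "\n".join(lines)


out = render_release(
    "1.3.0",
    date(2026, 5, 7),
    [
        Entry("Fixed", "Duplicate charge requests with the same key no longer double-charge", "abc1234"),
        Entry("Added", "Idempotency key on the charge endpoint", "def5678"),
    ],
)
print(out)
```

The summaries still have to be written by reading the diff; the sketch only keeps the structure consistent.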
### Atomic release commit
One commit. Only the version bump and the changelog. No feature changes, no fixes, no "while I was here" cleanups. The release commit is the bisect target; mixing fixes into it ties the release to those fixes.
### Tag and publish
```
type(scope): subject
feat(auth): add JWT token refresh endpoint
fix(cart): handle empty cart total calculation
docs(api): update OpenAPI spec for v2 endpoints
git tag -a v1.3.0 -m "v1.3.0 (MINOR): added X feature"
git push origin v1.3.0
```
Types: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`
If the project publishes to a registry (npm, PyPI, crates.io, marketplace), run the publish command. Verify the published artifact matches the tag.
### Branch Naming
### Post-release smoke check
```
feature/AUTH-123-jwt-refresh
fix/CART-456-empty-total
hotfix/critical-payment-bug
chore/upgrade-dependencies
```
Install the published artifact in a clean environment (fresh container, separate venv, sandboxed install). Run a smoke check: import the package, run hello-world, hit the new feature. The smoke check catches the published-vs-source gap that CI cannot — missing files in the package manifest, registry transformations, env-var assumptions.
### PR Creation
## Supporting skills
Claude Kit generates well-structured PRs:
```markdown
## Summary
- Added JWT token refresh endpoint
- Tokens auto-refresh 5 minutes before expiry
## Test Plan
- [ ] Unit tests for token refresh logic
- [ ] Integration test for refresh endpoint
- [ ] Manual test: login → wait → verify auto-refresh
```
## Finishing a Branch
**Triggers on**: "ship it", "ready to merge", "branch is done", "create a PR"
The finishing-a-development-branch skill runs a completion checklist:
1. **Verify** — All tests pass, build succeeds
2. **Review** — Run final code review
3. **Options** — Present merge strategies:
- Create PR for team review
- Merge directly (if authorized)
- Clean up worktree (if using git worktrees)
## Git Worktrees
**Triggers on**: "worktree", "isolated branch", "parallel branches"
The using-git-worktrees skill creates isolated working copies for:
- Feature work that shouldn't affect the main workspace
- Parallel development on multiple branches
- Safe experimentation without risk to in-progress work
```
main workspace: d:/project/ (main branch)
feature worktree: d:/project-feature/ (feature/auth branch)
hotfix worktree: d:/project-hotfix/ (hotfix/payment branch)
```
## Changelog Generation
The git-workflows skill generates changelogs from conventional commits:
```markdown
## [1.2.0] - 2026-04-19
### Added
- JWT token refresh endpoint (AUTH-123)
- Auto-refresh 5 minutes before expiry
### Fixed
- Empty cart total calculation (CART-456)
```
## Supporting Skills
| Skill | When It Helps |
| Skill | When it helps |
|-------|---------------|
| `refactoring` | Improving code structure before shipping |
| `writing-concisely` | Token-efficient mode for high-volume review sessions |
| `verification-before-completion` | Mandatory evidence gate before claiming done |
| `verification-gate` | Mandatory evidence gate before claiming the PR is ready |
| `incremental-shipping` | Vertical slices behind feature flags; the "ship it dark first" pattern |
## Supporting Agents
## Supporting agents
| Agent | Role |
|-------|------|
| `git-manager` | Stage, commit, push with conventional commits |
| `code-reviewer` | Comprehensive code review |
| `copywriter` | Release notes, changelogs, PR descriptions |
| `docs-manager` | Keep documentation in sync with code |
| `code-reviewer` | Pre-merge structural review |
| `security-auditor` | OWASP-aligned review on sensitive paths |
## Related Pages
## Related pages
- [Planning & Building](/workflows/planning-and-building/) — Brainstorm, plan, execute
- [Testing & Debugging](/workflows/testing-and-debugging/) — TDD and debugging workflows
- [Skills Reference](/reference/skills/) — All 35 skills
- [Planning & Building](/workflows/planning-and-building/) — Spec, plan, plan-review, implement
- [Testing & Debugging](/workflows/testing-and-debugging/) — Test-first and root-cause investigation
- [Skills Reference](/reference/skills/) — All 16 skills
@@ -1,145 +1,110 @@
---
title: Testing & Debugging
description: How Claude Kit enforces test-driven development, systematic debugging, and verification.
description: How Claude Kit enforces test-first discipline, root-cause investigation, and pre-completion verification.
---
# Testing & Debugging
Claude Kit enforces quality through three connected workflows: **TDD for building**, **systematic debugging for fixing**, and **verification before completion**.
Three connected workflows: **test-first for building**, **investigate-root-cause for fixing**, and **verification-gate before completion**.
## Test-Driven Development
## Test-first
**Triggers on**: "implement", "add feature", "fix bug", "write code", "build"
**Triggers on**: "implement", "add feature", "fix bug", "TDD", "write the test first"
The TDD skill enforces a strict red-green-refactor cycle for all production code changes:
`test-first` enforces strict red-green-refactor for all production code changes:
```
1. Write a failing test → Run it → Confirm it fails (RED)
2. Write minimal code → Run it → Confirm it passes (GREEN)
3. Refactor if needed → Run it → Confirm it still passes
4. Commit
1. Pick the smallest testable behavior
2. Write a failing test → Run it → Confirm it fails (RED) → Paste output
3. Make it pass with the smallest change → Confirm it passes (GREEN) → Paste output
4. Refactor → Confirm tests still pass → Paste output
5. Loop with the next case
```
### Why TDD by Default?
The runner output is the evidence. If you can't paste red and green, you haven't run the cycle.
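
A single cycle might look like the sketch below, shown in its green state. Everything here is hypothetical: `ChargeService` is an in-memory stand-in invented for illustration, not code from any real project. Removing the `if idempotency_key in self._responses` check reproduces the red state, because the second call then creates a new charge and the first assertion fails.

```python
class ChargeService:
    """Hypothetical in-memory stand-in for a charge endpoint."""

    def __init__(self):
        self._responses = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # The smallest change that turns the test green:
        # return the stored response for a repeated key.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges_made += 1
        response = {"charge_id": self.charges_made, "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response


def test_idempotency():
    service = ChargeService()
    first = service.charge("key-1", 500)
    second = service.charge("key-1", 500)
    assert second == first            # same response is replayed
    assert service.charges_made == 1  # no double charge
```

Run it with `pytest <path> -k test_idempotency` and paste the output for both the red and the green run.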
- Tests document intent, not just behavior
- Catches regressions immediately
- Forces small, focused changes
- Creates natural commit points
### Stack-specific commands
### Stack-Specific Commands
| Stack | Test command | Notes |
|-------|-------------|-------|
| Python (pytest) | `pytest <path> -k <name>` | Use `-x` to stop on first failure during red. |
| Node (vitest) | `vitest run <file>` | Pass `--reporter=verbose` for clear output. |
| Node (jest) | `jest <file> -t <name>` | |
| Rust (cargo) | `cargo test <name>` | `--nocapture` to see prints during dev. |
| Go | `go test ./<pkg> -run <name>` | `-v` for verbose. |
| Playwright (E2E) | `npx playwright test <file>` | Reserve for end-to-end golden paths. |
| Stack | Test Command | Full Verify |
|-------|-------------|-------------|
| Python/FastAPI | `pytest tests/test_<module>.py -v` | `pytest -v && ruff check .` |
| TypeScript/NestJS | `npm test -- --testPathPattern=<module>` | `npm test && npm run lint && npm run build` |
| Next.js/React | `npx vitest run <file>` | `npm test && next lint && next build` |
## Systematic Debugging
## Investigate root cause
**Triggers on**: "bug", "error", "failing", "broken", "doesn't work", "TypeError", stack traces
The systematic-debugging skill follows a four-phase investigation:
`investigate-root-cause` follows four phases. No fixes without a written hypothesis first.
### Phase 1: Observe
### Phase 1: Gather
Gather evidence before forming hypotheses:
- Read the error message and stack trace
- Reproduce the issue
- Check logs and recent changes
Surface every fact that already exists. Capture the literal error text + stack trace (don't paraphrase). Find the reproduction. Read recent commits touching files in the trace. Pull logs around the failure window. Look at the actual data.
### Phase 2: Hypothesize
Form specific, testable theories:
- "The null check on line 42 doesn't handle the empty array case"
- Not: "Something is wrong with the data"
Convert evidence into one written sentence:
> The bug occurs because [X] causes [Y] when [Z].
No "I think." No "maybe." If you can't fill all three slots, return to Phase 1.
### Phase 3: Test
Verify each hypothesis systematically:
- Add logging or breakpoints
- Write a test that reproduces the bug
- Isolate the failing component
Design the smallest test of the hypothesis (instrumentation OR experiment). Run. Capture output. Verdict: **Confirmed** → advance to Phase 4. **Refuted** → return to Phase 2 with new evidence. **Ambiguous** → add probes.
### Phase 4: Fix
For active runtime instrumentation in this phase, `evidence-driven-debugging` is the companion skill — adds tagged probes, captures output, cleans up after.
Apply the minimal fix:
- Fix the root cause, not the symptom
- Add a regression test
- Verify the original error is gone
### Phase 4: Prove
### Root Cause Tracing
Write a failing test (red) that captures the bug. Apply the smallest fix that makes it pass (green). Run the full suite until it is green. Re-run the original Phase 1 reproducer after the fix. Paste all four runner outputs.
**Triggers on**: deep bugs where the error location differs from the bug origin
### The three-fix rule
For bugs that manifest far from their source, the root-cause-tracing skill traces the data flow backward to find where things first went wrong:
If three or more fix attempts have failed consecutively, the bug is architectural, not local. Stop. Escalate or rescope.
```
Error: NullPointerException at OrderService.getTotal()
↓ trace backward
OrderService.getTotal() receives null item
↓ trace backward
CartService.getItems() returns null for empty cart
↓ root cause
CartRepository.findByUserId() returns null instead of []
```
## Verification gate
## Verification Before Completion
**Auto-triggers on**: completion claims ("done", "fixed", "tests pass", "ready to merge")
**Auto-triggers on**: "done", "fixed", "tests pass", "build succeeds"
`verification-gate` is the load-bearing pre-completion check. Six steps:
The verification skill prevents false completion claims. Before saying "done", Claude must:
1. **Restate the claim**: `I am claiming <X> is complete because <Y>` (Y must be evidence).
2. **Run the named tests** with full output. Paste it.
3. **Run the negative path**: invalid input, missing field, network failure, max-size input. Capture what happens.
4. **Verify in a non-IDE environment**: `curl` from a separate shell, fresh container, browser open. The IDE has env vars and hot-reload that production doesn't.
5. **Cross-check the original ask**: re-read the ticket and map each thing that was asked to where it was addressed.
6. **Sign the gate**: add a `## Verification` section to the PR with all of the above.
1. **Run the test suite** and read the output
2. **Run the build** and confirm it succeeds
3. **Check for regressions** in related functionality
4. **Show evidence** — actual command output, not assumptions
If the runner output isn't pasted, the gate hasn't run.
### What Gets Caught
## What gets caught
```
Without verification:
"I've fixed the bug" → Actually introduced a new failing test
"I've fixed the bug" → Actually introduced a new failing test elsewhere
"Tests pass" → Only ran the file the change was in; suite has 3 failures
"Works on my machine" → Production env var not set; nothing works in prod
With verification:
Run pytest → See 2 failures → Fix both → Run again → All green → "Fixed"
Run named tests → green; run full suite → green;
curl from fresh shell → expected response;
cross-check ticket → all asks addressed → sign the gate
```
## Testing Anti-Patterns
**Triggers on**: "mock", "flaky test", "test passes but bug ships", "false positive"
The testing-anti-patterns skill catches common mistakes:
| Anti-Pattern | Problem | Fix |
|-------------|---------|-----|
| Heavy mocking | Tests pass but production breaks | Test real integrations |
| Testing implementation | Tests break on refactor | Test behavior, not internals |
| No edge cases | Happy path works, edge cases crash | Test boundaries and errors |
| Flaky tests | Random failures erode trust | Fix or delete, never ignore |
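
Two of the fixes in the table, testing behavior rather than internals and covering boundaries and errors, can be sketched in a few lines of Python. The function `apply_discount` and both tests are hypothetical examples written for this page; they use plain `assert` statements so they run with or without a test runner.

```python
def apply_discount(total: float, percent: float) -> float:
    """Hypothetical function under test: reduce total by percent (0-100)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total * (1 - percent / 100), 2)


def test_discount_behavior() -> None:
    # Behavior, not internals: assert on the returned value only.
    assert apply_discount(200.0, 25) == 150.0


def test_discount_boundaries() -> None:
    # Edge cases: both ends of the valid range, then an invalid value.
    assert apply_discount(80.0, 0) == 80.0
    assert apply_discount(80.0, 100) == 0.0
    try:
        apply_discount(80.0, 101)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")
```

Because neither test inspects how the discount is computed, the implementation can be refactored freely without breaking them.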
## Defense in Depth
**Triggers on**: data validation bugs, "it slipped through", bypass scenarios
The defense-in-depth skill adds validation at multiple layers so a single-point failure can't cause data corruption:
```
API layer: Validate input shape (Pydantic/Zod)
Service layer: Validate business rules
Database layer: Constraints (NOT NULL, UNIQUE, CHECK)
```
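The three layers can be sketched in Python using only the standard library. This is an assumption-laden illustration: the `users` table, the age rule, and every function name are hypothetical, and a hand-written shape check stands in for Pydantic/Zod to keep the block self-contained.

```python
import sqlite3


def validate_shape(payload: dict) -> dict:
    """API layer: check input shape (a stand-in for Pydantic/Zod)."""
    if not isinstance(payload.get("email"), str) or not isinstance(payload.get("age"), int):
        raise ValueError("email must be a string and age an integer")
    return payload


def validate_rules(payload: dict) -> dict:
    """Service layer: a hypothetical business rule."""
    if payload["age"] < 18:
        raise ValueError("user must be an adult")
    return payload


def create_schema(conn: sqlite3.Connection) -> None:
    """Database layer: constraints catch anything the upper layers miss."""
    conn.execute(
        "CREATE TABLE users ("
        " email TEXT NOT NULL UNIQUE,"
        " age INTEGER NOT NULL CHECK (age >= 18))"
    )


def create_user(conn: sqlite3.Connection, payload: dict) -> None:
    """Run the payload through both validation layers, then insert it."""
    data = validate_rules(validate_shape(payload))
    conn.execute(
        "INSERT INTO users (email, age) VALUES (?, ?)",
        (data["email"], data["age"]),
    )
```

If some code path writes to the table without calling `create_user`, the `CHECK` and `UNIQUE` constraints still reject bad rows, which is the point of layering.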
## Supporting Agents
## Supporting agents
| Agent | Role |
|-------|------|
| `tester` | Run test suites, analyze coverage, validate error handling |
| `debugger` | Investigate bugs, check logs, reproduce issues |
| `security-auditor` | Security-focused code review |
| `tester` | Design test cases; write tests with red-green discipline; paste runner output |
| `investigator` | Root-cause investigation with evidence chain |
| `security-auditor` | OWASP-aligned review on sensitive paths (when bugs touch auth/payments/crypto) |
## Related Pages
## Related pages
- [Planning & Building](/workflows/planning-and-building/) — Brainstorm, plan, execute
- [Reviewing & Shipping](/workflows/reviewing-and-shipping/) — Code review and git workflows
- [Skills Reference](/reference/skills/) — All 35 skills
- [Planning & Building](/workflows/planning-and-building/) — Spec, plan, plan-review, implement
- [Reviewing & Shipping](/workflows/reviewing-and-shipping/) — Code review and release workflows
- [Skills Reference](/reference/skills/) — All 16 skills