---
name: bulletproof
description: Use when building a feature, refactoring, fixing a complex bug, changing architecture, or starting any non-trivial coding task. 12-stage verified dev workflow from research to deploy. Adapted for Claude.ai (no sub-agents), Python/Docker/Traefik/MikroTik/embedded stacks, Gitea CI/CD, and SonarQube.
---

# Bulletproof — Adaptive Development Workflow

> **Based on:** Artemiy Miller's Bulletproof v5.0
> **Adapted for:** Claude.ai · Arthur Abelentsev's infrastructure stack
> **Version:** 5.1-aa · March 2026

## Core Principle

**Code to solve problems, not code for code's sake.**

Before EVERY change ask: "Does this actually solve our problem? Is this the most efficient solution?"
If the answer isn't clear — stop, research alternatives, pick the best one.

---

## Pick Your Mode

Not every task needs the full pipeline.

| Size | Examples | Mode | Stages |
|------|----------|------|--------|
| **S** | Bug fix, config tweak, 1-2 files | Lightweight | 1 → 4 → 5 → 6 → 7 → Gates |
| **M** | New feature, module refactor, 3-10 files | Standard | Stages 1-10 |
| **L** | Architecture change, new service/container, 10+ files | Full | Stages 1-12 (all) |

**How stages relate:** Stages 5-6-7 (Self-Audit, Verification, Impact) run **inside each implementation phase** as an inner loop. Stages 8-12 run **once after all phases complete** as an outer loop.

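The mode table can also be read as a lookup. The sketch below is one hypothetical way to encode it in Python; the names `STAGES_BY_SIZE`, `INNER_LOOP`, `OUTER_LOOP`, and `stages_for` are illustrative and not part of the workflow itself. Gates are not numbered stages, so they are left out of the lists and assumed to run after the last inner-loop stage.

```python
# Stage numbers per task size, taken from the mode table.
STAGES_BY_SIZE: dict[str, list[int]] = {
    "S": [1, 4, 5, 6, 7],
    "M": list(range(1, 11)),   # stages 1-10
    "L": list(range(1, 13)),   # stages 1-12
}

INNER_LOOP = frozenset({5, 6, 7})           # repeated inside every phase
OUTER_LOOP = frozenset({8, 9, 10, 11, 12})  # run once after all phases


def stages_for(size: str) -> list[int]:
    """Return the stage numbers for a task size ("S", "M", or "L")."""
    try:
        return list(STAGES_BY_SIZE[size.upper()])
    except KeyError:
        raise ValueError(f"unknown task size: {size!r}") from None
```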

---

## Stack-Specific Conventions (ALWAYS applies)

These conventions apply automatically to all code produced under this workflow:

### Python
- Use `~=` (compatible release) in `requirements.txt`, never `==` or `>=`
- Formatting: `ruff format` + `ruff check`
- Type hints on all public functions
- `pathlib.Path` over `os.path`

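As a rough illustration of the `~=` rule, the following sketch flags requirement lines that use any other operator. The function name `non_compatible_pins` is hypothetical, and the parsing is deliberately simplified: it assumes one plain `name~=version` specifier per line and does not handle environment markers or URL requirements.

```python
import re

# Matches a line such as "requests~=2.31" or "uvicorn[standard]~=0.29.0".
_COMPATIBLE = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*~=\s*\S+$")


def non_compatible_pins(requirements_text: str) -> list[str]:
    """Return requirement lines that do not use the ~= operator."""
    offenders = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _COMPATIBLE.match(line):
            offenders.append(line)
    return offenders
```

An empty result means every requirement in the file follows the convention.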
### Docker / Compose
- Filename: always `compose.yaml` (never `docker-compose.yml`)
- Pin image tags to specific versions, never use `:latest` in production
- Health checks for every service
- Named volumes over bind mounts for persistent data

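The tag-pinning rule lends itself to a small check. The sketch below assumes the image references have already been collected from `compose.yaml` into a list of strings; the function name `unpinned_images` and the sample references in the usage note are hypothetical.

```python
def unpinned_images(images: list[str]) -> list[str]:
    """Return image references that have no tag or use the `latest` tag."""
    flagged = []
    for ref in images:
        if "@sha256:" in ref:          # digest pins are the strictest form
            continue
        last = ref.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
        _name, sep, tag = last.partition(":")
        if not sep or tag == "latest":
            flagged.append(ref)
    return flagged
```

For example, `unpinned_images(["traefik:v3.1", "nginx"])` would flag only `nginx`, because an untagged reference resolves to `latest`.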
### Gitea
- Clone via `ssh://gitea-lan/<org>/<repo>.git`
- **One commit per logical change** — if changes span multiple files, stage and commit together
- Gitea Actions CI for build/test/deploy pipelines
- Deploy pattern: `.deploy.env` + Gitea Secrets

### Infrastructure (Traefik, MikroTik, WireGuard)
- Traefik v3 with Docker provider; labels on compose services
- Let's Encrypt via DNS challenge for wildcard certs
- MikroTik config changes: always test with `/system scheduler` rollback timer before commit
- WireGuard peer configs: document AllowedIPs and routing table in comments

### Embedded Firmware
- **For any embedded/MCU/firmware task: read the `embedded-firmware-engineer` skill first.** It contains NASA/JPL Power of Ten rules, banned functions, DMA/cache coherence, GPIO policy, watchdog strategy, brown-out testing, and code review checklists specific to bare-metal and RTOS development.
- PlatformIO as build system; `platformio.ini` must pin platform and framework versions
- Build flags: `-Wall -Werror -Wextra -Wpedantic`

---

## Stage 1: Deep Research

**Mode: Read-Only. No code. No changes.**

- Investigate the problem area: structure, patterns, dependencies, existing tests
- **WebSearch: Who has already solved this problem? How did they solve it? What is the most efficient known solution?** Don't reinvent — find the best existing approach first.
- **Analyze all findings and make a conclusion: which solution is the BEST and why.** The research artifact must end with a clear recommendation, not just a list of options.
- Save to `thoughts/research/YYYY-MM-DD-<task>.md`
  (see `templates/research.md` for format)

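The artifact naming pattern can be generated rather than typed by hand. This is a minimal sketch: the helper name `research_path` is hypothetical, and the way the task title is turned into the `<task>` part (lowercase, hyphen-separated) is an assumption, since the workflow only fixes the `YYYY-MM-DD-<task>.md` shape.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def research_path(task: str, day: Optional[date] = None) -> Path:
    """Build thoughts/research/YYYY-MM-DD-<task>.md for a task title."""
    day = day or date.today()
    # Assumed slug rule: lowercase, runs of other characters become "-".
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    if not slug:
        raise ValueError("task title has no usable characters")
    return Path("thoughts") / "research" / f"{day.isoformat()}-{slug}.md"
```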

---

## Stage 2: Spec / PRD

**Mode: Write specs only. No code.**

**Spec = WHAT and WHY. Not how. Spec = contract.**

- Read Research Artifact from `thoughts/research/`
- Create `specs/YYYY-MM-DD-<n>.md`
  (see `templates/spec.md` for format)
- Key sections: Problem, Goal, Scope, Acceptance Criteria, Constraints, Non-Goals

**Skip for size S tasks.**

---

## Stage 3: Planning + Questions

**Mode: Write plans only. No code yet.**

- Read **both** Spec (`specs/`) and Research (`thoughts/research/`)
- Find gaps: what hasn't been thought through? Which edge cases are missing? What could break?
- **Be creative and proactive: anticipate ALL possible problems BEFORE writing code.** Think several steps ahead. What could go wrong in a week? A month? Under load? With unexpected user behavior? Solve problems before they exist.
- **WebSearch: How have others solved this exact problem? What libraries/patterns exist? What's the proven best practice?** Choose the most efficient solution, not the first one that comes to mind.
- After verifying the approach — **rewrite the plan into an improved version** incorporating all findings, edge cases, and research results. Not just patch it — rewrite it better.

### Challenge Loop (mandatory before finalizing plan)

```
Before finalizing the plan, answer 3 questions:

1. DOES THIS SOLVE THE PROBLEM?
   Compare every plan item against acceptance criteria from spec.
   If any criterion is uncovered — the plan is incomplete.

2. IS THIS THE MOST EFFICIENT SOLUTION?
   Search: who has already solved this problem? What approach did they use?
   Name 2-3 alternative approaches (including ones found via research).
   For each: pros, cons, effort.
   Justify why the chosen approach is better than all alternatives.

3. IS THERE "CODE FOR CODE'S SAKE"?
   Every change must directly serve acceptance criteria.
   If a change isn't tied to solving the problem — remove it.
   Drive-by refactoring = separate task, not part of this one.
```

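Questions 1 and 3 of the Challenge Loop amount to a two-way coverage check between acceptance criteria and plan items. The sketch below shows one possible way to make that check mechanical; `PlanItem`, `challenge_plan`, and the criterion IDs such as `AC1` are hypothetical names, not something the templates prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class PlanItem:
    title: str
    covers: set[str] = field(default_factory=set)  # criterion IDs this item serves


def challenge_plan(criteria: set[str], plan: list[PlanItem]) -> tuple[set[str], list[str]]:
    """Return (criteria no item covers, titles of items serving no criterion)."""
    covered: set[str] = set().union(*(item.covers for item in plan))
    uncovered = criteria - covered                       # question 1
    unjustified = [item.title for item in plan
                   if not (item.covers & criteria)]      # question 3
    return uncovered, unjustified
```

A plan would pass this check only when both results are empty. Question 2 (comparing alternatives) still needs human judgment and is not modelled here.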
### Review Cycle
1. Claude drafts the plan
2. User reviews in chat, adds notes/corrections
3. Claude addresses all notes, rewrites affected sections
4. Repeat until user approves

### Questions for User
- Only for real forks where there's a genuine decision to make
- For each question: **recommend which option you think is best and why**
- Don't ask the obvious

### Final Plan
Create `plans/YYYY-MM-DD-<n>.md`
(see `templates/plan.md` for full template with Challenge Log, phases, prompts)

---

## Stage 4: Phased Implementation

**Each phase = separate logical unit, feature branch.**

Order within each phase:
1. Create/switch to feature branch: `feature/<task>`
2. Update status → `in_progress`
3. **TDD**: tests FIRST (red)
4. **Implement**: code to make tests pass (green)
5. **Refactor** (if needed)
6. **Self-Audit** (Stage 5)
7. **Verification** (Stage 6)
8. **Impact Analysis** (Stage 7)
9. **Gates** (see Gates section)
10. **Commit** — one commit per logical change, descriptive message
11. Status → `completed`, write to Changelog
12. **Handoff** (write `progress/<task>-handoff.md`, see `templates/handoff.md`)

---

## Stage 5: Self-Audit (after each phase)

**Mandatory BEFORE marking `completed`:**

```
Check the phase implementation:

1. SPEC COMPLIANCE
   Open spec. Walk through every acceptance criterion.
   For each: implemented? Where exactly in code?
   If any not covered — finish it.

2. CHALLENGE THE SOLUTION
   Look at the written code with fresh eyes.
   Does this actually solve the problem from spec?
   Is there a simpler/more efficient way?
   Any "code for code's sake" — changes unrelated to the task?
```

---

## Stage 6: Verification — Deep Bug Hunt

**Not just linting. Thoughtful review with false-positive filtering.**

### Step 1: Find errors
```
Check ALL code from this phase for:
- Logic errors (wrong conditions, off-by-one, race conditions)
- Data handling (null/undefined, type mismatches)
- Security (injection, auth bypass, exposed secrets)
- Performance (N+1 queries, memory leaks, unnecessary allocations)
- Docker: health check failures, volume mount conflicts, port collisions
- Infrastructure: Traefik label typos, routing priority conflicts
```

### Step 2: Verify bugs are REAL
```
For EACH found bug:
1. Is this a REAL bug or a false positive?
2. Can you prove this bug is reproducible?
3. If you can't prove it — it's NOT a bug. Don't touch it.

RULE: Don't fix code "for beauty" or "just in case".
Fix ONLY proven bugs that actually affect functionality.
Every "fix" without proof = risk of introducing a new bug.
```

### Step 3: Logic and efficiency check
```
Final code cleanliness check:
- Logic: is the data flow correct from input to output?
- Efficiency: any redundant operations?
- Readability: is the code understandable without comments?
BUT: don't refactor "for beauty". Only if it affects correctness.
```

---

## Stage 7: Impact Analysis — "Did we break anything?"

**The most underestimated stage. 75% of AI agents break previously working code.**

```
MANDATORY CHECK BEFORE MERGE:

1. REGRESSION
   What other modules/functions depend on changed files?
   Run ALL project tests (not just current phase).
   If anything broke — this is priority #1.

2. SIDE EFFECTS
   Did any contracts/interfaces change (API, props, types)?
   If yes — who uses them? Are all consumers updated?
   Docker: did any service ports, volumes, or network names change?
   Traefik: do routing rules still resolve correctly?

3. THINK AHEAD
   What problems could these changes cause in a week/month?
   Edge cases we haven't tested?
   What happens with: zero data? Huge data? Concurrent requests?
   What if the user does something unexpected?

4. COMPATIBILITY
   Backward compatibility preserved?
   Data migrations needed?
   Docker volume data backward-compatible with new container version?
   Feature flags needed for gradual rollout?
```

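The regression question ("what other modules depend on changed files?") is a reverse-dependency walk. The sketch below assumes an import graph is already available as a plain dictionary mapping each module to the modules it imports; how that graph is built is out of scope here, and `affected_modules` is a hypothetical name. It returns both direct and transitive dependents, since a break can propagate through intermediate modules.

```python
from collections import deque


def affected_modules(imports: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return every module that directly or transitively imports a changed one."""
    # Invert the graph: module -> modules that import it.
    dependents: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    affected: set[str] = set()
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in affected and user not in changed:
                affected.add(user)
                queue.append(user)  # follow the chain upward
    return affected
```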

---

## Stage 8: Integration Check

- All phases `completed` → run gates across entire project
- Audit: everything from spec implemented?
- Every acceptance criterion → fulfilled?

---

## Stage 9: Code Review (fresh perspective)

**Review as if seeing this code for the first time.**

See `agents/code-reviewer.md` for the full review checklist.

Key areas:
- Edge cases, race conditions, backward compat, security, error handling, performance
- Docker/Compose: service dependencies, restart policies, resource limits
- Infrastructure: Traefik routing, TLS configuration, firewall rules

**Warning**: AI reviewing its own code has blind spots. For critical infrastructure changes — flag for human review.

---

## Stage 10: Security Scan (for M and L)

```bash
# SonarQube analysis (preferred — already in the stack)
# Push to Gitea → Gitea Actions triggers SonarQube scan

# Alternative: local semgrep
semgrep --config=auto .
```

For Docker/Compose changes, additionally check:
- No secrets in compose.yaml or Dockerfiles (use .env or Gitea Secrets)
- Images from trusted registries only
- No privileged containers without justification
- Network segmentation: services not exposed beyond what's needed

---

## Stage 11: Fixes + Re-verification

If review/scan found issues:
1. Fix (only proven bugs — rule from Stage 6)
2. Re-run gates
3. Repeat Impact Analysis (Stage 7) — fixes didn't break anything else?
4. Re-review if major changes were made

---

## Stage 12: Cleanup + Deploy

- Archive plan: `mv plans/<file> plans/archive/`
- Keep spec as documentation
- Squash merge → main (via Gitea PR)
- **Deploy — ONLY on explicit user request**

---

## Deterministic Gates

A phase CANNOT be `completed` without passing ALL required gates.

### Tier 1: Required (block the phase)

**Python projects:**
```bash
ruff check .                            # 0 lint errors
ruff format --check .                   # formatting verified
pytest --tb=short -q                    # all tests green
python -m py_compile <main_module>.py   # syntax OK
```

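**Docker/Compose projects:**
```bash
docker compose -f compose.yaml config   # compose file valid
docker compose build                    # all images build
# Pass only if every service reports healthy; a service without a
# health check passes if it is running. Depending on the Compose version,
# `ps --format json` prints one JSON array or one object per line.
docker compose up -d && sleep 10 && \
  docker compose ps --format json | \
  python3 -c "import sys, json; \
    raw = sys.stdin.read().strip(); \
    svcs = json.loads(raw) if raw.startswith('[') else [json.loads(l) for l in raw.splitlines() if l.strip()]; \
    sys.exit(0 if svcs and all(s.get('Health') == 'healthy' or (not s.get('Health') and s.get('State') == 'running') for s in svcs) else 1)"
```
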
**Embedded (PlatformIO):**
```bash
pio check   # static analysis
pio run     # firmware builds
pio test    # unit tests pass (native)
```

### Tier 2: Recommended (for M and L)
```bash
# Python
pip-audit                   # dependency vulnerabilities
mypy --strict .             # type checking (if project uses mypy)

# Docker
docker scout cves <image>   # image CVE scan (if available)

# General
semgrep --config=auto .     # security patterns
```

### Tier 3: Deep Security (SonarQube)
```bash
# Via Gitea Actions pipeline — push triggers analysis
# Or manually:
sonar-scanner -Dsonar.projectKey=<key> -Dsonar.host.url=<url>
```

If a gate fails — fix and re-run. Never skip.

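The "all required gates must pass" rule can be wrapped in a small runner. The following is only a sketch under stated assumptions: `run_gates` is a hypothetical helper, each gate is given as an argument list, and a command that cannot be started at all is treated as a failed gate. It runs every gate rather than stopping at the first failure, so one pass reports everything that needs fixing.

```python
import subprocess


def run_gates(gates: list[list[str]]) -> list[str]:
    """Run each gate command; return the failed ones (empty list = all passed)."""
    failed = []
    for cmd in gates:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
            ok = result.returncode == 0
        except OSError:
            ok = False  # tool not installed or not executable
        if not ok:
            failed.append(" ".join(cmd))
    return failed


# Example call with the Tier 1 Python gates (not executed here):
# failures = run_gates([
#     ["ruff", "check", "."],
#     ["ruff", "format", "--check", "."],
#     ["pytest", "--tb=short", "-q"],
# ])
```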

---

## Git Discipline

- Each task = `feature/<task>` branch
- **One commit per logical change** — group related file changes into a single commit
- Commit after each passed gate (checkpoint for rollback)
- NEVER push to main directly
- Squash merge on completion (via Gitea PR)
- Clone format: `ssh://gitea-lan/<org>/<repo>.git`

---

## Model Recommendations

| Stage | Model | Why |
|-------|-------|-----|
| Research, Planning | Opus | Cross-file reasoning, deep analysis |
| Implementation | Sonnet | Speed, cost-efficiency |
| Code Review, Security | Opus | Deep analysis, fresh perspective |

---

## Project Structure

```
project/
├── specs/                      # WHAT and WHY
├── plans/                      # HOW
│   └── archive/                # completed plans
├── thoughts/research/          # research artifacts
├── progress/                   # handoff files
├── compose.yaml                # Docker Compose (if applicable)
├── platformio.ini              # PlatformIO config (if embedded)
├── requirements.txt            # Python deps with ~= specifiers
├── sonar-project.properties    # SonarQube config (if applicable)
└── .gitea/
    └── workflows/              # Gitea Actions CI/CD
```
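The directory part of this layout can be created in one step. The sketch below is a hypothetical helper (`scaffold` and `WORKFLOW_DIRS` are illustrative names); it only creates the directories from the tree and leaves files such as `compose.yaml` or `platformio.ini` to the project, since those apply only to some stacks.

```python
from pathlib import Path

# Directories from the tree above; parents are created implicitly.
WORKFLOW_DIRS = (
    "specs",
    "plans/archive",
    "thoughts/research",
    "progress",
    ".gitea/workflows",
)


def scaffold(root: Path) -> list[Path]:
    """Create the workflow directories under root; return those newly created."""
    created = []
    for rel in WORKFLOW_DIRS:
        target = root / rel
        if not target.exists():
            target.mkdir(parents=True)
            created.append(target)
    return created
```

Calling `scaffold(Path("."))` a second time should return an empty list, so it is safe to run on an existing project.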