| name | description |
|---|---|
| bulletproof | Use when building a feature, refactoring, fixing a complex bug, changing architecture, or starting any non-trivial coding task. 12-stage verified dev workflow from research to deploy. Adapted for Claude.ai (no sub-agents), Python/Docker/Traefik/MikroTik/embedded stacks, Gitea CI/CD, and SonarQube. |
Bulletproof — Adaptive Development Workflow
- Based on: Artemiy Miller's Bulletproof v5.0
- Adapted for: Claude.ai · Arthur Abelentsev's infrastructure stack
- Version: 5.1-aa · March 2026
Core Principle
Code to solve problems, not code for code's sake.
Before EVERY change ask: "Does this actually solve our problem? Is this the most efficient solution?" If the answer isn't clear — stop, research alternatives, pick the best one.
Pick Your Mode
Not every task needs the full pipeline.
| Size | Examples | Mode | Stages |
|---|---|---|---|
| S | Bug fix, config tweak, 1-2 files | Lightweight | 1 → 4 → 5 → 6 → 7 → Gates |
| M | New feature, module refactor, 3-10 files | Standard | Stages 1-10 |
| L | Architecture change, new service/container, 10+ files | Full | Stages 1-12 (all) |
How stages relate: Stages 5-6-7 (Self-Audit, Verification, Impact) run inside each implementation phase as an inner loop. Stages 8-12 run once after all phases complete as an outer loop.
Stack-Specific Conventions (ALWAYS applies)
These conventions apply automatically to all code produced under this workflow:
Python
- Use `~=` (compatible release) in `requirements.txt`, never `==` or `>=`
- Formatting: `ruff format` + `ruff check`
- Type hints on all public functions
- `pathlib.Path` over `os.path`
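A minimal sketch of how the `~=` rule could be enforced in CI, assuming a flat `requirements.txt` with one requirement per line (pip options such as `-r` are skipped and environment markers are not parsed). `bad_specifiers` and `check_file` are hypothetical helper names.

```python
import re
from pathlib import Path

# Matches "name[extras] <op>" at the start of a requirement line
REQ_RE = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*(?:\[[^\]]*\])?)\s*(~=|===|==|>=|<=|!=|>|<)?"
)


def bad_specifiers(text: str) -> list[str]:
    """Return requirement lines that do not use the ~= compatible-release operator."""
    bad: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        match = REQ_RE.match(line)
        if match is None or match.group(2) != "~=":
            bad.append(line)  # unpinned, ==, >=, etc.
    return bad


def check_file(path: Path) -> list[str]:
    """Convenience wrapper for a requirements file on disk."""
    return bad_specifiers(path.read_text(encoding="utf-8"))
```

For reference, PEP 440 defines `~=2.31` as `>=2.31, ==2.*`, so minor and patch updates within 2.x are allowed while a 3.0 release is not.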
Docker / Compose
- Filename: always `compose.yaml` (never `docker-compose.yml`)
- Pin image tags to specific versions; never use `:latest` in production
- Health checks for every service
- Named volumes over bind mounts for persistent data
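The tag and health-check rules can be checked mechanically. This sketch assumes the normalized JSON model printed by `docker compose config --format json`; services that `build:` their own image are not tag-checked, and an image relying only on a Dockerfile `HEALTHCHECK` is still reported. The function names are hypothetical.

```python
import json
import subprocess
from typing import Any


def image_is_pinned(image: str) -> bool:
    """True if the image reference has an explicit non-'latest' tag or a digest."""
    if "@sha256:" in image:
        return True
    last = image.rsplit("/", 1)[-1]  # the tag follows the last '/', not a registry port
    if ":" not in last:
        return False  # no tag means Docker pulls :latest
    return last.rsplit(":", 1)[1] != "latest"


def compose_violations(config: dict[str, Any]) -> list[str]:
    """List services with unpinned images or no healthcheck."""
    problems: list[str] = []
    for name, svc in config.get("services", {}).items():
        image = svc.get("image")
        if image and "build" not in svc and not image_is_pinned(image):
            problems.append(f"{name}: image '{image}' is not pinned to a version")
        if "healthcheck" not in svc:
            problems.append(f"{name}: no healthcheck defined")
    return problems


def load_config(file: str = "compose.yaml") -> dict[str, Any]:
    """Ask Compose for the fully resolved model as JSON."""
    out = subprocess.run(
        ["docker", "compose", "-f", file, "config", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```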
Gitea
- Clone via `ssh://gitea-lan/<org>/<repo>.git`
- One commit per logical change: if changes span multiple files, stage and commit them together
- Gitea Actions CI for build/test/deploy pipelines
- Deploy pattern: `.deploy.env` + Gitea Secrets
Infrastructure (Traefik, MikroTik, WireGuard)
- Traefik v3 with Docker provider; labels on compose services
- Let's Encrypt via DNS challenge for wildcard certs
- MikroTik config changes: always arm a `/system scheduler` rollback timer before committing, so a bad change reverts automatically
- WireGuard peer configs: document AllowedIPs and the routing table in comments
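A sketch of the label set for one service behind Traefik v3's Docker provider. The label keys are standard Traefik router and service options; the entrypoint name `websecure`, the resolver name `le-dns`, and the helper `traefik_labels` are assumptions that must match your own static configuration.

```python
def traefik_labels(
    router: str,
    host: str,
    port: int,
    entrypoint: str = "websecure",  # assumption: an entrypoint defined in Traefik's static config
    certresolver: str = "le-dns",   # assumption: a DNS-challenge resolver defined in static config
) -> dict[str, str]:
    """Build Docker-provider labels that route https://<host> to a container port."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    prefix = f"traefik.http.routers.{router}"
    return {
        "traefik.enable": "true",
        f"{prefix}.rule": f"Host(`{host}`)",
        f"{prefix}.entrypoints": entrypoint,
        f"{prefix}.tls.certresolver": certresolver,
        # Bind the router to the service below explicitly to avoid ambiguity
        f"{prefix}.service": router,
        f"traefik.http.services.{router}.loadbalancer.server.port": str(port),
    }
```

Each pair goes under the service's `labels:` key in `compose.yaml`, for example `` traefik.http.routers.whoami.rule=Host(`whoami.example.com`) ``. Generating labels from one function keeps router and service names consistent, which removes a common source of label typos.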
Embedded Firmware
- For any embedded/MCU/firmware task: read the `embedded-firmware-engineer` skill first. It covers NASA/JPL Power of Ten rules, banned functions, DMA/cache coherence, GPIO policy, watchdog strategy, brown-out testing, and code review checklists specific to bare-metal and RTOS development.
- PlatformIO as build system; `platformio.ini` must pin platform and framework versions
- Build flags: `-Wall -Werror -Wextra -Wpedantic`
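A sketch of a `platformio.ini` lint using the standard-library `configparser`. It only checks that `platform` carries an `@` version spec and that every `[env:*]` section lists the four warning flags; inheritance from a shared `[env]` section and framework pins via `platform_packages` are not handled. `platformio_violations` is a hypothetical name.

```python
import configparser

REQUIRED_FLAGS = {"-Wall", "-Werror", "-Wextra", "-Wpedantic"}


def platformio_violations(ini_text: str) -> list[str]:
    """Check each [env:*] section for a pinned platform and the required warning flags."""
    # interpolation=None: PlatformIO's ${...} syntax is not configparser interpolation
    parser = configparser.ConfigParser(interpolation=None, inline_comment_prefixes=(";",))
    parser.read_string(ini_text)
    problems: list[str] = []
    for section in parser.sections():
        if not section.startswith("env:"):
            continue  # skip [platformio] and similar sections
        env = parser[section]
        platform = env.get("platform", "")
        if "@" not in platform:
            problems.append(f"{section}: platform '{platform}' has no version pin")
        # build_flags may be one line or an indented multi-line value
        flags = set(env.get("build_flags", "").split())
        missing = REQUIRED_FLAGS - flags
        if missing:
            problems.append(f"{section}: missing build flags {sorted(missing)}")
    return problems
```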
Stage 1: Deep Research
Mode: Read-Only. No code. No changes.
- Investigate the problem area: structure, patterns, dependencies, existing tests
- WebSearch: Who has already solved this problem? How did they solve it? What is the most efficient known solution? Don't reinvent — find the best existing approach first.
- Analyze all findings and make a conclusion: which solution is the BEST and why. The research artifact must end with a clear recommendation, not just a list of options.
- Save to `thoughts/research/YYYY-MM-DD-<task>.md` (see `templates/research.md` for format)
Stage 2: Spec / PRD
Mode: Write specs only. No code.
Spec = WHAT and WHY. Not how. Spec = contract.
- Read the research artifact from `thoughts/research/`
- Create `specs/YYYY-MM-DD-<n>.md` (see `templates/spec.md` for format)
- Key sections: Problem, Goal, Scope, Acceptance Criteria, Constraints, Non-Goals
Skip for size S tasks.
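A missing section is the one spec defect a script can catch reliably. A sketch, assuming each key section is written as a Markdown heading with exactly the names listed above; `missing_sections` is a hypothetical name.

```python
import re

REQUIRED_SECTIONS = ["Problem", "Goal", "Scope", "Acceptance Criteria", "Constraints", "Non-Goals"]


def missing_sections(spec_markdown: str) -> list[str]:
    """Return required section names that do not appear as a Markdown heading."""
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}\s+(.+?)\s*#*\s*$", spec_markdown, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]
```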
Stage 3: Planning + Questions
Mode: Write plans only. No code yet.
- Read both the spec (`specs/`) and the research artifact (`thoughts/research/`)
- Find gaps: what hasn't been thought through? What edge cases exist? What could break?
- Be creative and proactive: anticipate ALL possible problems BEFORE writing code. Think several steps ahead. What could go wrong in a week? A month? Under load? With unexpected user behavior? Solve problems before they exist.
- WebSearch: How have others solved this exact problem? What libraries/patterns exist? What's the proven best practice? Choose the most efficient solution, not the first one that comes to mind.
- After verifying the approach — rewrite the plan into an improved version incorporating all findings, edge cases, and research results. Not just patch it — rewrite it better.
Challenge Loop (mandatory before finalizing plan)
Before finalizing the plan, answer 3 questions:
1. DOES THIS SOLVE THE PROBLEM?
Compare every plan item against acceptance criteria from spec.
If any criterion is uncovered — the plan is incomplete.
2. IS THIS THE MOST EFFICIENT SOLUTION?
Search: who has already solved this problem? What approach did they use?
Name 2-3 alternative approaches (including ones found via research).
For each: pros, cons, effort.
Justify why the chosen approach is better than all alternatives.
3. IS THERE "CODE FOR CODE'S SAKE"?
Every change must directly serve acceptance criteria.
If a change isn't tied to solving the problem — remove it.
Drive-by refactoring = separate task, not part of this one.
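Questions 1 and 3 amount to a coverage check in both directions. A sketch with hypothetical names, where the plan is modeled as a mapping from each plan item to the acceptance-criterion IDs it serves:

```python
def challenge_coverage(
    criteria: set[str], plan: dict[str, set[str]]
) -> tuple[set[str], list[str]]:
    """Return (uncovered criteria, plan items tied to no known criterion)."""
    covered: set[str] = set().union(*plan.values())
    uncovered = criteria - covered  # question 1: the plan is incomplete if non-empty
    # Question 3: an item serving no criterion is "code for code's sake"
    orphans = [item for item, served in plan.items() if not served & criteria]
    return uncovered, orphans
```

For example, a plan with `{"add rate limiter": {"AC1"}, "log rejections": {"AC2"}, "rename utils module": set()}` against criteria AC1 to AC3 reports AC3 as uncovered and the rename as a drive-by refactor to split out.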
Review Cycle
- Claude drafts the plan
- User reviews in chat, adds notes/corrections
- Claude addresses all notes, rewrites affected sections
- Repeat until user approves
Questions for User
- Only for real forks where there's a genuine decision to make
- For each question: recommend which option you think is best and why
- Don't ask the obvious
Final Plan
Create `plans/YYYY-MM-DD-<n>.md` (see `templates/plan.md` for the full template with Challenge Log, phases, and prompts).
Stage 4: Phased Implementation
Each phase = separate logical unit, feature branch.
Order within each phase:
- Create/switch to feature branch: `feature/<task>`
- Update status → `in_progress`
- TDD: tests FIRST (red)
- Implement: code to make tests pass (green)
- Refactor (if needed)
- Self-Audit (Stage 5)
- Verification (Stage 6)
- Impact Analysis (Stage 7)
- Gates (see Gates section)
- Commit — one commit per logical change, descriptive message
- Status → `completed`, write to Changelog
- Handoff (write `progress/<task>-handoff.md`, see `templates/handoff.md`)
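A minimal red/green illustration for one phase, using a hypothetical `parse_port` helper. The `test_*` functions are written first and fail (red) until `parse_port` exists (green); plain `assert` keeps them runnable under pytest without extra imports.

```python
# Red: written before the implementation exists
def test_accepts_valid_port() -> None:
    assert parse_port(" 8080 ") == 8080


def test_rejects_out_of_range() -> None:
    for bad in ("0", "65536", "-1"):
        try:
            parse_port(bad)
        except ValueError:
            continue
        raise AssertionError(f"{bad!r} should be rejected")


def test_rejects_non_numeric() -> None:
    try:
        parse_port("http")
    except ValueError:
        return
    raise AssertionError("non-numeric input should be rejected")


# Green: the minimal implementation that makes the tests above pass
def parse_port(value: str) -> int:
    port = int(value.strip())  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```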
Stage 5: Self-Audit (after each phase)
Mandatory BEFORE marking completed:
Check the phase implementation:
1. SPEC COMPLIANCE
Open spec. Walk through every acceptance criterion.
For each: implemented? Where exactly in code?
If any not covered — finish it.
2. CHALLENGE THE SOLUTION
Look at the written code with fresh eyes.
Does this actually solve the problem from spec?
Is there a simpler/more efficient way?
Any "code for code's sake" — changes unrelated to the task?
Stage 6: Verification — Deep Bug Hunt
Not just linting. Thoughtful review with false-positive filtering.
Step 1: Find errors
Check ALL code from this phase for:
- Logic errors (wrong conditions, off-by-one, race conditions)
- Data handling (null/undefined, type mismatches)
- Security (injection, auth bypass, exposed secrets)
- Performance (N+1 queries, memory leaks, unnecessary allocations)
- Docker: health check failures, volume mount conflicts, port collisions
- Infrastructure: Traefik label typos, routing priority conflicts
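Port collisions are one Docker bug class that can be checked without starting anything. A sketch over the normalized model from `docker compose config --format json`, assuming long-syntax port entries with `published`, `target`, `protocol`, and an optional `host_ip`; port ranges and overlaps between `0.0.0.0` and a specific host IP are not detected. `port_collisions` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any


def port_collisions(config: dict[str, Any]) -> dict[str, list[str]]:
    """Map each host binding published by more than one service to those services."""
    owners: dict[str, list[str]] = defaultdict(list)
    for name, svc in config.get("services", {}).items():
        for port in svc.get("ports", []):
            published = port.get("published")
            if not published:
                continue  # container-only port, nothing bound on the host
            key = f"{port.get('host_ip', '0.0.0.0')}:{published}/{port.get('protocol', 'tcp')}"
            owners[key].append(name)
    return {key: names for key, names in owners.items() if len(names) > 1}
```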
Step 2: Verify bugs are REAL
For EACH found bug:
1. Is this a REAL bug or a false positive?
2. Can you prove this bug is reproducible?
3. If you can't prove it — it's NOT a bug. Don't touch it.
RULE: Don't fix code "for beauty" or "just in case".
Fix ONLY proven bugs that actually affect functionality.
Every "fix" without proof = risk of introducing a new bug.
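What "prove it" can look like: a concrete input whose output contradicts the function's contract. `last_n` is a hypothetical function under review. In Python, `items[-0:]` equals `items[0:]`, so asking for the last 0 items returns the whole list.

```python
def last_n(items: list[int], n: int) -> list[int]:
    """Code under review: return the last n items."""
    return items[-n:]


def reproduce() -> bool:
    """Step 2 proof: True means the suspected bug is real."""
    return last_n([1, 2, 3], 0) != []


def last_n_fixed(items: list[int], n: int) -> list[int]:
    """Fix applied only after reproduce() returned True."""
    if n <= 0:
        return []
    # Clamp the start index so n > len(items) does not wrap around
    return items[max(len(items) - n, 0):]
```

Once the fix lands, the reproducing input becomes a regression test, so Stage 7 notices if the bug ever returns.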
Step 3: Logic and efficiency check
Final code cleanliness check:
- Logic: is the data flow correct from input to output?
- Efficiency: any redundant operations?
- Readability: is the code understandable without comments?
BUT: don't refactor "for beauty". Only if it affects correctness.
Stage 7: Impact Analysis — "Did we break anything?"
The most underestimated stage. 75% of AI agents break previously working code.
MANDATORY CHECK BEFORE MERGE:
1. REGRESSION
What other modules/functions depend on changed files?
Run ALL project tests (not just current phase).
If anything broke — this is priority #1.
2. SIDE EFFECTS
Did any contracts/interfaces change (API, props, types)?
If yes — who uses them? Are all consumers updated?
Docker: did any service ports, volumes, or network names change?
Traefik: do routing rules still resolve correctly?
3. THINK AHEAD
What problems could these changes cause in a week/month?
Edge cases we haven't tested?
What happens with: zero data? Huge data? Concurrent requests?
What if the user does something unexpected?
4. COMPATIBILITY
Backward compatibility preserved?
Data migrations needed?
Docker volume data backward-compatible with new container version?
Feature flags needed for gradual rollout?
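For the REGRESSION question, a rough reverse-dependency lookup can be done with the standard-library `ast` module. A sketch with a hypothetical `importers_of` helper; it resolves only absolute `import x` and `from x import y` forms, so relative imports and `from pkg import mod` (for a changed `pkg.mod`) are missed.

```python
import ast
from pathlib import Path


def importers_of(root: Path, changed: set[str]) -> dict[str, set[str]]:
    """Map each changed module (dotted name) to the files under root that import it."""
    hits: dict[str, set[str]] = {mod: set() for mod in changed}
    for path in root.rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                names = [node.module]
            else:
                continue  # not an import, or a relative import
            for name in names:
                for mod in changed:
                    # "import pkg.mod" and "from pkg.mod.sub import x" both depend on pkg.mod
                    if name == mod or name.startswith(mod + "."):
                        hits[mod].add(path.relative_to(root).as_posix())
    return hits
```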
Stage 8: Integration Check
- All phases `completed` → run gates across the entire project
- Audit: is everything from the spec implemented?
- Every acceptance criterion → fulfilled?
Stage 9: Code Review (fresh perspective)
Review as if seeing this code for the first time.
See agents/code-reviewer.md for the full review checklist.
Key areas:
- Edge cases, race conditions, backward compat, security, error handling, performance
- Docker/Compose: service dependencies, restart policies, resource limits
- Infrastructure: Traefik routing, TLS configuration, firewall rules
Warning: AI reviewing its own code has blind spots. For critical infrastructure changes — flag for human review.
Stage 10: Security Scan (for M and L)
```bash
# SonarQube analysis (preferred; already in the stack)
# Push to Gitea → Gitea Actions triggers SonarQube scan
# Alternative: local semgrep
semgrep --config=auto .
```
For Docker/Compose changes, additionally check:
- No secrets in compose.yaml or Dockerfiles (use .env or Gitea Secrets)
- Images from trusted registries only
- No privileged containers without justification
- Network segmentation: services not exposed beyond what's needed
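A rough pre-scan for the "no secrets" rule, meant to complement SonarQube or semgrep rather than replace them. The sketch assumes credential keys are uppercase and contain PASSWORD, SECRET, TOKEN, or API_KEY; values interpolated with `${...}` and `*_FILE` keys pointing at mounted secrets are treated as safe. `hardcoded_secrets` is a hypothetical name.

```python
import re

# Compose map/list entries and Dockerfile ENV/ARG lines with credential-like keys
SECRET_KEY_RE = re.compile(
    r"^\s*(?:-\s*|ENV\s+|ARG\s+)?"
    r"([A-Z0-9_]*(?:PASSWORD|SECRET|TOKEN|API_KEY)[A-Z0-9_]*)\s*[:=]\s*(.+?)\s*$"
)


def hardcoded_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, key) for credential-like keys with literal values."""
    findings: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = SECRET_KEY_RE.match(line)
        if match is None:
            continue
        key, value = match.group(1), match.group(2).strip("'\"")
        if key.endswith("_FILE"):
            continue  # points at a mounted secret file, e.g. /run/secrets/...
        if value.startswith("${"):
            continue  # interpolated from .env or Gitea Secrets at deploy time
        findings.append((lineno, key))
    return findings
```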
Stage 11: Fixes + Re-verification
If review/scan found issues:
- Fix (only proven bugs — rule from Stage 6)
- Re-run gates
- Repeat Impact Analysis (Stage 7) — fixes didn't break anything else?
- Re-review if major changes were made
Stage 12: Cleanup + Deploy
- Archive plan: `mv plans/<file> plans/archive/`
- Keep spec as documentation
- Squash merge → main (via Gitea PR)
- Deploy — ONLY on explicit user request
Deterministic Gates
A phase CANNOT be completed without passing ALL required gates.
Tier 1: Required (block the phase)
Python projects:
```bash
ruff check .                           # 0 lint errors
ruff format --check .                  # formatting verified
pytest --tb=short -q                   # all tests green
python -m py_compile <main_module>.py  # syntax OK
```
Docker/Compose projects:
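```bash
docker compose -f compose.yaml config --quiet  # compose file valid
docker compose build                           # all images build
# --wait blocks until services are running/healthy and fails otherwise
docker compose up -d --wait && \
docker compose ps --format json | python3 -c '
import json, sys
raw = sys.stdin.read().strip()
# Older Compose prints a JSON array; newer releases print one object per line
svcs = json.loads(raw) if raw.startswith("[") else [json.loads(l) for l in raw.splitlines() if l.strip()]
# A service with a healthcheck must be "healthy"; one without must be "running"
ok = all(s["Health"] == "healthy" if s.get("Health") else s.get("State") == "running" for s in svcs)
sys.exit(0 if svcs and ok else 1)'
# all services healthy
```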
Embedded (PlatformIO):
```bash
pio check   # static analysis
pio run     # firmware builds
pio test    # unit tests pass (native)
```
Tier 2: Recommended (for M and L)
```bash
# Python
pip-audit                   # dependency vulnerabilities
mypy --strict .             # type checking (if project uses mypy)
# Docker
docker scout cves <image>   # image CVE scan (if available)
# General
semgrep --config=auto .     # security patterns
```
Tier 3: Deep Security (SonarQube)
```bash
# Via Gitea Actions pipeline: push triggers analysis
# Or manually:
sonar-scanner -Dsonar.projectKey=<key> -Dsonar.host.url=<url>
```
If a gate fails — fix and re-run. Never skip.
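A sketch of a fail-fast gate runner; `run_gates` and `PYTHON_TIER1` are hypothetical names, and the command list mirrors the Python Tier 1 gates. It stops at the first non-zero exit and treats a missing tool as a failure rather than a skip.

```python
import subprocess
import sys
from collections.abc import Sequence


def run_gates(gates: Sequence[Sequence[str]]) -> bool:
    """Run each gate command in order; stop at the first failure (never skip)."""
    for cmd in gates:
        print("gate:", " ".join(cmd), flush=True)
        try:
            result = subprocess.run(list(cmd))
        except FileNotFoundError:
            print(f"FAILED: {cmd[0]} is not installed", file=sys.stderr)
            return False
        if result.returncode != 0:
            print(f"FAILED ({result.returncode}): fix and re-run", file=sys.stderr)
            return False
    return True


# Tier 1 gates for a Python project (py_compile omitted: the module name varies)
PYTHON_TIER1 = [
    ["ruff", "check", "."],
    ["ruff", "format", "--check", "."],
    ["pytest", "--tb=short", "-q"],
]
```

Calling `run_gates(PYTHON_TIER1)` from a local script and from a Gitea Actions step keeps the gate order identical in both places.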
Git Discipline
- Each task = `feature/<task>` branch
- One commit per logical change: group related file changes into a single commit
- Commit after each passed gate (checkpoint for rollback)
- NEVER push to main directly
- Squash merge on completion (via Gitea PR)
- Clone format: `ssh://gitea-lan/<org>/<repo>.git`
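A client-side guard for "NEVER push to main directly". Git feeds a `pre-push` hook one line per ref on stdin (`<local ref> <local sha> <remote ref> <remote sha>`); this sketch rejects any update whose remote ref is `refs/heads/main`. The names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical .git/hooks/pre-push: refuse direct pushes to main."""
import sys

PROTECTED = {"refs/heads/main"}


def blocked_refs(stdin_text: str) -> list[str]:
    """Return protected remote refs that this push would update."""
    blocked: list[str] = []
    for line in stdin_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[2] in PROTECTED:
            blocked.append(parts[2])
    return blocked


def main() -> int:
    blocked = blocked_refs(sys.stdin.read())
    if blocked:
        print(f"pre-push: direct push to {', '.join(blocked)} is not allowed; "
              "open a Gitea PR instead", file=sys.stderr)
        return 1  # non-zero exit aborts the push
    return 0
```

To install it, append `raise SystemExit(main())`, save the file as `.git/hooks/pre-push`, and make it executable. A hook can be bypassed with `git push --no-verify`, so branch protection on `main` in Gitea remains the real enforcement.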
Model Recommendations
| Stage | Model | Why |
|---|---|---|
| Research, Planning | Opus | Cross-file reasoning, deep analysis |
| Implementation | Sonnet | Speed, cost-efficiency |
| Code Review, Security | Opus | Deep analysis, fresh perspective |
Project Structure
```
project/
├── specs/ # WHAT and WHY
├── plans/ # HOW
│ └── archive/ # completed plans
├── thoughts/research/ # research artifacts
├── progress/ # handoff files
├── compose.yaml # Docker Compose (if applicable)
├── platformio.ini # PlatformIO config (if embedded)
├── requirements.txt # Python deps with ~= specifiers
├── sonar-project.properties # SonarQube config (if applicable)
└── .gitea/
└── workflows/ # Gitea Actions CI/CD
```