Files
claudekit/.claude/skills/methodology/systematic-debugging/references/debugging-checklist.md
T

5.9 KiB

Systematic Debugging Checklist

Step-by-step process for debugging any issue. Follow the steps in order — skipping ahead is the most common cause of wasted debugging time.


Step 1: Reproduce

Goal: Confirm you can trigger the bug on demand.

  • Get the exact steps — What did the user do? What input? What sequence?
  • Reproduce it yourself — If you can't reproduce it, you can't verify a fix
  • Find the minimal reproduction — Strip away everything that isn't necessary to trigger the bug
  • Document the environment
    • OS and version
    • Language/runtime version
    • Dependency versions
    • Environment variables or config that matters
  • Note what you expect vs. what actually happens

If you can't reproduce:

  • Check if it's environment-specific (OS, browser, node version)
  • Check if it's state-dependent (specific data, race condition, cache)
  • Add logging and wait for it to happen again

Step 2: Gather Evidence

Goal: Collect all available information before forming theories.

  • Read the error message carefully — The answer is often in the message. Read the full text, not just the first line.
  • Read the full stack trace — Identify which line in YOUR code is the entry point (ignore framework internals at first)
  • Check logs — Application logs, server logs, browser console
  • Check timestamps — When did it start? Does it correlate with a deployment, config change, or data change?
  • Check recent changes
    git log --oneline -20
    git diff HEAD~5..HEAD -- path/to/suspect/area/
    
  • Check monitoring/metrics — Error rates, latency, resource usage
  • Search for the error — Has someone on the team seen this before? Check issues, Slack, docs.

Step 3: Form Hypotheses

Goal: Generate candidate explanations ranked by likelihood.

  • What changed recently? — The most common cause of new bugs is new code
  • What assumptions might be wrong? — About input format, data state, timing, permissions
  • List 2-3 hypotheses — Write them down explicitly:
    1. [Most likely] ...
    2. [Possible] ...
    3. [Less likely but worth checking] ...
  • For each hypothesis, define what evidence would confirm or refute it

Common root causes to consider:

  • Null/undefined where a value was expected
  • Off-by-one or boundary condition
  • Race condition or timing issue
  • Stale cache or state
  • Environment difference (local vs. prod)
  • Dependency version mismatch
  • Incorrect assumption about API contract

Step 4: Test Hypotheses

Goal: Confirm or eliminate each hypothesis with evidence.

  • Start with the most likely hypothesis
  • Add targeted logging — Log the specific values your hypothesis predicts will be wrong
    # Python
    import logging
    logger = logging.getLogger(__name__)
    logger.debug(f"Value at suspect point: {value!r}, type: {type(value)}")
    
    // JavaScript
    console.log('Value at suspect point:', JSON.stringify(value), typeof value);
    
  • Use git bisect for regressions — Find the exact commit that introduced the bug
    git bisect start
    git bisect bad          # Current commit is broken
    git bisect good v1.2.0  # This version was working
    # Test each commit bisect offers, mark good/bad
    
  • Isolate components — Test each component in isolation to narrow the scope
  • Use a debugger for complex state issues

Debugger Quick Reference

Language Tool Start Command
Python pdb import pdb; pdb.set_trace() or breakpoint()
Python logging logging.basicConfig(level=logging.DEBUG)
Python traceback import traceback; traceback.print_exc()
JavaScript debugger debugger; statement in code
JavaScript console console.log(), console.trace(), console.table()
JavaScript Chrome DevTools Open DevTools > Sources > set breakpoint
TypeScript Node inspect node --inspect -r ts-node/register script.ts

Step 5: Fix and Verify

Goal: Apply the minimal correct fix and prove it works.

  • Make the smallest fix possible — Fix the bug, not the whole file. One concern per commit.
  • Write a regression test — A test that fails without your fix and passes with it
    def test_handles_empty_input_without_crash():
        """Regression test for bug #123 — empty input caused TypeError."""
        result = process(input_data="")
        assert result == expected_default
    
  • Verify the fix resolves the original reproduction
  • Run the full test suite — Confirm no side effects
  • Check related code paths — Is the same bug pattern present elsewhere?
    # Search for similar patterns
    grep -rn "similar_pattern" src/
    
  • Test edge cases around the fix — Boundary values, null inputs, concurrent access

Step 6: Document and Prevent

Goal: Prevent this class of bug from recurring.

  • Write a clear commit message explaining what was wrong and why the fix works
  • Update documentation if the bug revealed a misunderstanding
  • Consider systemic fixes:
    • Could a type system catch this? (Add types)
    • Could a linter rule catch this? (Add rule)
    • Could input validation catch this? (Add validation)
    • Could monitoring catch this sooner? (Add alert)

Quick Reference: Debugging Anti-Patterns

Anti-Pattern What to Do Instead
Changing random things until it works Form a hypothesis, test it, iterate
Debugging in production Reproduce locally first
Reading code for hours without running it Add a log statement and run it
Fixing the symptom, not the cause Ask "why?" until you reach the root
Not writing a regression test Always write one before closing the bug
Debugging alone for too long Ask for help after 30 minutes of no progress