Tracing Techniques Reference
Backward-tracing techniques for systematic root cause analysis.
Stack Trace Analysis
Reading a Stack Trace
- Start at the most recent call to find the immediate failure (the bottom of a Python traceback; the top of a Java or Node.js stack trace)
- Scan from there toward the callers to find the first frame in your code (not library code)
- That frame is usually the symptom location, not the cause
- Continue toward the callers to find where bad data or state originated
Symptom vs Cause
| What You See | Likely Actual Cause |
|---|---|
| `NullPointerException` / `TypeError: cannot read property of undefined` | Value not set upstream, missing null check at origin |
| `IndexOutOfBoundsException` | Off-by-one in loop logic or empty collection not guarded |
| `ConnectionRefusedError` | Service down, wrong port, firewall rule, DNS resolution |
| `TimeoutError` | Deadlock, resource exhaustion, slow query, network partition |
| `ValidationError` | Caller passing wrong shape, schema mismatch, migration gap |
Tips
- Filter out framework frames to reduce noise
- In async code, the stack may be split; look for `caused by` or `previous` sections
- In Python, read `__cause__` and `__context__` on chained exceptions
- In TypeScript/Node, check `error.cause` (ES2022+)
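The Python tip above can be sketched as a small helper that walks a chained exception back to its origin. This is a minimal illustration: `exception_chain` and `load_port` are hypothetical names, not part of any library.

```python
def exception_chain(exc: BaseException) -> list[BaseException]:
    """Return the exceptions from the outermost one back to the original."""
    chain = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cyclic chains
        chain.append(current)
        if current.__cause__ is not None:
            # Set explicitly by "raise ... from ..."
            current = current.__cause__
        elif not current.__suppress_context__:
            # Set implicitly when raising while another exception is handled
            current = current.__context__
        else:
            current = None
    return chain


def load_port(config: dict) -> int:
    # Hypothetical helper that exists only to produce a chained exception.
    try:
        return int(config["port"])
    except KeyError as err:
        raise RuntimeError("invalid service config") from err


try:
    load_port({})
except RuntimeError as err:
    chain = exception_chain(err)
    for depth, link in enumerate(chain):
        print(f"{depth}: {type(link).__name__}: {link}")
```

The last entry in the chain is the earliest exception, which is usually closer to the cause than the one that reached the top-level handler.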
Binary Search / Git Bisect
When to Use
- Bug exists now but worked at some known-good point
- Reproducer is automatable (script, test command)
Process
git bisect start
git bisect bad # current commit is broken
git bisect good <known-good-sha> # last known working commit
# Git checks out a midpoint; run your test
git bisect good # or bad, based on result
# Repeat until Git identifies the first bad commit
git bisect reset # return to original branch
Automated Bisect
git bisect start HEAD <good-sha>
git bisect run ./test-script.sh
# Exit 0 = good, exit 1-127 (except 125) = bad, exit 125 = skip
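A test script for `git bisect run` mostly has to map outcomes to exit codes. The sketch below assumes a project with separate build and test commands. The function names are hypothetical, and commits that fail to build are skipped rather than marked bad.

```python
import subprocess

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_exit_code(build_ok: bool, test_ok: bool) -> int:
    """Map build and test results to a bisect exit code."""
    if not build_ok:
        return SKIP  # cannot judge this commit, let Git pick another one
    return GOOD if test_ok else BAD


def succeeds(command: list[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(command).returncode == 0


def run_step(build_cmd: list[str], test_cmd: list[str]) -> int:
    """Build, then test only if the build worked, and return the exit code."""
    build_ok = succeeds(build_cmd)
    test_ok = build_ok and succeeds(test_cmd)
    return bisect_exit_code(build_ok, test_ok)


# In a real script, finish with something like (commands are placeholders):
#   sys.exit(run_step(["make", "build"], ["python", "-m", "pytest", "-q"]))
```

Returning 125 for unbuildable commits keeps a broken intermediate commit from being blamed for the regression.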
Log Correlation
Technique
- Identify the exact timestamp of the error
- Search all related service logs within a window (e.g., +/- 30 seconds)
- Filter by correlation ID, request ID, or user ID across services
- Build a timeline of events across services
Correlation Fields to Look For
- `request_id` or `trace_id` (distributed tracing)
- `user_id` or `session_id`
- Source IP or client identifier
- Timestamps (normalize to UTC)
Tools
- `grep`/`rg` with timestamp ranges
- Structured logging with JSON output + `jq`
- Distributed tracing (OpenTelemetry, Jaeger, Zipkin)
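The technique can be sketched in a few lines of Python for logs that are already structured as JSON lines. The field names `ts`, `service`, `request_id`, and `msg` are assumptions for illustration; adapt them to your log schema.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries an offset and normalize it to UTC."""
    return datetime.fromisoformat(value).astimezone(timezone.utc)


def build_timeline(log_lines, request_id, error_time, window=timedelta(seconds=30)):
    """Collect events for one request within +/- window of the error, oldest first."""
    events = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore unstructured lines
        if not isinstance(record, dict) or record.get("request_id") != request_id:
            continue
        ts = parse_ts(record["ts"])
        if abs(ts - error_time) <= window:
            events.append((ts, record["service"], record["msg"]))
    return sorted(events)


# Sample lines from two hypothetical services logging in different time zones.
lines = [
    '{"ts": "2024-05-01T12:00:04+00:00", "service": "gateway", "request_id": "req-42", "msg": "forwarding to orders"}',
    '{"ts": "2024-05-01T14:00:05+02:00", "service": "orders", "request_id": "req-42", "msg": "db timeout"}',
    '{"ts": "2024-05-01T12:00:05+00:00", "service": "orders", "request_id": "req-7", "msg": "ok"}',
    '{"ts": "2024-05-01T12:05:00+00:00", "service": "gateway", "request_id": "req-42", "msg": "retry"}',
    "not json",
]
error_time = datetime(2024, 5, 1, 12, 0, 5, tzinfo=timezone.utc)
timeline = build_timeline(lines, "req-42", error_time)
for ts, service, msg in timeline:
    print(ts.isoformat(), service, msg)
```

Normalizing to UTC before comparing is what lets the `+02:00` entry line up with the gateway entry from one second earlier.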
Dependency Analysis (Backward Data Flow)
Process
- Start at the error location
- Identify the variable or value that is wrong
- Trace backward: where was this value set?
- At each step, ask: is this value correct here? If yes, the fault lies later in the flow, so move forward. If no, keep going back.
- The root cause is where correct data first becomes incorrect.
Common Data Flow Points
User Input -> Validation -> Transform -> Business Logic -> Persistence -> Query -> Response
Trace backward through this chain from wherever the error manifests.
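One way to make the backward walk concrete is to record each stage's output for a known failing input and check the outputs from the end of the chain toward the start. The sketch below assumes Python 3.9+. The stage functions and their expected values are hypothetical, and the `transform` stage contains a deliberate bug.

```python
from typing import Any, Callable, Optional

# (stage name, stage function, check that the stage's output looks correct)
Stage = tuple[str, Callable[[Any], Any], Callable[[Any], bool]]


def first_bad_stage(stages: list[Stage], initial: Any) -> Optional[str]:
    """Return the earliest stage in the trailing run of incorrect outputs."""
    outputs = []
    value = initial
    for _name, stage_fn, _check in stages:
        value = stage_fn(value)
        outputs.append(value)

    culprit = None
    # Walk backward from where the error manifests.
    for (name, _fn, looks_correct), output in zip(reversed(stages), reversed(outputs)):
        if looks_correct(output):
            break  # data is still correct here, so the fault is one step later
        culprit = name
    return culprit


def validate(order: dict) -> dict:
    return dict(order)


def transform(order: dict) -> dict:
    # Deliberate bug: truncates 9.99 to 9 before converting to cents.
    return {"qty": int(order["qty"]), "price_cents": int(float(order["unit_price"])) * 100}


def total(order: dict) -> dict:
    return {**order, "total_cents": order["qty"] * order["price_cents"]}


stages = [
    ("validation", validate, lambda o: o["unit_price"] == "9.99"),
    ("transform", transform, lambda o: o["price_cents"] == 999),
    ("business logic", total, lambda o: o["total_cents"] == 2997),
]
culprit = first_bad_stage(stages, {"qty": "3", "unit_price": "9.99"})
print(culprit)
```

The wrong total shows up in the last stage, but the walk stops at `validation`, the first point where the data is still correct, which leaves `transform` as the place where it first went wrong.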
Dependency Categories
| Dependency | What to Check |
|---|---|
| Function arguments | Caller passing wrong values |
| Config / env vars | Wrong environment, stale config |
| Database state | Missing migration, corrupt data |
| External API | Changed response format, auth expiry |
| Shared state | Race condition, stale cache |
Instrumentation Points
Where to Add Temporary Logging
- Entry/exit of suspected function — log arguments and return value
- Before/after external calls — log request and response
- Branch points — log which path was taken and why
- Data transformation steps — log before and after
- Error handlers — log the full error with context
Guidelines
- Use a distinct prefix (e.g., `[DEBUG-TRACE]`) so logs are easy to find and remove
- Log the type as well as the value (catches `"null"` vs `null`)
- In production, use feature flags or debug log levels, not code changes
- Remove all temporary logging before committing
Python Example
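import logging

logger = logging.getLogger(__name__)

# Order and db are assumed to come from the surrounding application.
def process_order(order_id: str) -> Order:
    # %r keeps the quotes, so the string "None" and the value None look different
    logger.debug("[DEBUG-TRACE] process_order called with: %r (type: %s)", order_id, type(order_id).__name__)
    order = db.get_order(order_id)
    logger.debug("[DEBUG-TRACE] db.get_order returned: %r", order)
    # ... rest of logic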
TypeScript Example
function processOrder(orderId: string): Order {
  console.debug(`[DEBUG-TRACE] processOrder called with: ${orderId} (type: ${typeof orderId})`);
  const order = db.getOrder(orderId);
  console.debug("[DEBUG-TRACE] db.getOrder returned:", order);
  // ... rest of logic
}
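When several functions need the same entry/exit logging, a decorator keeps the temporary instrumentation in one place and makes it easy to remove. This is a minimal sketch; `debug_trace` and `parse_quantity` are hypothetical names.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def debug_trace(func):
    """Log arguments, return value (with type), and exceptions of func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("[DEBUG-TRACE] %s called with args=%r kwargs=%r",
                     func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback along with the message
            logger.exception("[DEBUG-TRACE] %s raised", func.__name__)
            raise
        logger.debug("[DEBUG-TRACE] %s returned %r (type: %s)",
                     func.__name__, result, type(result).__name__)
        return result
    return wrapper


@debug_trace
def parse_quantity(raw):
    return int(raw)


logging.basicConfig(level=logging.DEBUG)
parse_quantity("3")
```

Searching for `@debug_trace` before committing finds every instrumented function at once.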
Common Root Cause Categories
| Category | Symptoms | Investigation Approach |
|---|---|---|
| Data issues | Wrong output, validation errors, corrupt state | Trace the bad value backward through the data flow |
| Race conditions | Intermittent failures, works-on-retry, order-dependent | Look for shared mutable state, add timing logs, test with delays |
| Config drift | Works locally but not in staging/prod | Diff environment configs, check env vars, verify secrets |
| Dependency changes | Broke after deploy with no code changes | Check lock file diffs, dependency changelogs, API version headers |
| Resource exhaustion | Timeouts, OOM, connection pool errors | Monitor metrics (memory, CPU, connections, disk), check for leaks |
| Schema mismatch | Serialization errors, missing fields | Compare expected vs actual schema, check migration status |
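For the config drift row, diffing two environments can be as simple as comparing key/value maps. The sketch below assumes both configs have already been loaded into dictionaries; the keys and values are made up for illustration.

```python
MISSING = "<missing>"  # placeholder shown when a key is absent on one side


def diff_config(local: dict, remote: dict) -> dict:
    """Return {key: (local_value, remote_value)} for every key that differs."""
    drift = {}
    for key in sorted(local.keys() | remote.keys()):
        left = local.get(key, MISSING)
        right = remote.get(key, MISSING)
        if left != right:
            drift[key] = (left, right)
    return drift


local_env = {"DB_PORT": "5432", "CACHE_TTL": "60", "FEATURE_X": "on"}
staging_env = {"DB_PORT": "5433", "CACHE_TTL": "60"}

drift = diff_config(local_env, staging_env)
for key, (left, right) in drift.items():
    print(f"{key}: local={left!r} staging={right!r}")
```

Printing with `!r` keeps type differences visible, for example `'5432'` versus `5432`. Avoid dumping secret values this way; compare hashes or just report that they differ.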
Quick Decision: Which Technique to Use
| Situation | Start With |
|---|---|
| Have a stack trace | Stack trace analysis |
| "It used to work" | Git bisect |
| Multi-service issue | Log correlation |
| Wrong data in output | Backward data flow |
| No idea where to start | Add instrumentation at boundaries |