Production-Grade API Patterns

Four patterns that separate hobby APIs from APIs developers actually trust in production: idempotency keys, rate limiting, optimistic concurrency (ETag), and webhook signing. Plus the async/202 pattern for long-running work.

Each section has: what it is, why it matters, the HTTP contract, and server-side pseudocode.


1. Idempotency Keys

Problem. A client calls POST /payments to charge $50. The request succeeds on the server but the response is lost to a network blip. The client retries. Without idempotency, the customer is charged twice.

Solution. The client generates a UUID per logical operation and sends it as an Idempotency-Key header. The server stores the full response keyed by (idempotencyKey, userId) for a TTL window (Stripe uses 24h). Any retry with the same key returns the stored response — no re-execution.

Key insight — store the response, not the request. Replaying the operation on retry defeats the purpose. You must serve the exact bytes the original call produced, including the status code, headers, and body. Otherwise a second worker racing the first will see inconsistent state.

HTTP contract

POST /v1/payments
Idempotency-Key: 0f6f7a7d-1c6a-4a1e-9d1a-4f2a1b3c4d5e
Authorization: Bearer <token>
Content-Type: application/json

{ "amount": 5000, "currency": "usd", "customerId": "cus_abc" }

Server responses:

  • First call: process normally, store (key, response), return 201 Created + Payment body.
  • Retry (same key, same body): return the stored response verbatim. Include header Idempotent-Replayed: true so clients can log the replay.
  • Retry (same key, different body): return 422 Unprocessable Entity with type: .../problems/idempotency-conflict. The key was reused for a different payload — that is a client bug.
  • Retry during in-flight processing: return 409 Conflict so the client backs off and retries later. Acquire a lock on (key) before starting work.

Server-side pseudocode

import hashlib

IDEMPOTENCY_TTL_SECONDS = 24 * 60 * 60  # 24h

def body_hash(body: bytes) -> str:
    # Python's built-in hash() is salted per process, so use a stable digest
    # that every worker computes identically.
    return hashlib.sha256(body).hexdigest()

async def create_payment(req: Request, key: str | None, user: User):
    if key is None:
        # Idempotency optional, but log the risk.
        return await process_payment(req, user)

    request_hash = body_hash(req.body)
    record = await idempotency_store.get(key, user.id)
    if record:
        if record.request_hash != request_hash:
            raise ProblemDetail(422, "idempotency-conflict",
                                "Key reused with a different request body.")
        if record.status == "in_progress":
            raise ProblemDetail(409, "idempotency-in-progress",
                                "Original request still processing.")
        return replay(record.response)  # exact bytes

    # Claim the key atomically; losing this race returns 409
    claimed = await idempotency_store.claim(key, user.id, request_hash)
    if not claimed:
        raise ProblemDetail(409, "idempotency-in-progress",
                            "Original request still processing.")

    try:
        response = await process_payment(req, user)
        await idempotency_store.save(key, user.id, response,
                                     ttl=IDEMPOTENCY_TTL_SECONDS)
        return response
    except Exception:
        # Release the claim so the client can retry cleanly.
        await idempotency_store.release(key, user.id)
        raise

Storage: Redis with a 24h TTL is a common choice. Make the claim step atomic, either with a single SET key value NX EX <ttl> command or with a Redis transaction (WATCH/MULTI), so two workers cannot both claim the same key.
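
The store operations the pseudocode calls (get, claim, save, release) can be sketched in memory to make the claim semantics concrete. This is only an illustration under stated assumptions: the class name InMemoryIdempotencyStore, the Record fields, and the explicit now parameter are hypothetical, and a single-process dict gives none of the cross-worker atomicity that Redis would.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Optional


def request_hash(body: bytes) -> str:
    """Stable fingerprint of the request body, identical across processes."""
    return hashlib.sha256(body).hexdigest()


@dataclass
class Record:
    request_hash: str
    status: str                 # "in_progress" or "completed"
    response: Optional[bytes]   # stored verbatim once the call finishes
    expires_at: float


class InMemoryIdempotencyStore:
    """Hypothetical stand-in for a Redis-backed store (single process only)."""

    def __init__(self, ttl_seconds: int = 24 * 3600):
        self._ttl = ttl_seconds
        self._records: dict = {}

    def get(self, key: str, principal_id: str, now: Optional[float] = None):
        now = time.time() if now is None else now
        rec = self._records.get((key, principal_id))
        if rec is not None and rec.expires_at <= now:
            # Expired entries behave as if they were never stored.
            del self._records[(key, principal_id)]
            return None
        return rec

    def claim(self, key: str, principal_id: str, req_hash: str,
              now: Optional[float] = None) -> bool:
        """Return True only for the first caller; later callers lose the race."""
        now = time.time() if now is None else now
        if self.get(key, principal_id, now) is not None:
            return False
        self._records[(key, principal_id)] = Record(
            req_hash, "in_progress", None, now + self._ttl)
        return True

    def save(self, key: str, principal_id: str, response: bytes) -> None:
        rec = self._records[(key, principal_id)]
        rec.status = "completed"
        rec.response = response

    def release(self, key: str, principal_id: str) -> None:
        self._records.pop((key, principal_id), None)
```

Scoping every lookup by (key, principal_id) is what keeps one tenant's keys from colliding with another's.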

Scope: key by (idempotencyKey, principal), where the principal is the authenticated userId or apiKeyId, so keys from one tenant never collide with another's.

Apply to: all POST and PATCH that create or mutate resources with side effects (billing, emails, external API calls). Pure GET/HEAD is already idempotent; PUT and DELETE are idempotent by HTTP semantics but still benefit from replay protection for in-flight retries.


2. Rate Limiting

Problem. One misbehaving client floods your API and degrades everyone else. Without limits, a single bug can take the service down.

Solution. Bound requests per client per time window, return 429 Too Many Requests when exceeded, and publish headers on every response so well-behaved clients can self-throttle before they hit the wall.

Algorithms

| Algorithm | Burst behavior | Memory per key | When to pick |
|---|---|---|---|
| Fixed window | Allows 2× burst at window boundary | O(1) | Simple, cheap. Acceptable for soft limits. |
| Sliding window log | Smooth | O(n) | Expensive but precise. Use for billing-grade metering. |
| Sliding window counter | Near-smooth | O(1) | Default choice. Good accuracy, constant memory. |
| Token bucket | Configurable burst | O(1) | Use when you want to allow small bursts but bound sustained rate. Common for customer-facing APIs. |

Redis + sliding-window-counter is the pragmatic default.
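
For comparison, the token bucket row of the table can be sketched in a few lines. This is a minimal single-process illustration, not a production limiter: the TokenBucket class, its field names, and the explicit now argument are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float      # maximum burst size
    refill_rate: float   # tokens added per second (the sustained rate)
    tokens: float        # tokens currently available
    updated_at: float    # timestamp of the last refill

    @classmethod
    def full(cls, capacity: float, refill_rate: float, now: float) -> "TokenBucket":
        """Start with a full bucket so a new client may burst immediately."""
        return cls(capacity, refill_rate, capacity, now)

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill lazily based on the time elapsed since the last call.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity bounds the burst and refill_rate bounds the sustained rate, which is the trade-off the table describes.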

HTTP contract

Every successful response:

X-RateLimit-Limit:     1000          # quota for this window
X-RateLimit-Remaining: 942           # how many calls left
X-RateLimit-Reset:     1767225600    # unix seconds when the window resets

429 response:

HTTP/1.1 429 Too Many Requests
Content-Type:  application/problem+json
Retry-After:   60

{
  "type":   "https://api.example.com/problems/rate-limited",
  "title":  "Too many requests",
  "status": 429,
  "detail": "Rate limit of 1000/hour exceeded. Retry after 60s."
}

Treat Retry-After as mandatory on 429. RFC 6585 only says the response may include it, but clients depend on it to schedule the retry. Use seconds (integer) rather than HTTP-date: simpler, no clock-skew bugs.
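
On the client side, the headers above are enough to self-throttle. The following is a rough sketch of how a client might turn them into a wait time; the function name seconds_to_wait and the default_backoff fallback are hypothetical, and it only handles the integer-seconds form of Retry-After recommended here.

```python
def seconds_to_wait(status: int, headers: dict, now: float,
                    default_backoff: float = 1.0) -> float:
    """Return how long a client should pause before its next request."""
    h = {k.lower(): v.strip() for k, v in headers.items()}
    throttled = status == 429

    # Prefer Retry-After when the server rejected the request.
    if throttled and h.get("retry-after", "").isdigit():
        return float(h["retry-after"])

    # Otherwise wait for the window reset if the quota is used up.
    exhausted = h.get("x-ratelimit-remaining", "") == "0"
    if throttled or exhausted:
        reset = h.get("x-ratelimit-reset", "")
        if reset.isdigit():
            return max(0.0, float(reset) - now)
        return default_backoff if throttled else 0.0

    return 0.0
```

A client would call this after every response and sleep for the returned number of seconds before sending the next request.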

Server-side pseudocode (Redis sliding-window counter)

# Pseudocode — use a battle-tested library (redis-rate-limit, slowapi, limits)
# rather than hand-rolling this in production.

import time

# Assumes `redis` is a redis.asyncio.Redis client and RateLimitResult is a
# dataclass whose retry_after field defaults to None.
async def enforce_rate_limit(key: str, limit: int, window_seconds: int) -> RateLimitResult:
    now = int(time.time())
    window_start = now - (now % window_seconds)
    prev_window_start = window_start - window_seconds

    # Atomic increment + get prior-window count (MULTI/EXEC)
    async with redis.pipeline(transaction=True) as p:
        p.incr(f"rl:{key}:{window_start}")
        p.expire(f"rl:{key}:{window_start}", window_seconds * 2)
        p.get(f"rl:{key}:{prev_window_start}")
        curr, _, prev = await p.execute()

    # Weight the prior window by how much of the current window has elapsed
    elapsed_fraction = (now % window_seconds) / window_seconds
    weighted = int((int(prev or 0)) * (1 - elapsed_fraction)) + int(curr)

    remaining = max(0, limit - weighted)
    reset_at  = window_start + window_seconds

    if weighted > limit:
        return RateLimitResult(
            allowed=False,
            retry_after=reset_at - now,
            limit=limit, remaining=0, reset_at=reset_at,
        )
    return RateLimitResult(
        allowed=True,
        limit=limit, remaining=remaining, reset_at=reset_at,
    )

Key strategy: by authenticated principal (userId or apiKeyId), not by IP. IP-based limiting punishes users behind corporate NATs.

Tiers: different limits for anonymous / free / paid is common. Store the tier on the auth token and look up the limit at enforcement time — don't hard-code.

Where to enforce: at the edge (gateway/middleware) before hitting your application logic. An API gateway (Kong, Envoy, Cloudflare) handles this natively if you use one.
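
The prior-window weighting used in the sliding-window pseudocode is easier to check as a pure function with numbers plugged in. This sketch isolates just that arithmetic; the function name weighted_count is made up for the example.

```python
def weighted_count(prev: int, curr: int, now: int, window_seconds: int) -> int:
    """Estimate requests in the trailing window from two fixed-window counters.

    prev is the full count of the previous window, curr the count so far in
    the current one. The previous window contributes in proportion to how
    much of it still overlaps the trailing window.
    """
    elapsed_fraction = (now % window_seconds) / window_seconds
    return int(prev * (1 - elapsed_fraction)) + curr


# 15s into a 60s window: 75% of the previous window still counts.
estimate = weighted_count(prev=80, curr=30, now=15, window_seconds=60)
```

With a limit of 100, the estimate of 90 leaves 10 calls remaining. A plain fixed window would have reported 70 remaining at the same moment, which is where its 2× boundary burst comes from.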


3. Optimistic Concurrency (ETag + If-Match)

Problem. Alice and Bob both load the same user record. Alice updates the email, Bob updates the name. If both PATCH without coordination, the second write silently overwrites the first ("lost update").

Solution. On GET, return an ETag header — an opaque version token. On PATCH, require the client to echo that ETag in If-Match. If the server's current ETag doesn't match, return 412 Precondition Failed and the client must refetch and retry.

ETag generation strategies

| Strategy | Pros | Cons |
|---|---|---|
| Version counter (v42) | Trivial to compare | Needs a version column on every row |
| updated_at timestamp | No schema change | Millisecond precision may collide on bulk updates |
| Hash of body ("a1b2c3") | Stateless | Recomputed on every GET |
| Database row version | Cheap, natural fit | ORM-dependent |

All work. Pick whatever matches your data layer.
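
Two of the strategies from the table, sketched as helpers. The function names and the choice to truncate the SHA-256 digest to 16 hex characters are assumptions for illustration; any stable digest of the exact response bytes would do.

```python
import hashlib


def version_etag(version: int) -> str:
    """Version-counter strategy: the ETag is the row version, quoted."""
    return f'"v{version}"'


def body_etag(body: bytes) -> str:
    """Hash-of-body strategy: derived from the serialized response bytes."""
    digest = hashlib.sha256(body).hexdigest()[:16]
    return f'"{digest}"'
```

Both return the value with its surrounding double quotes, since the quotes are part of the ETag syntax.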

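Weak vs strong: ETag: W/"..." for weak (semantically equivalent), ETag: "..." for strong (byte-identical). Use strong ETags for any resource updated with If-Match: that header uses strong comparison, so a weak ETag never satisfies it. Weak ETags are only useful for cache revalidation with If-None-Match.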
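
The difference between the two comparison modes can be sketched as follows: If-Match uses strong comparison, If-None-Match uses weak comparison. The function names are hypothetical, and splitting the header on commas is a simplification that assumes the ETag values themselves contain no commas.

```python
def parse_etags(header: str) -> list:
    """Split a comma-separated If-Match / If-None-Match header value."""
    return [t.strip() for t in header.split(",") if t.strip()]


def _is_weak(tag: str) -> bool:
    return tag.startswith("W/")


def _opaque(tag: str) -> str:
    """Drop the W/ prefix so weak and strong forms can be compared."""
    return tag[2:] if _is_weak(tag) else tag


def if_match_ok(header: str, current: str) -> bool:
    """Strong comparison: both tags must be strong and identical."""
    if header.strip() == "*":
        return True  # any current representation matches
    if _is_weak(current):
        return False
    return any(t == current for t in parse_etags(header) if not _is_weak(t))


def if_none_match_hit(header: str, current: str) -> bool:
    """Weak comparison: True means a GET can be answered with 304."""
    if header.strip() == "*":
        return True
    return any(_opaque(t) == _opaque(current) for t in parse_etags(header))
```

A PATCH handler would return 412 when if_match_ok is False; a GET handler would return 304 when if_none_match_hit is True.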

HTTP contract

GET /v1/users/usr_abc123 HTTP/1.1
Authorization: Bearer ...

HTTP/1.1 200 OK
ETag: "v42"
Content-Type: application/json

{ "id": "usr_abc123", "name": "Alice", "email": "alice@example.com", ... }

PATCH /v1/users/usr_abc123 HTTP/1.1
If-Match: "v42"
Content-Type: application/json

{ "email": "alice@new.example.com" }

Success:

HTTP/1.1 200 OK
ETag: "v43"

{ ... updated user ... }

Conflict — Bob's write races Alice's:

HTTP/1.1 412 Precondition Failed
Content-Type: application/problem+json

{
  "type":   "https://api.example.com/problems/precondition-failed",
  "title":  "Precondition failed",
  "status": 412,
  "detail": "The resource was modified since you last fetched it. Re-fetch and retry."
}

Server-side pseudocode

async def update_user(user_id: str, body: dict, if_match: str | None):
    current = await users.get(user_id)
    if current is None:
        raise ProblemDetail(404, "not-found", f"User '{user_id}' not found.")

    if if_match is None:
        raise ProblemDetail(428, "precondition-required",
                            "If-Match header is required for this operation.")

    current_etag = f'"v{current.version}"'
    if if_match != current_etag:
        raise ProblemDetail(412, "precondition-failed",
                            "The resource was modified since you last fetched it.")

    updated = await users.patch(user_id, body, expected_version=current.version)
    return updated, f'"v{updated.version}"'

428 Precondition Required is the correct response when the server requires If-Match and the client didn't send one. RFC 6585.
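
The pseudocode above reads the row and then writes it, so the expected_version argument presumably has to make the write itself conditional; otherwise two requests could both pass the ETag check before either writes. One way to do that is a compare-and-swap UPDATE, sketched here with SQLite. The table layout and the function names are assumptions for the example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with one user at version 42 (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT, version INTEGER)")
    conn.execute(
        "INSERT INTO users VALUES ('usr_abc123', 'alice@example.com', 42)")
    conn.commit()
    return conn


def patch_email(conn: sqlite3.Connection, user_id: str, new_email: str,
                expected_version: int) -> bool:
    """Update only if the row is still at expected_version.

    Returns False when another writer got there first, which the HTTP
    layer would translate into 412 Precondition Failed.
    """
    cur = conn.execute(
        "UPDATE users SET email = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_email, user_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because the version check and the write happen in a single statement, the database decides the winner, not the application.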

Also useful for GETs: If-None-Match: "v42" lets clients skip the body and get 304 Not Modified if nothing changed — cheap cache revalidation.


4. Webhook Signing (HMAC)

Problem. You deliver an event to a consumer URL. Without a signature, anyone who guesses the URL can forge events. Without a timestamp, an attacker who captures one valid payload can replay it forever.

Solution. Sign every webhook with HMAC-SHA256 over timestamp + "." + body, send both in a header, and require consumers to reject signatures older than 5 minutes.

HTTP contract

POST https://consumer.example.com/webhooks/acme HTTP/1.1
Content-Type: application/json
Acme-Signature: t=1767225600,v1=5257a869e7...7f4b
Acme-Webhook-Id: evt_01HTZ4K5M8N9P0Q1R2S3T4V5W6

{
  "id":        "evt_01HTZ4K5M8N9P0Q1R2S3T4V5W6",
  "type":      "order.completed",
  "createdAt": "2026-04-15T10:30:00Z",
  "data":      { "orderId": "ord_xyz", "total": 4999, "currency": "usd" }
}

t= is the unix timestamp when the signature was generated. v1= is the hex-encoded HMAC-SHA256 of t + "." + rawBody using the consumer's signing secret.

The version prefix (v1=) lets you rotate signing schemes in the future without breaking existing consumers.

Server-side signing

import hmac, hashlib, time, json

def sign_webhook(raw_body: bytes, secret: str) -> str:
    t = int(time.time())
    payload = f"{t}.".encode() + raw_body
    sig = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return f"t={t},v1={sig}"

async def deliver(endpoint: Endpoint, event: dict):
    raw = json.dumps(event, separators=(",", ":")).encode()
    signature = sign_webhook(raw, endpoint.signing_secret)
    await http.post(
        endpoint.url,
        content=raw,
        headers={
            "Content-Type":    "application/json",
            "Acme-Signature":  signature,
            "Acme-Webhook-Id": event["id"],
        },
    )

Consumer-side verification (for your docs)

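import hmac, hashlib, json, time

MAX_AGE_SECONDS = 300  # 5 minutes

class SignatureError(Exception):
    """Raised when a webhook fails timestamp or signature verification."""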

def verify_webhook(raw_body: bytes, header: str, secret: str) -> dict:
    parts = dict(p.split("=", 1) for p in header.split(","))
    t   = int(parts["t"])
    sig = parts["v1"]

    # Replay protection: reject timestamps more than MAX_AGE_SECONDS away from now
    if abs(time.time() - t) > MAX_AGE_SECONDS:
        raise SignatureError("Timestamp outside tolerance window.")

    expected = hmac.new(
        secret.encode(),
        f"{t}.".encode() + raw_body,
        hashlib.sha256,
    ).hexdigest()

    # Constant-time compare — prevents timing attacks
    if not hmac.compare_digest(expected, sig):
        raise SignatureError("Signature mismatch.")

    return json.loads(raw_body)

Three non-negotiables:

  1. Sign timestamp + body, not just body. Without the timestamp, replay protection is impossible.
  2. Use constant-time comparison (hmac.compare_digest). Never ==. Side-channel leaks.
  3. Verify against the raw body bytes, not a parsed-and-reserialized version. JSON serializers don't roundtrip byte-for-byte.
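
The first and third points are easy to demonstrate. The sketch below signs a compact JSON body, then shows that verification fails once the body has been parsed and re-serialized, or once the timestamp is altered. The secret whsec_demo and the helper names are invented for the example.

```python
import hashlib
import hmac
import json


def sign(raw_body: bytes, secret: str, t: int) -> str:
    """Hex HMAC-SHA256 over the timestamp, a dot, and the raw body bytes."""
    payload = f"{t}.".encode() + raw_body
    return hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()


def signature_matches(raw_body: bytes, secret: str, t: int, sig: str) -> bool:
    # Constant-time comparison, as required by point 2.
    return hmac.compare_digest(sign(raw_body, secret, t), sig)


event = {"id": "evt_1", "data": {"total": 4999}}
raw = json.dumps(event, separators=(",", ":")).encode()  # bytes as sent
sig = sign(raw, "whsec_demo", 1767225600)

# A consumer that parses and re-serializes gets different bytes:
# json.dumps defaults to ", " and ": " separators.
reserialized = json.dumps(json.loads(raw)).encode()
```

Here signature_matches(raw, ...) succeeds, while the same call with reserialized, or with a different t, fails even though the JSON content is unchanged.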

Retry and dedup

  • Retry on any non-2xx response with increasing backoff, for example at 1m, 5m, 15m, 1h, 6h, and 24h after the first attempt, then stop (about 24h total).
  • Include a unique event id in every payload; consumers must dedupe on it (retries will re-send the same id).
  • Consumer must respond within ~5s with any 2xx. Do the actual work in a background job.
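
On the consumer side, the dedup and fast-acknowledge rules might look like the sketch below. The WebhookReceiver class and its in-memory set and list are hypothetical; a real consumer would need a durable store for seen ids and a real job queue.

```python
class WebhookReceiver:
    """Acknowledge quickly, dedupe on event id, defer the real work."""

    def __init__(self):
        self._seen_ids = set()   # stand-in for a durable dedup store
        self.queue = []          # stand-in for a background job queue

    def handle(self, event: dict) -> int:
        """Return the HTTP status to send back to the webhook sender."""
        event_id = event["id"]
        if event_id in self._seen_ids:
            # A retry of something already accepted: acknowledge with 2xx
            # so the sender stops retrying, but do not enqueue it again.
            return 200
        self._seen_ids.add(event_id)
        self.queue.append(event)  # processed later, outside the request
        return 200
```

Signature verification would run before handle is called, so only authentic events ever reach the dedup set.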

5. Async Long-Running Operations (202 Accepted)

Problem. Generating a report takes 30 seconds. You can't hold an HTTP connection that long — load balancers kill it, clients time out.

Solution. Return 202 Accepted immediately with a Location header pointing to a status resource. The client polls (or subscribes to a webhook) until the job completes.

HTTP contract

POST /v1/reports HTTP/1.1
Content-Type: application/json

{ "type": "sales", "startDate": "2026-01-01", "endDate": "2026-03-31" }

HTTP/1.1 202 Accepted
Location:    /v1/jobs/job_01HTZ9K5M8N9P0Q1R2S3T4V5W6
Retry-After: 5

{
  "id":        "job_01HTZ9K5M8N9P0Q1R2S3T4V5W6",
  "status":    "queued",
  "createdAt": "2026-04-15T10:30:00Z"
}

GET /v1/jobs/job_01HTZ9K5M8N9P0Q1R2S3T4V5W6 HTTP/1.1

HTTP/1.1 200 OK
{
  "id":         "job_01HTZ9K5M8N9P0Q1R2S3T4V5W6",
  "status":     "completed",
  "createdAt":  "2026-04-15T10:30:00Z",
  "completedAt":"2026-04-15T10:30:32Z",
  "result":     { "reportUrl": "https://cdn.example.com/reports/xyz.csv" }
}

Job states: queued → running → completed | failed | cancelled.

On failed, embed a ProblemDetails object in the job body under error so the client gets structured failure info without a separate endpoint.
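
The job lifecycle and the embedded error can be sketched as a small transition function. The ALLOWED table is an assumption: in particular, allowing queued jobs to be cancelled directly is a guess, and the transition helper is hypothetical.

```python
from typing import Optional

# Assumed transition table; terminal states have no outgoing transitions.
ALLOWED = {
    "queued":    {"running", "cancelled"},
    "running":   {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed":    set(),
    "cancelled": set(),
}


def transition(job: dict, new_status: str, error: Optional[dict] = None) -> dict:
    """Return a copy of the job in its new state, or raise ValueError."""
    if new_status not in ALLOWED[job["status"]]:
        raise ValueError(f"cannot go from {job['status']} to {new_status}")
    if new_status == "failed" and error is None:
        raise ValueError("a failed job must carry a problem-details error")
    updated = {**job, "status": new_status}
    if new_status == "failed":
        # Structured failure info travels inside the job body under "error".
        updated["error"] = error
    return updated
```

A worker would call transition(job, "failed", error={...}) with a problem-details object shaped like the 4xx bodies shown earlier, so clients polling the job see the same error structure everywhere.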

Webhook option: let the client register a callback URL on the job creation ("callbackUrl": "https://...") and deliver a webhook when the job terminates, following the signing pattern above. Saves polling and is what mature APIs offer as the default.


Applying this in OpenAPI

The starter template openapi-3.1-starter.yaml already demonstrates four of these patterns on the /users endpoints:

| Pattern | Where in the template |
|---|---|
| Idempotency keys | POST /users → IdempotencyKeyHeader parameter |
| Rate limit headers | GET /users responses → X-RateLimit-* header refs |
| ETag + If-Match | GET + PATCH /users/{userId} → ETag header, If-Match param, 412 response |
| Problem Details errors | All 4xx/5xx responses use application/problem+json |

Copy the relevant parameters, headers, and responses blocks into your own spec.