# Production-Grade API Patterns Four patterns that separate hobby APIs from APIs developers actually trust in production: **idempotency keys**, **rate limiting**, **optimistic concurrency (ETag)**, and **webhook signing**. Plus the **async/202** pattern for long-running work. Each section has: what it is, why it matters, the HTTP contract, and server-side pseudocode. --- ## 1. Idempotency Keys **Problem.** A client calls `POST /payments` to charge $50. The request succeeds on the server but the response is lost to a network blip. The client retries. Without idempotency, the customer is charged twice. **Solution.** The client generates a UUID per logical operation and sends it as an `Idempotency-Key` header. The server stores the full response keyed by `(idempotencyKey, userId)` for a TTL window (Stripe uses 24h). Any retry with the same key returns the stored response — no re-execution. **Key insight — store the *response*, not the request.** Replaying the operation on retry defeats the purpose. You must serve the exact bytes the original call produced, including the status code, headers, and body. Otherwise a second worker racing the first will see inconsistent state. ### HTTP contract ``` POST /v1/payments Idempotency-Key: 0f6f7a7d-1c6a-4a1e-9d1a-4f2a1b3c4d5e Authorization: Bearer Content-Type: application/json { "amount": 5000, "currency": "usd", "customerId": "cus_abc" } ``` Server responses: - **First call:** process normally, store `(key, response)`, return `201 Created` + `Payment` body. - **Retry (same key, same body):** return the stored response verbatim. Include header `Idempotent-Replayed: true` so clients can log the replay. - **Retry (same key, different body):** return `422 Unprocessable Entity` with `type: .../problems/idempotency-conflict`. The key was reused for a different payload — that is a client bug. - **Retry during in-flight processing:** return `409 Conflict` so the client backs off and retries later. Acquire a lock on `(key)` before starting work. ### Server-side pseudocode ```python async def create_payment(req: Request, key: str | None, user: User): if key is None: # Idempotency optional, but log the risk. return await process_payment(req, user) record = await idempotency_store.get(key, user.id) if record: if record.request_hash != hash(req.body): raise ProblemDetail(422, "idempotency-conflict", "Key reused with a different request body.") if record.status == "in_progress": raise ProblemDetail(409, "idempotency-in-progress", "Original request still processing.") return replay(record.response) # exact bytes # Claim the key atomically — losing this race returns 409 claimed = await idempotency_store.claim(key, user.id, hash(req.body)) if not claimed: raise ProblemDetail(409, "idempotency-in-progress", "Original request still processing.") try: response = await process_payment(req, user) await idempotency_store.save(key, user.id, response, ttl=24h) return response except Exception as e: # Release the claim so the client can retry cleanly. await idempotency_store.release(key, user.id) raise ``` **Storage:** Redis with a 24h TTL is the standard choice. Use a Redis transaction (`WATCH`/`MULTI`) or `SETNX` for the claim step to avoid races. **Scope:** key by `(idempotencyKey, apiKeyId)` so keys from one tenant never collide with another. **Apply to:** all `POST` and `PATCH` that create or mutate resources with side effects (billing, emails, external API calls). Pure `GET`/`HEAD` is already idempotent; `PUT` and `DELETE` are idempotent by HTTP semantics but still benefit from replay protection for in-flight retries. --- ## 2. Rate Limiting **Problem.** One misbehaving client floods your API and degrades everyone else. Without limits, a single bug can take the service down. **Solution.** Bound requests per client per time window, return `429 Too Many Requests` when exceeded, and publish headers on every response so well-behaved clients can self-throttle before they hit the wall. ### Algorithms | Algorithm | Burst behavior | Memory per key | When to pick | |-----------|---------------|----------------|--------------| | **Fixed window** | Allows 2× burst at window boundary | O(1) | Simple, cheap. Acceptable for soft limits. | | **Sliding window log** | Smooth | O(n) per key | Expensive but precise. Use for billing-grade metering. | | **Sliding window counter** | Near-smooth | O(1) | **Default choice.** Good accuracy, constant memory. | | **Token bucket** | Configurable burst | O(1) | Use when you want to allow small bursts but bound sustained rate. Common for customer-facing APIs. | Redis + sliding-window-counter is the pragmatic default. ### HTTP contract **Every successful response:** ``` X-RateLimit-Limit: 1000 # quota for this window X-RateLimit-Remaining: 942 # how many calls left X-RateLimit-Reset: 1767225600 # unix seconds when the window resets ``` **429 response:** ``` HTTP/1.1 429 Too Many Requests Content-Type: application/problem+json Retry-After: 60 { "type": "https://api.example.com/problems/rate-limited", "title": "Too many requests", "status": 429, "detail": "Rate limit of 1000/hour exceeded. Retry after 60s." } ``` **`Retry-After` is mandatory on 429.** Clients use it to schedule the retry. Use seconds (integer) rather than HTTP-date — simpler, no clock-skew bugs. ### Server-side pseudocode (Redis sliding-window counter) ```python # Pseudocode — use a battle-tested library (redis-rate-limit, slowapi, limits) # rather than hand-rolling this in production. async def enforce_rate_limit(key: str, limit: int, window_seconds: int) -> RateLimitResult: now = int(time.time()) window_start = now - (now % window_seconds) prev_window_start = window_start - window_seconds # Atomic increment + get prior-window count with redis.pipeline(transaction=True) as p: p.incr(f"rl:{key}:{window_start}") p.expire(f"rl:{key}:{window_start}", window_seconds * 2) p.get(f"rl:{key}:{prev_window_start}") curr, _, prev = p.execute() # Weight the prior window by how much of the current window has elapsed elapsed_fraction = (now % window_seconds) / window_seconds weighted = int((int(prev or 0)) * (1 - elapsed_fraction)) + int(curr) remaining = max(0, limit - weighted) reset_at = window_start + window_seconds if weighted > limit: return RateLimitResult( allowed=False, retry_after=reset_at - now, limit=limit, remaining=0, reset_at=reset_at, ) return RateLimitResult( allowed=True, limit=limit, remaining=remaining, reset_at=reset_at, ) ``` **Key strategy:** by authenticated principal (`userId` or `apiKeyId`), not by IP. IP-based limiting punishes users behind corporate NATs. **Tiers:** different limits for anonymous / free / paid is common. Store the tier on the auth token and look up the limit at enforcement time — don't hard-code. **Where to enforce:** at the edge (gateway/middleware) before hitting your application logic. An API gateway (Kong, Envoy, Cloudflare) handles this natively if you use one. --- ## 3. Optimistic Concurrency (ETag + If-Match) **Problem.** Alice and Bob both load the same user record. Alice updates the email, Bob updates the name. If both `PATCH` without coordination, the second write silently overwrites the first ("lost update"). **Solution.** On `GET`, return an `ETag` header — an opaque version token. On `PATCH`, require the client to echo that ETag in `If-Match`. If the server's current ETag doesn't match, return `412 Precondition Failed` and the client must refetch and retry. ### ETag generation strategies | Strategy | Pros | Cons | |----------|------|------| | Version counter (`v42`) | Trivial to compare | Needs a `version` column on every row | | `updated_at` timestamp | No schema change | Millisecond precision may collide on bulk updates | | Hash of body (`"a1b2c3"`) | Stateless | Recomputed on every GET | | Database row version | Cheap, natural fit | ORM-dependent | All work. Pick whatever matches your data layer. **Weak vs strong:** `ETag: W/"..."` for weak (semantically equivalent), `ETag: "..."` for strong (byte-identical). Use strong ETags unless you need to support content negotiation. ### HTTP contract ``` GET /v1/users/usr_abc123 HTTP/1.1 Authorization: Bearer ... HTTP/1.1 200 OK ETag: "v42" Content-Type: application/json { "id": "usr_abc123", "name": "Alice", "email": "alice@example.com", ... } ``` ``` PATCH /v1/users/usr_abc123 HTTP/1.1 If-Match: "v42" Content-Type: application/json { "email": "alice@new.example.com" } ``` **Success:** ``` HTTP/1.1 200 OK ETag: "v43" { ... updated user ... } ``` **Conflict — Bob's write races Alice's:** ``` HTTP/1.1 412 Precondition Failed Content-Type: application/problem+json { "type": "https://api.example.com/problems/precondition-failed", "title": "Precondition failed", "status": 412, "detail": "The resource was modified since you last fetched it. Re-fetch and retry." } ``` ### Server-side pseudocode ```python async def update_user(user_id: str, body: dict, if_match: str | None): current = await users.get(user_id) if current is None: raise ProblemDetail(404, "not-found", f"User '{user_id}' not found.") if if_match is None: raise ProblemDetail(428, "precondition-required", "If-Match header is required for this operation.") current_etag = f'"v{current.version}"' if if_match != current_etag: raise ProblemDetail(412, "precondition-failed", "The resource was modified since you last fetched it.") updated = await users.patch(user_id, body, expected_version=current.version) return updated, f'"v{updated.version}"' ``` **`428 Precondition Required`** is the correct response when the server *requires* `If-Match` and the client didn't send one. RFC 6585. **Also useful for GETs:** `If-None-Match: "v42"` lets clients skip the body and get `304 Not Modified` if nothing changed — cheap cache revalidation. --- ## 4. Webhook Signing (HMAC) **Problem.** You deliver an event to a consumer URL. Without a signature, anyone who guesses the URL can forge events. Without a timestamp, an attacker who captures one valid payload can replay it forever. **Solution.** Sign every webhook with HMAC-SHA256 over `timestamp + "." + body`, send both in a header, and require consumers to reject signatures older than 5 minutes. ### HTTP contract ``` POST https://consumer.example.com/webhooks/acme HTTP/1.1 Content-Type: application/json Acme-Signature: t=1767225600,v1=5257a869e7...7f4b Acme-Webhook-Id: evt_01HTZ4K5M8N9P0Q1R2S3T4V5W6 { "id": "evt_01HTZ4K5M8N9P0Q1R2S3T4V5W6", "type": "order.completed", "createdAt": "2026-04-15T10:30:00Z", "data": { "orderId": "ord_xyz", "total": 4999, "currency": "usd" } } ``` **`t=` is the unix timestamp when the signature was generated.** **`v1=` is the hex-encoded HMAC-SHA256 of `t + "." + rawBody` using the consumer's signing secret.** The version prefix (`v1=`) lets you rotate signing schemes in the future without breaking existing consumers. ### Server-side signing ```python import hmac, hashlib, time, json def sign_webhook(raw_body: bytes, secret: str) -> str: t = int(time.time()) payload = f"{t}.".encode() + raw_body sig = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest() return f"t={t},v1={sig}" async def deliver(endpoint: Endpoint, event: dict): raw = json.dumps(event, separators=(",", ":")).encode() signature = sign_webhook(raw, endpoint.signing_secret) await http.post( endpoint.url, content=raw, headers={ "Content-Type": "application/json", "Acme-Signature": signature, "Acme-Webhook-Id": event["id"], }, ) ``` ### Consumer-side verification (for your docs) ```python import hmac, hashlib, time MAX_AGE_SECONDS = 300 # 5 minutes def verify_webhook(raw_body: bytes, header: str, secret: str) -> dict: parts = dict(p.split("=", 1) for p in header.split(",")) t = int(parts["t"]) sig = parts["v1"] # Replay protection — reject anything older than MAX_AGE if abs(time.time() - t) > MAX_AGE_SECONDS: raise SignatureError("Timestamp outside tolerance window.") expected = hmac.new( secret.encode(), f"{t}.".encode() + raw_body, hashlib.sha256, ).hexdigest() # Constant-time compare — prevents timing attacks if not hmac.compare_digest(expected, sig): raise SignatureError("Signature mismatch.") return json.loads(raw_body) ``` **Three non-negotiables:** 1. **Sign `timestamp + body`, not just body.** Without the timestamp, replay protection is impossible. 2. **Use constant-time comparison (`hmac.compare_digest`).** Never `==`. Side-channel leaks. 3. **Verify against the raw body bytes**, not a parsed-and-reserialized version. JSON serializers don't roundtrip byte-for-byte. ### Retry and dedup - Retry on any non-2xx response with exponential backoff: 1m, 5m, 15m, 1h, 6h, 24h (cap at ~24h total). - Include a unique event `id` in every payload; consumers must dedupe on it (retries will re-send the same `id`). - Consumer must respond within ~5s with any 2xx. Do the actual work in a background job. --- ## 5. Async Long-Running Operations (202 Accepted) **Problem.** Generating a report takes 30 seconds. You can't hold an HTTP connection that long — load balancers kill it, clients time out. **Solution.** Return `202 Accepted` immediately with a `Location` header pointing to a status resource. The client polls (or subscribes to a webhook) until the job completes. ### HTTP contract ``` POST /v1/reports HTTP/1.1 Content-Type: application/json { "type": "sales", "startDate": "2026-01-01", "endDate": "2026-03-31" } ``` ``` HTTP/1.1 202 Accepted Location: /v1/jobs/job_01HTZ9K5M8N9P0Q1R2S3T4V5W6 Retry-After: 5 { "id": "job_01HTZ9K5M8N9P0Q1R2S3T4V5W6", "status": "queued", "createdAt": "2026-04-15T10:30:00Z" } ``` ``` GET /v1/jobs/job_01HTZ9K5M8N9P0Q1R2S3T4V5W6 HTTP/1.1 HTTP/1.1 200 OK { "id": "job_01HTZ9K5M8N9P0Q1R2S3T4V5W6", "status": "completed", "createdAt": "2026-04-15T10:30:00Z", "completedAt":"2026-04-15T10:30:32Z", "result": { "reportUrl": "https://cdn.example.com/reports/xyz.csv" } } ``` **Job states:** `queued` → `running` → `completed` | `failed` | `cancelled`. On `failed`, embed a `ProblemDetails` object in the job body under `error` so the client gets structured failure info without a separate endpoint. **Webhook option:** let the client register a callback URL on the job creation (`"callbackUrl": "https://..."`) and deliver a webhook when the job terminates, following the signing pattern above. Saves polling and is what mature APIs offer as the default. --- ## Applying this in OpenAPI The starter template [openapi-3.1-starter.yaml](../templates/openapi-3.1-starter.yaml) already demonstrates four of these patterns on the `/users` endpoints: | Pattern | Where in the template | |---------|----------------------| | Idempotency keys | `POST /users` → `IdempotencyKeyHeader` parameter | | Rate limit headers | `GET /users` responses → `X-RateLimit-*` header refs | | ETag + If-Match | `GET` + `PATCH /users/{userId}` → `ETag` header, `If-Match` param, `412` response | | Problem Details errors | All `4xx`/`5xx` responses use `application/problem+json` | Copy the relevant `parameters`, `headers`, and `responses` blocks into your own spec. --- ## Related - [http-status-codes.md](http-status-codes.md) — 202, 409, 412, 428, 429 selection rules - [rest-naming.md](rest-naming.md) — URL conventions - [api-governance.md](api-governance.md) — linting, docs, client gen, contract testing - [RFC 9457](https://www.rfc-editor.org/rfc/rfc9457) — Problem Details for HTTP APIs - [Stripe: Designing APIs with Idempotency](https://stripe.com/blog/idempotency) — the canonical write-up