Production-Grade API Patterns
Four patterns that separate hobby APIs from APIs developers actually trust in production: idempotency keys, rate limiting, optimistic concurrency (ETag), and webhook signing. Plus the async/202 pattern for long-running work.
Each section has: what it is, why it matters, the HTTP contract, and server-side pseudocode.
1. Idempotency Keys
Problem. A client calls POST /payments to charge $50. The request succeeds on the server but the response is lost to a network blip. The client retries. Without idempotency, the customer is charged twice.
Solution. The client generates a UUID per logical operation and sends it as an Idempotency-Key header. The server stores the full response keyed by (idempotencyKey, userId) for a TTL window (Stripe uses 24h). Any retry with the same key returns the stored response — no re-execution.
Key insight — store the response, not the request. Replaying the operation on retry defeats the purpose. You must serve the exact bytes the original call produced, including the status code, headers, and body. Otherwise a second worker racing the first will see inconsistent state.
HTTP contract
POST /v1/payments
Idempotency-Key: 0f6f7a7d-1c6a-4a1e-9d1a-4f2a1b3c4d5e
Authorization: Bearer <token>
Content-Type: application/json
{ "amount": 5000, "currency": "usd", "customerId": "cus_abc" }
Server responses:
- First call: process normally, store (key, response), return 201 Created + Payment body.
- Retry (same key, same body): return the stored response verbatim. Include header Idempotent-Replayed: true so clients can log the replay.
- Retry (same key, different body): return 422 Unprocessable Entity with type: .../problems/idempotency-conflict. The key was reused for a different payload, which is a client bug.
- Retry during in-flight processing: return 409 Conflict so the client backs off and retries later. Acquire a lock on (key) before starting work.
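The client half of this contract can be sketched as below. This is a minimal illustration, not a reference client: the `send` callable and its `(status, body)` return shape are assumptions standing in for whatever HTTP library you use. The point it shows is that the key is generated once per logical operation and reused unchanged across retries, including the 409 back-off case.

```python
import time
import uuid
from typing import Callable

# Hypothetical transport: takes (headers, body), returns (status_code, response_body).
Transport = Callable[[dict, dict], tuple[int, dict]]


def post_with_idempotency(
    send: Transport,
    body: dict,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    # One key per logical operation, generated once and reused on every retry.
    key = str(uuid.uuid4())
    headers = {"Idempotency-Key": key, "Content-Type": "application/json"}

    for attempt in range(max_attempts):
        try:
            status, payload = send(headers, body)
        except ConnectionError:
            # Response lost in transit: retrying with the same key is safe.
            status, payload = None, {}
        # 409 means the original request is still in flight, so keep waiting.
        if status is not None and status != 409:
            return status, payload
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential back-off
    raise TimeoutError("No definitive response after retries.")
```

A 422 is returned to the caller rather than retried, since it signals a client bug that a retry cannot fix.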
Server-side pseudocode
import hashlib

IDEMPOTENCY_TTL_SECONDS = 24 * 60 * 60  # 24h replay window


def body_hash(body: bytes) -> str:
    # Stable fingerprint. Python's built-in hash() is salted per process,
    # so it cannot be compared across workers or restarts.
    return hashlib.sha256(body).hexdigest()


async def create_payment(req: Request, key: str | None, user: User):
    if key is None:
        # Idempotency optional, but log the risk.
        return await process_payment(req, user)

    request_hash = body_hash(req.body)
    record = await idempotency_store.get(key, user.id)
    if record:
        if record.request_hash != request_hash:
            raise ProblemDetail(422, "idempotency-conflict",
                                "Key reused with a different request body.")
        if record.status == "in_progress":
            raise ProblemDetail(409, "idempotency-in-progress",
                                "Original request still processing.")
        return replay(record.response)  # exact bytes

    # Claim the key atomically; losing this race returns 409.
    claimed = await idempotency_store.claim(key, user.id, request_hash)
    if not claimed:
        raise ProblemDetail(409, "idempotency-in-progress",
                            "Original request still processing.")

    try:
        response = await process_payment(req, user)
        await idempotency_store.save(key, user.id, response,
                                     ttl=IDEMPOTENCY_TTL_SECONDS)
        return response
    except Exception:
        # Release the claim so the client can retry cleanly.
        await idempotency_store.release(key, user.id)
        raise
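Storage: Redis with a 24h TTL is the standard choice. For the claim step, use a Redis transaction (WATCH/MULTI) or SET with the NX and EX options to avoid races. SET NX EX replaces the deprecated SETNX, which cannot set the expiry in the same command.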
Scope: key by (idempotencyKey, principal), where the principal is the authenticated userId or apiKeyId, so keys from one tenant never collide with another.
Apply to: all POST and PATCH that create or mutate resources with side effects (billing, emails, external API calls). Pure GET/HEAD is already idempotent; PUT and DELETE are idempotent by HTTP semantics but still benefit from replay protection for in-flight retries.
2. Rate Limiting
Problem. One misbehaving client floods your API and degrades everyone else. Without limits, a single bug can take the service down.
Solution. Bound requests per client per time window, return 429 Too Many Requests when exceeded, and publish headers on every response so well-behaved clients can self-throttle before they hit the wall.
Algorithms
| Algorithm | Burst behavior | Memory per key | When to pick |
|---|---|---|---|
| Fixed window | Allows 2× burst at window boundary | O(1) | Simple, cheap. Acceptable for soft limits. |
| Sliding window log | Smooth | O(n) per key | Expensive but precise. Use for billing-grade metering. |
| Sliding window counter | Near-smooth | O(1) | Default choice. Good accuracy, constant memory. |
| Token bucket | Configurable burst | O(1) | Use when you want to allow small bursts but bound sustained rate. Common for customer-facing APIs. |
Redis + sliding-window-counter is the pragmatic default.
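For comparison with the sliding-window default, a token bucket from the table above can be sketched in a few lines. This is an in-process illustration only: the class name and the injectable `clock` parameter are assumptions made for testability, and a shared deployment would keep the token count in Redis rather than in memory.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(
        self,
        capacity: float,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.capacity = capacity                    # maximum burst size
        self.refill_per_second = refill_per_second  # sustained rate
        self._clock = clock
        self._tokens = capacity                     # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(
            self.capacity,
            self._tokens + (now - self._last) * self.refill_per_second,
        )
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

`capacity` bounds the burst and `refill_per_second` bounds the sustained rate, which is the "configurable burst" trade-off the table describes.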
HTTP contract
Every successful response:
X-RateLimit-Limit: 1000 # quota for this window
X-RateLimit-Remaining: 942 # how many calls left
X-RateLimit-Reset: 1767225600 # unix seconds when the window resets
429 response:
HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 60
{
"type": "https://api.example.com/problems/rate-limited",
"title": "Too many requests",
"status": 429,
"detail": "Rate limit of 1000/hour exceeded. Retry after 60s."
}
Always send Retry-After on 429. RFC 6585 makes the header optional, but clients depend on it to schedule the retry, so treat it as mandatory in your API. Use seconds (integer) rather than HTTP-date: simpler, no clock-skew bugs.
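One way to keep these headers consistent is to build them in a single helper. The function below is a sketch; its name and parameters are illustrative, and the one-second floor on Retry-After is an assumption added so clients never retry immediately.

```python
def rate_limit_headers(
    limit: int, remaining: int, reset_at: int, now: int, allowed: bool
) -> dict[str, str]:
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_at),  # unix seconds
    }
    if not allowed:
        # Integer seconds until the window resets, never less than 1.
        headers["Retry-After"] = str(max(1, reset_at - now))
    return headers
```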
Server-side pseudocode (Redis sliding-window counter)
# Pseudocode: use a battle-tested library (redis-rate-limit, slowapi, limits)
# rather than hand-rolling this in production.
import time


async def enforce_rate_limit(key: str, limit: int, window_seconds: int) -> RateLimitResult:
    now = int(time.time())
    window_start = now - (now % window_seconds)
    prev_window_start = window_start - window_seconds
    curr_key = f"rl:{key}:{window_start}"
    prev_key = f"rl:{key}:{prev_window_start}"

    # Atomic increment + get prior-window count (MULTI/EXEC).
    # Assumes `redis` is a redis.asyncio client, so the pipeline is awaited.
    async with redis.pipeline(transaction=True) as p:
        p.incr(curr_key)
        p.expire(curr_key, window_seconds * 2)
        p.get(prev_key)
        curr, _, prev = await p.execute()

    # Weight the prior window by the share of it that still overlaps the
    # sliding window, i.e. 1 minus the elapsed fraction of the current window.
    elapsed_fraction = (now % window_seconds) / window_seconds
    weighted = int(int(prev or 0) * (1 - elapsed_fraction)) + int(curr)

    remaining = max(0, limit - weighted)
    reset_at = window_start + window_seconds

    # The increment above already counted this request, so rejected calls
    # also consume quota; compare with > rather than >=.
    if weighted > limit:
        return RateLimitResult(
            allowed=False,
            retry_after=reset_at - now,
            limit=limit, remaining=0, reset_at=reset_at,
        )

    return RateLimitResult(
        allowed=True,
        limit=limit, remaining=remaining, reset_at=reset_at,
    )
Key strategy: by authenticated principal (userId or apiKeyId), not by IP. IP-based limiting punishes users behind corporate NATs.
Tiers: different limits for anonymous / free / paid is common. Store the tier on the auth token and look up the limit at enforcement time — don't hard-code.
Where to enforce: at the edge (gateway/middleware) before hitting your application logic. An API gateway (Kong, Envoy, Cloudflare) handles this natively if you use one.
3. Optimistic Concurrency (ETag + If-Match)
Problem. Alice and Bob both load the same user record. Alice updates the email, Bob updates the name. If both PATCH without coordination, the second write silently overwrites the first ("lost update").
Solution. On GET, return an ETag header — an opaque version token. On PATCH, require the client to echo that ETag in If-Match. If the server's current ETag doesn't match, return 412 Precondition Failed and the client must refetch and retry.
ETag generation strategies
| Strategy | Pros | Cons |
|---|---|---|
| Version counter (v42) | Trivial to compare | Needs a version column on every row |
| updated_at timestamp | No schema change | Millisecond precision may collide on bulk updates |
| Hash of body ("a1b2c3") | Stateless | Recomputed on every GET |
| Database row version | Cheap, natural fit | ORM-dependent |
All work. Pick whatever matches your data layer.
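Weak vs strong: ETag: W/"..." for weak (semantically equivalent), ETag: "..." for strong (byte-identical). Use strong ETags for this pattern: RFC 9110 requires strong comparison for If-Match, so a weak ETag never satisfies it. Weak ETags remain usable for If-None-Match cache revalidation.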
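Two of the generation strategies, plus the strong comparison used for If-Match, might look like this. The helper names are illustrative, and truncating the hash to 16 hex characters is an arbitrary choice to keep the header short.

```python
import hashlib


def etag_from_version(version: int) -> str:
    # Version-counter strategy: quoted, strong ETag.
    return f'"v{version}"'


def etag_from_body(body: bytes) -> str:
    # Hash-of-body strategy: stateless, but recomputed for every response.
    return f'"{hashlib.sha256(body).hexdigest()[:16]}"'


def is_weak(etag: str) -> bool:
    return etag.startswith("W/")


def strong_match(a: str, b: str) -> bool:
    # Strong comparison: both tags must be strong and identical.
    return not is_weak(a) and not is_weak(b) and a == b
```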
HTTP contract
GET /v1/users/usr_abc123 HTTP/1.1
Authorization: Bearer ...
HTTP/1.1 200 OK
ETag: "v42"
Content-Type: application/json
{ "id": "usr_abc123", "name": "Alice", "email": "alice@example.com", ... }
PATCH /v1/users/usr_abc123 HTTP/1.1
If-Match: "v42"
Content-Type: application/json
{ "email": "alice@new.example.com" }
Success:
HTTP/1.1 200 OK
ETag: "v43"
{ ... updated user ... }
Conflict — Bob's write races Alice's:
HTTP/1.1 412 Precondition Failed
Content-Type: application/problem+json
{
"type": "https://api.example.com/problems/precondition-failed",
"title": "Precondition failed",
"status": 412,
"detail": "The resource was modified since you last fetched it. Re-fetch and retry."
}
Server-side pseudocode
async def update_user(user_id: str, body: dict, if_match: str | None):
    current = await users.get(user_id)
    if current is None:
        raise ProblemDetail(404, "not-found", f"User '{user_id}' not found.")

    if if_match is None:
        raise ProblemDetail(428, "precondition-required",
                            "If-Match header is required for this operation.")

    current_etag = f'"v{current.version}"'
    if if_match != current_etag:
        raise ProblemDetail(412, "precondition-failed",
                            "The resource was modified since you last fetched it.")

    # The data layer should reject the write if the version changed after the
    # check above (compare-and-swap); otherwise two writers can both pass it.
    updated = await users.patch(user_id, body, expected_version=current.version)
    return updated, f'"v{updated.version}"'
428 Precondition Required is the correct response when the server requires If-Match and the client didn't send one. RFC 6585.
Also useful for GETs: If-None-Match: "v42" lets clients skip the body and get 304 Not Modified if nothing changed — cheap cache revalidation.
4. Webhook Signing (HMAC)
Problem. You deliver an event to a consumer URL. Without a signature, anyone who guesses the URL can forge events. Without a timestamp, an attacker who captures one valid payload can replay it forever.
Solution. Sign every webhook with HMAC-SHA256 over timestamp + "." + body, send both in a header, and require consumers to reject signatures older than 5 minutes.
HTTP contract
POST https://consumer.example.com/webhooks/acme HTTP/1.1
Content-Type: application/json
Acme-Signature: t=1767225600,v1=5257a869e7...7f4b
Acme-Webhook-Id: evt_01HTZ4K5M8N9P0Q1R2S3T4V5W6
{
"id": "evt_01HTZ4K5M8N9P0Q1R2S3T4V5W6",
"type": "order.completed",
"createdAt": "2026-04-15T10:30:00Z",
"data": { "orderId": "ord_xyz", "total": 4999, "currency": "usd" }
}
t= is the unix timestamp when the signature was generated.
v1= is the hex-encoded HMAC-SHA256 of t + "." + rawBody using the consumer's signing secret.
The version prefix (v1=) lets you rotate signing schemes in the future without breaking existing consumers.
Server-side signing
import hmac, hashlib, time, json


def sign_webhook(raw_body: bytes, secret: str) -> str:
    t = int(time.time())
    payload = f"{t}.".encode() + raw_body
    sig = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return f"t={t},v1={sig}"


async def deliver(endpoint: Endpoint, event: dict):
    raw = json.dumps(event, separators=(",", ":")).encode()
    signature = sign_webhook(raw, endpoint.signing_secret)
    await http.post(
        endpoint.url,
        content=raw,
        headers={
            "Content-Type": "application/json",
            "Acme-Signature": signature,
            "Acme-Webhook-Id": event["id"],
        },
    )
Consumer-side verification (for your docs)
import hmac, hashlib, json, time

MAX_AGE_SECONDS = 300  # 5 minutes


class SignatureError(Exception):
    """Raised when a webhook fails timestamp or signature verification."""


def verify_webhook(raw_body: bytes, header: str, secret: str) -> dict:
    parts = dict(p.split("=", 1) for p in header.split(","))
    t = int(parts["t"])
    sig = parts["v1"]

    # Replay protection: reject anything outside MAX_AGE, past or future
    if abs(time.time() - t) > MAX_AGE_SECONDS:
        raise SignatureError("Timestamp outside tolerance window.")

    expected = hmac.new(
        secret.encode(),
        f"{t}.".encode() + raw_body,
        hashlib.sha256,
    ).hexdigest()

    # Constant-time compare: prevents timing attacks
    if not hmac.compare_digest(expected, sig):
        raise SignatureError("Signature mismatch.")

    return json.loads(raw_body)
Three non-negotiables:
- Sign timestamp + body, not just body. Without the timestamp, replay protection is impossible.
- Use constant-time comparison (hmac.compare_digest). Never ==. Side-channel leaks.
- Verify against the raw body bytes, not a parsed-and-reserialized version. JSON serializers don't roundtrip byte-for-byte.
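The raw-bytes rule is easy to demonstrate. In this sketch the secret, timestamp, and payload are made-up values; the only point is that parsing and re-serializing the body changes the bytes and therefore the signature.

```python
import hashlib
import hmac
import json


def signature(secret: str, t: int, raw_body: bytes) -> str:
    payload = f"{t}.".encode() + raw_body
    return hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()


secret = "whsec_example"  # hypothetical signing secret
t = 1767225600
raw = b'{"id": "evt_1",  "total": 4999}'  # bytes as sent (note the double space)
reserialized = json.dumps(json.loads(raw)).encode()

sent = signature(secret, t, raw)
matches_raw = hmac.compare_digest(sent, signature(secret, t, raw))
matches_reserialized = hmac.compare_digest(sent, signature(secret, t, reserialized))
```

Here `matches_raw` is true and `matches_reserialized` is false, even though both byte strings decode to the same JSON object.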
Retry and dedup
- Retry on any non-2xx response with exponential backoff: 1m, 5m, 15m, 1h, 6h, 24h (cap at ~24h total). Re-sign each attempt with a fresh timestamp; a retry that carries the original signature falls outside the consumer's 5-minute tolerance.
- Include a unique event id in every payload; consumers must dedupe on it (retries will re-send the same id).
- Consumer must respond within ~5s with any 2xx. Do the actual work in a background job.
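On the consumer side, the dedup and fast-acknowledge rules can be sketched together. The `WebhookConsumer` class and its `enqueue` callback are hypothetical, and the in-memory set stands in for a durable store with a TTL.

```python
from typing import Callable


class WebhookConsumer:
    def __init__(self, enqueue: Callable[[dict], None]):
        self._seen: set[str] = set()  # in production: a durable store with a TTL
        self._enqueue = enqueue

    def handle(self, event: dict) -> int:
        event_id = event["id"]
        if event_id in self._seen:
            return 200  # duplicate delivery: acknowledge and do nothing
        # Enqueue first: if this raises, the id is not recorded and the
        # sender's retry gets another chance.
        self._enqueue(event)
        self._seen.add(event_id)
        return 200  # acknowledge quickly; the real work runs in a background job
```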
5. Async Long-Running Operations (202 Accepted)
Problem. Generating a report takes 30 seconds. You can't hold an HTTP connection that long — load balancers kill it, clients time out.
Solution. Return 202 Accepted immediately with a Location header pointing to a status resource. The client polls (or subscribes to a webhook) until the job completes.
HTTP contract
POST /v1/reports HTTP/1.1
Content-Type: application/json
{ "type": "sales", "startDate": "2026-01-01", "endDate": "2026-03-31" }
HTTP/1.1 202 Accepted
Location: /v1/jobs/job_01HTZ9K5M8N9P0Q1R2S3T4V5W6
Retry-After: 5
{
"id": "job_01HTZ9K5M8N9P0Q1R2S3T4V5W6",
"status": "queued",
"createdAt": "2026-04-15T10:30:00Z"
}
GET /v1/jobs/job_01HTZ9K5M8N9P0Q1R2S3T4V5W6 HTTP/1.1
HTTP/1.1 200 OK
{
"id": "job_01HTZ9K5M8N9P0Q1R2S3T4V5W6",
"status": "completed",
"createdAt": "2026-04-15T10:30:00Z",
"completedAt":"2026-04-15T10:30:32Z",
"result": { "reportUrl": "https://cdn.example.com/reports/xyz.csv" }
}
Job states: queued → running → completed | failed | cancelled.
On failed, embed a ProblemDetails object in the job body under error so the client gets structured failure info without a separate endpoint.
Webhook option: let the client register a callback URL on the job creation ("callbackUrl": "https://...") and deliver a webhook when the job terminates, following the signing pattern above. Saves polling and is what mature APIs offer as the default.
Applying this in OpenAPI
The starter template openapi-3.1-starter.yaml already demonstrates three of these patterns, plus Problem Details errors, on the /users endpoints:
| Pattern | Where in the template |
|---|---|
| Idempotency keys | POST /users → IdempotencyKeyHeader parameter |
| Rate limit headers | GET /users responses → X-RateLimit-* header refs |
| ETag + If-Match | GET + PATCH /users/{userId} → ETag header, If-Match param, 412 response |
| Problem Details errors | All 4xx/5xx responses use application/problem+json |
Copy the relevant parameters, headers, and responses blocks into your own spec.
Related
- http-status-codes.md — 202, 409, 412, 428, 429 selection rules
- rest-naming.md — URL conventions
- api-governance.md — linting, docs, client gen, contract testing
- RFC 9457 — Problem Details for HTTP APIs
- Stripe: Designing APIs with Idempotency — the canonical write-up