feat: improved the Claude Kit as a plugin

duthaho
2026-04-19 14:09:14 +07:00
parent 3103a8da1b
commit d1a6d2a2bc
186 changed files with 771 additions and 1691 deletions
+64
@@ -0,0 +1,64 @@
---
name: databases
description: >
  Use when working with PostgreSQL, MongoDB, or Redis, including schema design, queries, indexing, migrations, connection pooling, caching layers, or any database operation. Also activate for keywords like SQL, aggregation pipeline, BSON, ioredis, alembic, prisma migrate, django migrate, EXPLAIN ANALYZE, ORM configuration, or NoSQL data modeling.
---
# Databases
## When to Use
- PostgreSQL database operations, SQL query optimization, schema design
- JSONB document storage, full-text search, window functions, CTEs
- MongoDB document modeling, aggregation pipelines, semi-structured data
- Redis caching, session storage, rate limiting, pub/sub, job queues, distributed locks
- Database migrations — adding/modifying tables, columns, indexes, constraints
- Resolving migration conflicts, rolling back failed migrations
## When NOT to Use
- Simple key-value caching within a single process — use `functools.lru_cache` or `Map`
- File-based storage that doesn't need a database engine
- Static data or configuration that belongs in environment variables
---
## Quick Reference
| Topic | Reference | Key tools |
|-------|-----------|-----------|
| PostgreSQL | `references/postgresql.md` | SQL, SQLAlchemy, Prisma, EXPLAIN ANALYZE, pg_stat_statements |
| MongoDB | `references/mongodb.md` | Aggregation, Mongoose, Motor, document schemas, ESR indexing |
| Redis | `references/redis.md` | Caching, pub/sub, ioredis, BullMQ, session storage, distributed locks |
| Migrations | `references/migrations.md` | Alembic, Prisma Migrate, Django migrations, rollback strategies |
---
## Best Practices
1. **Use parameterized queries everywhere.** Never concatenate user input into SQL strings.
2. **Design schema around access patterns.** Ask "how will I read this?" before "how does this relate?" Embed data fetched together (MongoDB); normalize data accessed independently (PostgreSQL).
3. **Index foreign keys and query fields.** PostgreSQL doesn't auto-index FK child columns. MongoDB queries without indexes trigger full collection scans.
4. **Use appropriate consistency levels.** `TIMESTAMPTZ` over `TIMESTAMP` (PostgreSQL). `w: "majority"` for durable writes (MongoDB). TTLs on every Redis cache key.
5. **Monitor query performance.** `pg_stat_statements` (PostgreSQL), `db.setProfilingLevel(1)` (MongoDB), connection pool metrics (all).
6. **Use bulk/batch operations.** `bulkWrite` (MongoDB), `COPY` (PostgreSQL), pipelines (Redis) for high-throughput writes.
7. **Never edit deployed migrations.** Create a new migration instead of modifying one already applied.
8. **Test rollback paths.** Always verify your downgrade/rollback strategy before deploying schema changes.
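
Practice 1 can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module so that it runs without a server; the `users` table and the `find_user_by_email` helper are hypothetical. PostgreSQL drivers follow the same idea but typically use `%s` or `$1` placeholders instead of `?`.

```python
import sqlite3


def find_user_by_email(conn: sqlite3.Connection, email: str):
    """Look up a user with a bound parameter instead of string concatenation."""
    # The driver passes `email` as data, so quotes inside it cannot alter the SQL.
    cur = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))

found = find_user_by_email(conn, "alice@example.com")
# A classic injection payload is treated as a literal string and matches nothing.
injected = find_user_by_email(conn, "' OR '1'='1")
```

Had the query been built with string concatenation, the second call would have produced a condition that is always true and returned a row.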
## Common Pitfalls
1. **N+1 queries from ORM lazy loading.** Use eager loading (`joinedload`, `select_related`, `$lookup` with caution).
2. **Table locks during migrations.** Use `CREATE INDEX CONCURRENTLY` (PostgreSQL). Batch backfills for large tables.
3. **Unbounded growth.** Dead tuples from UPDATE-heavy workloads (PostgreSQL). Arrays exceeding 16MB document limit (MongoDB). Redis keys without TTLs.
4. **OFFSET pagination on large datasets.** Use keyset/cursor pagination instead.
5. **Connection exhaustion.** Use connection pools (PgBouncer, application-level pools). Never open per-request connections.
6. **Cache stampede.** When a popular Redis key expires, many requests hit the DB simultaneously. Use distributed locks or stale-while-revalidate.
7. **Running `migrate reset` in production.** This drops all data.
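
Pitfall 4 recommends keyset pagination. The sketch below shows the idea with Python's built-in `sqlite3` module; the `orders` table, the `fetch_page` helper, and the tiny page size are assumptions made for illustration. The same `WHERE id > ? ORDER BY id LIMIT ?` shape applies to PostgreSQL.

```python
import sqlite3


def fetch_page(conn, after_id=None, page_size=3):
    """Return one page of orders plus the cursor for the next page."""
    if after_id is None:
        rows = conn.execute(
            "SELECT id, total_cents FROM orders ORDER BY id LIMIT ?",
            (page_size,),
        ).fetchall()
    else:
        # Seek past the last row already seen instead of skipping rows with OFFSET.
        rows = conn.execute(
            "SELECT id, total_cents FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (after_id, page_size),
        ).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO orders (total_cents) VALUES (?)",
    [(cents,) for cents in range(100, 800, 100)],  # seven sample rows
)

page1, cursor1 = fetch_page(conn)
page2, cursor2 = fetch_page(conn, after_id=cursor1)
page3, cursor3 = fetch_page(conn, after_id=cursor2)
```

Because each page starts from an indexed key, the database does not have to read and discard all earlier rows, which is what makes deep `OFFSET` pages slow.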
---
## Related Skills
- `backend-frameworks` — Framework-specific ORM integration
- `error-handling` — Database error handling patterns
- `logging` — Query logging and slow query detection
+312
@@ -0,0 +1,312 @@
# Databases — Migration Patterns
# Database Migrations
## When to Use
- Adding or modifying database tables/columns
- Creating indexes or constraints
- Running migrations in development, staging, or production
- Resolving migration conflicts in a team
- Rolling back a failed migration
## When NOT to Use
- Query optimization without schema changes — use `postgresql` skill
- Initial database design from scratch — use `postgresql` or `mongodb` skill
- ORM configuration without migrations — use framework-specific skill
---
## Quick Reference
| I need... | Go to |
|-----------|-------|
| Alembic (FastAPI/SQLAlchemy) | § Alembic below |
| Prisma (NestJS/Express) | § Prisma below |
| Django migrations | § Django below |
| Safe production patterns | § Production Safety below |
| Rollback strategies | § Rollbacks below |
---
## Alembic (Python / SQLAlchemy)
### Setup
```bash
pip install alembic
alembic init migrations
```
```python
# migrations/env.py — configure target metadata
from src.models import Base
target_metadata = Base.metadata
```
### Create a migration
```bash
# Auto-generate from model changes
alembic revision --autogenerate -m "add orders table"
# Manual migration (for data migrations or complex changes)
alembic revision -m "backfill order status"
```
### Migration file
```python
# migrations/versions/003_add_orders_table.py
"""add orders table"""
from alembic import op
import sqlalchemy as sa

# Revision identifiers that Alembic uses to order migrations
revision = '003'
down_revision = '002'


def upgrade() -> None:
    op.create_table(
        'orders',
        sa.Column('id', sa.UUID(), primary_key=True, server_default=sa.text('gen_random_uuid()')),
        sa.Column('user_id', sa.UUID(), sa.ForeignKey('users.id', ondelete='CASCADE'), nullable=False),
        sa.Column('total', sa.Numeric(10, 2), nullable=False),
        sa.Column('status', sa.String(20), nullable=False, server_default='pending'),
        sa.Column('created_at', sa.DateTime(timezone=True), server_default=sa.func.now()),
    )
    # Index the foreign key and the timestamp used for sorting and range filters
    op.create_index('ix_orders_user_id', 'orders', ['user_id'])
    op.create_index('ix_orders_created_at', 'orders', ['created_at'])


def downgrade() -> None:
    # Dropping the table also drops its indexes
    op.drop_table('orders')
```
### Run migrations
```bash
# Apply all pending
alembic upgrade head
# Apply one step
alembic upgrade +1
# Check current state
alembic current
# Check whether the models contain changes not yet captured in a migration
alembic check
# View migration history
alembic history --verbose
```
---
## Prisma (TypeScript / NestJS / Express)
### Create a migration
```bash
# Generate migration from schema changes
npx prisma migrate dev --name add_orders_table
# Apply in production (no interactive prompts)
npx prisma migrate deploy
# Check status
npx prisma migrate status
```
### Schema change
```prisma
// prisma/schema.prisma
model Order {
id String @id @default(uuid())
userId String
user User @relation(fields: [userId], references: [id], onDelete: Cascade)
total Decimal @db.Decimal(10, 2)
status String @default("pending")
createdAt DateTime @default(now())
@@index([userId])
@@index([createdAt])
}
```
### Generated migration SQL
```sql
-- prisma/migrations/20260417_add_orders_table/migration.sql
CREATE TABLE "Order" (
-- @default(uuid()) is generated by Prisma Client, so no database default appears here
"id" TEXT NOT NULL,
"userId" TEXT NOT NULL,
"total" DECIMAL(10,2) NOT NULL,
"status" TEXT NOT NULL DEFAULT 'pending',
"createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT "Order_pkey" PRIMARY KEY ("id")
);
CREATE INDEX "Order_userId_idx" ON "Order"("userId");
CREATE INDEX "Order_createdAt_idx" ON "Order"("createdAt");
ALTER TABLE "Order" ADD CONSTRAINT "Order_userId_fkey"
FOREIGN KEY ("userId") REFERENCES "User"("id") ON DELETE CASCADE;
```
---
## Django
### Create and apply
```bash
# Auto-generate from model changes
python manage.py makemigrations app_name
# Apply
python manage.py migrate
# Check for pending
python manage.py showmigrations
# SQL preview (don't execute)
python manage.py sqlmigrate app_name 0003
```
### Data migration
```python
# orders/migrations/0004_backfill_order_status.py
from django.db import migrations


def backfill_status(apps, schema_editor):
    # Use the historical model so the migration keeps working as the model evolves
    Order = apps.get_model('orders', 'Order')
    Order.objects.filter(status='').update(status='pending')


class Migration(migrations.Migration):
    dependencies = [('orders', '0003_add_orders')]
    # The noop reverse allows a rollback without undoing the backfill
    operations = [migrations.RunPython(backfill_status, migrations.RunPython.noop)]
```
---
## Production Safety
### Golden rules
1. **Never drop columns in the same deploy as removing code references.** Remove code first, deploy, then drop column in next migration.
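2. **Add columns as nullable or with defaults.** Adding a `NOT NULL` column without a default fails on a table that already has rows, and on PostgreSQL versions before 11 adding a column with a default rewrites the whole table under an exclusive lock.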
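3. **Create indexes concurrently** (PostgreSQL). `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, so the migration tool must execute it outside its usual transaction: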
```sql
CREATE INDEX CONCURRENTLY ix_orders_status ON orders(status);
```
4. **Test migrations against a production-size dataset** before deploying.
5. **Always have a rollback plan** — either a `downgrade()` function or a manual SQL script.
### Safe column addition pattern
```python
# Step 1: Add nullable column (fast: metadata-only change, lock held only briefly)
op.add_column('users', sa.Column('phone', sa.String(20), nullable=True))
# Step 2: Backfill in batches (separate migration or script)
# Don't do UPDATE users SET phone = '...' on millions of rows at once
# Step 3: Add NOT NULL constraint (after backfill confirms all rows filled)
# PostgreSQL scans the table to verify this, so run it during low traffic
op.alter_column('users', 'phone', nullable=False)
```
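
Step 2 of the pattern above can be sketched as a loop of small updates. This is an illustration under stated assumptions, not a drop-in script: it uses the built-in `sqlite3` module so that it runs anywhere, and the `backfill_in_batches` helper, the placeholder value, and the batch size of 2 are all hypothetical. The `UPDATE ... WHERE id IN (SELECT ... LIMIT n)` form is also valid on PostgreSQL.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=2, placeholder="unknown"):
    """Fill NULL phone values a few rows at a time; return the number of batches run."""
    batches = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET phone = ? WHERE id IN ("
            "  SELECT id FROM users WHERE phone IS NULL ORDER BY id LIMIT ?)",
            (placeholder, batch_size),
        )
        conn.commit()  # commit per batch so each transaction stays short
        if cur.rowcount == 0:
            return batches
        batches += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT)")
conn.executemany("INSERT INTO users (phone) VALUES (?)", [(None,)] * 5)
conn.commit()

batches = backfill_in_batches(conn)
remaining = conn.execute("SELECT COUNT(*) FROM users WHERE phone IS NULL").fetchone()[0]
```

Committing after every batch keeps each lock short, and a real script would usually also pause between batches to limit load on the database.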
### Safe column rename pattern
```
Deploy 1: Add new column, write to both old and new
Deploy 2: Backfill new column from old, read from new
Deploy 3: Stop writing to old column
Deploy 4: Drop old column
```
---
## Rollbacks
### Alembic
```bash
# Rollback one step
alembic downgrade -1
# Rollback to specific revision
alembic downgrade 002
# Rollback to base (dangerous — drops everything)
alembic downgrade base
```
### Prisma
Prisma doesn't have built-in rollback. Options:
- Apply a new migration that reverses the change
- Manually run SQL: `npx prisma db execute --file rollback.sql`
- Restore from database backup
### Django
```bash
# Rollback to specific migration
python manage.py migrate app_name 0002
```
---
## Team Workflow
### Resolving migration conflicts
When two developers create migrations from the same parent:
**Alembic:**
```bash
# Developer A and B both branched from revision 002
# Alembic detects multiple heads
alembic heads # shows 003a and 003b
alembic merge -m "merge migrations" 003a 003b
alembic upgrade head
```
**Prisma:**
```bash
# Reset and re-apply (dev only)
npx prisma migrate reset
# Or resolve manually by editing the migration SQL
```
**Django:**
```bash
# Django auto-detects and asks to merge
python manage.py makemigrations --merge
```
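
The "multiple heads" condition behind these merge commands is a revision graph with more than one leaf. The sketch below is a conceptual model only, not how Alembic or Django implement it; the `find_heads` helper and the revision ids are hypothetical.

```python
def find_heads(revisions: dict) -> list:
    """Return the revisions that no other revision lists as a parent.

    `revisions` maps each revision id to a tuple of its parent ids.
    """
    parents = {parent for parent_ids in revisions.values() for parent in parent_ids}
    return sorted(rev for rev in revisions if rev not in parents)


# Two developers both branched from revision 002.
history = {"001": (), "002": ("001",), "003a": ("002",), "003b": ("002",)}
heads_before = find_heads(history)

# A merge revision names both branches as parents, leaving a single head again.
history["004"] = ("003a", "003b")
heads_after = find_heads(history)
```

A merge migration carries no schema change of its own; it only records that both branches must be applied before anything that follows.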
---
## Common Pitfalls
1. **Running `migrate reset` in production.** This drops all data. Only use in development.
2. **Editing already-applied migrations.** Never modify a migration that's been deployed. Create a new migration instead.
3. **Forgetting indexes.** Add indexes for foreign keys and frequently-queried columns in the same migration.
4. **Large table locks.** `ALTER TABLE` with `NOT NULL` or `ADD COLUMN DEFAULT` can lock large tables. Use batched backfills.
5. **Not testing downgrade.** Always test your rollback path before deploying.
6. **Circular foreign keys.** Use `sa.ForeignKey` with `use_alter=True` in Alembic to handle circular deps.
---
## Related Skills
- `postgresql` — Database design, query optimization, indexing strategies
- `fastapi` — SQLAlchemy async patterns with FastAPI
- `nestjs` — Prisma integration with NestJS
- `django` — Django ORM models and migrations
- `docker` — Running migration containers in CI/CD
+576
@@ -0,0 +1,576 @@
# Databases — MongoDB Patterns
# MongoDB
## When to Use
- MongoDB database operations
- Document-based data modeling
- Aggregation pipelines
- Semi-structured or polymorphic data that varies per record
- Rapid prototyping where schema flexibility accelerates iteration
- Event logging, IoT telemetry, or content management systems
## When NOT to Use
- Relational-heavy data models with complex joins and foreign key constraints
- SQL-only projects where the entire stack is built around relational databases
- Simple key-value storage where Redis or a lightweight store is more appropriate
- Financial systems requiring multi-table ACID transactions as the norm
---
## Core Patterns
### 1. Schema Design
The central decision in MongoDB modeling is **embed vs. reference**.
**Decision tree:**
```
Does the child data belong to exactly one parent?
  YES --> Is the child array unbounded (could grow to thousands)?
            YES --> Reference (separate collection)
            NO  --> Embed
  NO  --> Is it a many-to-many relationship?
            YES --> Reference (with array of ObjectIds on one or both sides)
            NO  --> Reference
```
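
Read as code, the tree above has exactly one path that leads to embedding. A small Python sketch makes that explicit; the `choose_modeling` function and its argument names are hypothetical.

```python
def choose_modeling(belongs_to_one_parent: bool, unbounded: bool = False) -> str:
    """Encode the embed-vs-reference decision tree as a function."""
    if belongs_to_one_parent and not unbounded:
        return "embed"
    # Shared children (including many-to-many) and unbounded arrays are referenced.
    return "reference"


address_choice = choose_modeling(belongs_to_one_parent=True)
orders_choice = choose_modeling(belongs_to_one_parent=True, unbounded=True)
tags_choice = choose_modeling(belongs_to_one_parent=False)
```

The many-to-many question changes how the reference is stored (an array of ids on one or both sides), not whether a reference is used.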
**Embedding pattern -- best for data that is read together:**
```javascript
// User with embedded address and preferences
// Good: one read fetches everything the profile page needs
db.users.insertOne({
email: "user@example.com",
name: "Alice Chen",
address: {
street: "123 Main St",
city: "Portland",
state: "OR",
zip: "97201"
},
preferences: {
theme: "dark",
language: "en",
notifications: { email: true, push: false }
},
createdAt: new Date()
});
```
**Referencing pattern -- best for independent or unbounded data:**
```javascript
// Orders reference the user by ID
// Good: orders grow unboundedly, accessed independently
db.orders.insertOne({
userId: ObjectId("6651a..."),
status: "shipped",
totalCents: 4999,
items: [
{ sku: "WIDGET-001", name: "Blue Widget", qty: 2, priceCents: 1999 },
{ sku: "GADGET-010", name: "Mini Gadget", qty: 1, priceCents: 1001 }
],
placedAt: new Date()
});
```
**Denormalization pattern -- duplicate data to avoid frequent lookups:**
```javascript
// Store author name directly on the post (denormalized from users)
// Trade-off: faster reads, but updates to user name require updating all posts
db.posts.insertOne({
title: "Getting Started with MongoDB",
body: "...",
author: {
_id: ObjectId("6651a..."),
name: "Alice Chen" // denormalized -- must be updated if name changes
},
tags: ["mongodb", "tutorial"],
publishedAt: new Date()
});
```
**Polymorphic pattern -- different shapes in one collection:**
```javascript
// Events collection stores different event types
db.events.insertMany([
{
type: "page_view",
userId: ObjectId("6651a..."),
url: "/products/widget",
timestamp: new Date()
},
{
type: "purchase",
userId: ObjectId("6651a..."),
orderId: ObjectId("6651b..."),
totalCents: 4999,
timestamp: new Date()
}
]);
// Use a discriminator field (type) and query by it
```
**Schema validation -- enforce structure at the database level:**
```javascript
db.createCollection("users", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["email", "name", "createdAt"],
properties: {
email: {
bsonType: "string",
pattern: "^.+@.+\\..+$",
description: "Must be a valid email"
},
name: {
bsonType: "string",
minLength: 1
},
role: {
enum: ["admin", "editor", "viewer"],
description: "Must be a valid role"
},
createdAt: { bsonType: "date" }
}
}
},
validationLevel: "strict",
validationAction: "error"
});
```
---
### 2. Aggregation Pipeline
Build complex data transformations as a sequence of stages.
```javascript
// Revenue report: total and average spend per user, last 30 days
db.orders.aggregate([
// Stage 1: filter to recent delivered orders
{ $match: {
status: "delivered",
placedAt: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) }
}},
// Stage 2: group by user
{ $group: {
_id: "$userId",
totalSpent: { $sum: "$totalCents" },
orderCount: { $sum: 1 },
avgOrderValue: { $avg: "$totalCents" }
}},
// Stage 3: sort by spend
{ $sort: { totalSpent: -1 } },
// Stage 4: limit to top 10
{ $limit: 10 },
// Stage 5: join user details
{ $lookup: {
from: "users",
localField: "_id",
foreignField: "_id",
as: "user"
}},
// Stage 6: flatten the joined array
{ $unwind: "$user" },
// Stage 7: reshape output
{ $project: {
_id: 0,
userName: "$user.name",
email: "$user.email",
totalSpent: 1,
orderCount: 1,
avgOrderValue: { $round: ["$avgOrderValue", 0] }
}}
]);
```
**$unwind -- flatten arrays into individual documents:**
```javascript
// Expand order items to analyze product-level metrics
db.orders.aggregate([
{ $unwind: "$items" },
{ $group: {
_id: "$items.sku",
totalQty: { $sum: "$items.qty" },
totalRevenue: { $sum: { $multiply: ["$items.qty", "$items.priceCents"] } }
}},
{ $sort: { totalRevenue: -1 } }
]);
```
**$lookup with pipeline -- filtered/correlated joins:**
```javascript
// For each user, get their 3 most recent orders
db.users.aggregate([
{ $lookup: {
from: "orders",
let: { uid: "$_id" },
pipeline: [
{ $match: { $expr: { $eq: ["$userId", "$$uid"] } } },
{ $sort: { placedAt: -1 } },
{ $limit: 3 },
{ $project: { status: 1, totalCents: 1, placedAt: 1 } }
],
as: "recentOrders"
}}
]);
```
**$facet -- run several sub-pipelines over the same input documents in one stage:**
```javascript
// Dashboard: get summary stats and top products in one query
db.orders.aggregate([
{ $match: { status: "delivered" } },
{ $facet: {
summary: [
{ $group: {
_id: null,
totalRevenue: { $sum: "$totalCents" },
totalOrders: { $sum: 1 }
}}
],
topProducts: [
{ $unwind: "$items" },
{ $group: { _id: "$items.sku", sold: { $sum: "$items.qty" } } },
{ $sort: { sold: -1 } },
{ $limit: 5 }
],
monthlyTrend: [
{ $group: {
_id: { $dateToString: { format: "%Y-%m", date: "$placedAt" } },
revenue: { $sum: "$totalCents" }
}},
{ $sort: { _id: 1 } }
]
}}
]);
```
---
### 3. Index Strategies
```javascript
// Single field index -- most common
db.users.createIndex({ email: 1 }, { unique: true });
// Compound index -- order matters, follows the ESR rule:
// Equality fields first, Sort fields next, Range fields last
db.orders.createIndex({ status: 1, placedAt: -1 });
// Supports: find({status: "pending"}).sort({placedAt: -1})
// Also supports: find({status: "pending"}) alone (prefix)
// Multikey index -- automatically indexes each array element
db.posts.createIndex({ tags: 1 });
// Supports: find({ tags: "mongodb" })
// Text index -- basic full-text search
db.posts.createIndex(
{ title: "text", body: "text" },
{ weights: { title: 10, body: 1 }, name: "posts_text_search" }
);
// Usage:
db.posts.find(
{ $text: { $search: "mongodb aggregation" } },
{ score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } });
// TTL index -- auto-delete documents after expiry
db.sessions.createIndex(
{ expiresAt: 1 },
{ expireAfterSeconds: 0 } // delete when expiresAt is in the past
);
// Documents must have a Date field; they are removed by a background task ~every 60s
// Partial index -- only index documents matching a filter
db.orders.createIndex(
{ placedAt: -1 },
{ partialFilterExpression: { status: "pending" } }
);
// Smaller index; only used when the query includes the filter condition
// Wildcard index -- for querying arbitrary keys in a sub-document
db.products.createIndex({ "attributes.$**": 1 });
// Supports: find({ "attributes.color": "red" }) without knowing keys in advance
// Collation -- case-insensitive sorting and matching
db.users.createIndex(
{ name: 1 },
{ collation: { locale: "en", strength: 2 } }
);
```
**The ESR rule for compound indexes:** order fields by **E**quality, **S**ort, **R**ange. This produces the most efficient index scans.
```javascript
// Query: find active orders for a user, sorted by date, in a date range
// Equality: userId, status
// Sort: placedAt
// Range: placedAt (but sort and range on same field -- sort wins)
db.orders.createIndex({ userId: 1, status: 1, placedAt: -1 });
```
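
The ESR ordering can also be expressed as a small helper that assembles the key list. This is a sketch in Python, not part of any driver: `esr_index_keys` is a hypothetical function, and equality and range fields default to ascending order as a simplifying assumption. The result is a list of `(field, direction)` pairs, which is the shape PyMongo's `create_index` accepts.

```python
def esr_index_keys(equality, sort, range_fields=()):
    """Order index keys as Equality, then Sort, then Range, skipping repeats.

    `sort` is a list of (field, direction) pairs; the other arguments are field names.
    """
    keys = [(field, 1) for field in equality]
    seen = set(equality)
    for field, direction in sort:
        if field not in seen:
            keys.append((field, direction))
            seen.add(field)
    for field in range_fields:
        if field not in seen:  # a field already placed as a sort key keeps that slot
            keys.append((field, 1))
            seen.add(field)
    return keys


keys = esr_index_keys(
    equality=["userId", "status"],
    sort=[("placedAt", -1)],
    range_fields=["placedAt"],
)
```

For the query described above, the helper yields the same three keys as the `createIndex` call, with `placedAt` appearing once in its sort position.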
---
### 4. Transactions
Multi-document transactions work across collections (requires replica set or sharded cluster).
```javascript
const session = client.startSession();
try {
session.startTransaction({
readConcern: { level: "snapshot" },
writeConcern: { w: "majority" },
readPreference: "primary"
});
const accounts = client.db("bank").collection("accounts");
// Transfer $50 from account A to account B
const fromAccount = await accounts.findOne(
{ _id: "account-A" },
{ session }
);
if (!fromAccount || fromAccount.balanceCents < 5000) {
// Throwing is enough: the catch block below aborts the transaction exactly once
throw new Error("Insufficient funds");
}
await accounts.updateOne(
{ _id: "account-A" },
{ $inc: { balanceCents: -5000 } },
{ session }
);
await accounts.updateOne(
{ _id: "account-B" },
{ $inc: { balanceCents: 5000 } },
{ session }
);
// Record the transfer in a separate collection -- still in the same tx
await client.db("bank").collection("transfers").insertOne({
from: "account-A",
to: "account-B",
amountCents: 5000,
timestamp: new Date()
}, { session });
await session.commitTransaction();
} catch (error) {
await session.abortTransaction();
throw error;
} finally {
await session.endSession();
}
```
**Guidelines:**
- Keep transactions short -- they hold locks and consume resources
- Design your schema to minimize the need for multi-document transactions
- Transactions are aborted after 60 seconds by default (server parameter `transactionLifetimeLimitSeconds`)
- Retryable writes (`retryWrites=true` in connection string) retry supported write operations once after a transient network error or primary election
---
### 5. Change Streams
Watch for real-time changes to collections, databases, or the entire deployment.
```javascript
// Watch a single collection for inserts and updates
const pipeline = [
{ $match: {
operationType: { $in: ["insert", "update"] },
"fullDocument.status": "urgent"
}}
];
const changeStream = db.collection("tickets").watch(pipeline, {
fullDocument: "updateLookup" // include the full document on updates
});
changeStream.on("change", (change) => {
console.log("Change detected:", change.operationType);
console.log("Document:", change.fullDocument);
console.log("Resume token:", change._id);
// Process the change (e.g., send notification, update cache)
notifyTeam(change.fullDocument);
});
// Handle errors and resume from last known position
changeStream.on("error", (error) => {
console.error("Change stream error:", error);
// Reconnect using the stored resume token
});
```
**Resumable pattern for production:**
```javascript
let resumeToken = await loadResumeTokenFromStorage();
async function watchWithResume(collection) {
const options = { fullDocument: "updateLookup" };
if (resumeToken) {
options.resumeAfter = resumeToken;
}
const stream = collection.watch([], options);
stream.on("change", async (change) => {
// Process change
await handleChange(change);
// Persist resume token so we can recover after restart
resumeToken = change._id;
await saveResumeTokenToStorage(resumeToken);
});
stream.on("error", async () => {
// Wait and reconnect
await new Promise(r => setTimeout(r, 5000));
watchWithResume(collection);
});
}
```
**Use cases:** real-time dashboards, cache invalidation, event-driven architectures, syncing data to search indexes (e.g., Elasticsearch).
---
### 6. Performance
#### Reading explain() output
```javascript
// Run explain to see the query plan
db.orders.find({
userId: ObjectId("6651a..."),
status: "pending"
}).sort({ placedAt: -1 }).explain("executionStats");
```
**Key fields in executionStats:**
| Field | What to look for |
|-------|-----------------|
| `winningPlan.stage` | `IXSCAN` good, `COLLSCAN` bad (full collection scan) |
| `totalKeysExamined` | Should be close to `nReturned` (no wasted index scans) |
| `totalDocsExamined` | Should be close to `nReturned` (no wasted document reads) |
| `executionTimeMillis` | Overall query time |
| `rejectedPlans` | Shows alternatives the optimizer considered |
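
The ratios in this table can be checked mechanically. The sketch below assumes the `executionStats` sub-document of an explain result has already been loaded into a Python dict; the `scan_efficiency` helper and the sample numbers are hypothetical.

```python
def scan_efficiency(execution_stats: dict) -> dict:
    """Summarize how much work a query did per returned document."""
    returned = execution_stats["nReturned"]
    denominator = max(returned, 1)  # avoid division by zero for empty results
    return {
        "keys_examined_per_returned": execution_stats["totalKeysExamined"] / denominator,
        "docs_examined_per_returned": execution_stats["totalDocsExamined"] / denominator,
        # No documents fetched means the index alone answered the query.
        "covered": returned > 0 and execution_stats["totalDocsExamined"] == 0,
    }


# A collection scan: no index keys used, 2500 documents read to return 10.
collscan = scan_efficiency(
    {"nReturned": 10, "totalKeysExamined": 0, "totalDocsExamined": 2500}
)
# A covered query: every result came straight from the index.
covered = scan_efficiency(
    {"nReturned": 5, "totalKeysExamined": 5, "totalDocsExamined": 0}
)
```

A ratio near 1 means the index is selective for the query, while a ratio in the hundreds, as in the first sample, usually points to a missing or poorly ordered index.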
**Covered queries -- answered entirely from the index:**
```javascript
// Create an index that covers the query
db.orders.createIndex({ userId: 1, status: 1, totalCents: 1 });
// This query only needs fields in the index -- no document fetch
db.orders.find(
{ userId: ObjectId("6651a..."), status: "delivered" },
{ _id: 0, totalCents: 1 } // projection must exclude _id and only include indexed fields
);
// explain() will show: "totalDocsExamined": 0
```
**Projection optimization -- fetch only what you need:**
```javascript
// BAD: fetches entire document including large body field
const fullPosts = await db.posts.find({ "author._id": userId }).toArray();
// GOOD: only fetch fields needed for the list view
const posts = await db.posts.find(
{ "author._id": userId },
{ projection: { title: 1, publishedAt: 1, tags: 1 } }
).toArray();
```
**Bulk operations for write-heavy workloads:**
```javascript
const bulk = db.products.initializeUnorderedBulkOp();
for (const update of priceUpdates) {
bulk.find({ sku: update.sku })
.updateOne({ $set: { priceCents: update.newPrice, updatedAt: new Date() } });
}
const result = await bulk.execute();
console.log(`Modified: ${result.nModified}, Errors: ${result.getWriteErrorCount()}`);
```
---
## Best Practices
1. **Design schema around query patterns, not data relationships.** Ask "how will I read this data?" before "how does this data relate?" Embed data that is always fetched together; reference data accessed independently.
2. **Use the ESR rule for compound indexes.** Order index fields by Equality, Sort, Range. This maximizes the index's usefulness and minimizes keys examined.
3. **Set read/write concerns appropriately.** Use `w: "majority"` and `readConcern: "majority"` for data that must survive failovers. Use `w: 1` for non-critical writes where speed matters more than durability.
4. **Use projection to limit returned fields.** Transferring large documents over the network when you only need two fields wastes bandwidth and memory. Always project.
5. **Avoid unbounded array growth.** An embedded array that can grow to thousands of elements bloats the document (16 MB max) and degrades performance. Move to a separate collection with a reference when the array exceeds ~100 elements.
6. **Use bulk operations for batch writes.** Individual `insertOne` or `updateOne` calls in a loop are slow. Batch them with `bulkWrite` or `initializeUnorderedBulkOp` for 10-50x throughput improvement.
7. **Enable retryable writes.** Add `retryWrites=true` to your connection string. This handles transient network errors and primary elections automatically without application-level retry logic.
8. **Monitor with database profiler and serverStatus.** Use `db.setProfilingLevel(1, { slowms: 100 })` to log slow queries. Check `db.serverStatus().opcounters` and `db.serverStatus().connections` for overall health.
## Common Pitfalls
1. **Treating MongoDB like a relational database.** Normalizing everything into separate collections and using `$lookup` for every query defeats the purpose. If you need heavy joins, PostgreSQL is likely a better fit. Design for embedding first.
2. **Missing indexes on query fields.** Every `find()`, `$match`, and `sort()` should be backed by an index. Use `db.collection.getIndexes()` and `explain()` to verify. A `COLLSCAN` on a large collection is almost always a bug.
3. **Ignoring the 16 MB document size limit.** Embedding unbounded arrays (comments, logs, events) will eventually hit this wall, crashing writes. Use the bucket pattern (fixed-size sub-documents) or reference a separate collection.
4. **Not using readPreference for read-heavy workloads.** By default all reads go to the primary. For analytics or non-critical reads, use `readPreference: "secondaryPreferred"` to distribute load across replicas.
5. **Forgetting that the positional `$` operator updates only the first matching array element.** A `$set` through `$` touches one element per document, not every element that matches. Use `$[]` for all elements or `$[<identifier>]` with `arrayFilters` for conditional updates:
```javascript
// Update price for a specific item in all orders
db.orders.updateMany(
{ "items.sku": "WIDGET-001" },
{ $set: { "items.$[item].priceCents": 2499 } },
{ arrayFilters: [{ "item.sku": "WIDGET-001" }] }
);
```
6. **Running aggregation pipelines without early $match.** Always filter as early as possible in the pipeline. A `$group` or `$unwind` before `$match` processes the entire collection unnecessarily. Put `$match` first to leverage indexes and reduce documents flowing through subsequent stages.
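
Pitfall 3 mentions the bucket pattern. A minimal sketch of the idea in Python follows; the `to_buckets` helper, the field names (`parentId`, `seq`, `count`, `events`), and the deliberately tiny bucket size are assumptions made for illustration.

```python
def to_buckets(parent_id, events, bucket_size=3):
    """Split a long event list into fixed-size bucket documents."""
    buckets = []
    for start in range(0, len(events), bucket_size):
        chunk = events[start:start + bucket_size]
        buckets.append({
            "parentId": parent_id,
            "seq": start // bucket_size,  # bucket order within the parent
            "count": len(chunk),
            "events": chunk,
        })
    return buckets


buckets = to_buckets("sensor-1", list(range(7)))
```

Each bucket is stored as its own document, so no single document grows without bound, and a reader can fetch buckets in `seq` order to rebuild the full list.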
## Related Skills
- `postgresql` - Relational database patterns for structured data with complex relationships
- `caching` - Caching strategies to reduce database load
- `logging` - Logging patterns for query debugging and monitoring
+609
@@ -0,0 +1,609 @@
# Databases — PostgreSQL Patterns
# PostgreSQL
## When to Use
- PostgreSQL database operations
- SQL query optimization
- Schema design and migrations
- JSONB document storage within a relational model
- Full-text search without a dedicated search engine
- Complex analytical queries with window functions and CTEs
## When NOT to Use
- NoSQL-only projects where no relational database is involved
- In-memory databases like Redis or SQLite used purely for caching or ephemeral storage
- File-based storage scenarios that do not require a database engine
---
## Core Patterns
### 1. Schema Design
Design tables with explicit constraints, proper types, and clear relationships.
```sql
-- Enums for constrained value sets
CREATE TYPE user_role AS ENUM ('admin', 'editor', 'viewer');
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered', 'cancelled');
-- Composite types for reusable structures
CREATE TYPE address AS (
street TEXT,
city TEXT,
state TEXT,
zip VARCHAR(10)
);
-- Users table with constraints
CREATE TABLE users (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL CHECK (char_length(name) >= 1),
role user_role NOT NULL DEFAULT 'viewer',
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Organizations with self-referencing hierarchy
CREATE TABLE organizations (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name TEXT NOT NULL,
parent_id BIGINT REFERENCES organizations(id) ON DELETE SET NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Membership join table with composite primary key
CREATE TABLE org_memberships (
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
org_id BIGINT NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
role user_role NOT NULL DEFAULT 'viewer',
joined_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (user_id, org_id)
);
-- Orders with foreign keys, check constraints, and enum status
CREATE TABLE orders (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
status order_status NOT NULL DEFAULT 'pending',
total_cents BIGINT NOT NULL CHECK (total_cents >= 0),
shipping address,
items JSONB NOT NULL DEFAULT '[]',
placed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Auto-update updated_at with a trigger
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_users_updated_at
BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION set_updated_at();
```
**Key principles:**
- Use `BIGINT GENERATED ALWAYS AS IDENTITY` over `SERIAL` for new projects
- Use `TIMESTAMPTZ` (not `TIMESTAMP`) to store times with timezone awareness
- Prefer `TEXT` over `VARCHAR(n)` unless a hard length limit is business-critical
- Add `ON DELETE` actions on every foreign key (CASCADE, RESTRICT, or SET NULL)
- Use `CHECK` constraints for business rules that live at the data level
---
### 2. Index Strategy
Choose the right index type based on your query patterns.
**Decision guide:**
| Query Pattern | Index Type | Example |
|---------------|-----------|---------|
| Equality (`=`) and range (`<`, `>`, `BETWEEN`) | B-tree (default) | `WHERE created_at > '2025-01-01'` |
| Array containment (`@>`), JSONB queries | GIN | `WHERE tags @> '{postgres}'` |
| Full-text search (`@@`) | GIN | `WHERE to_tsvector(body) @@ query` |
| Geometry, range overlap (`&&`) | GiST | `WHERE tstzrange(starts_at, ends_at) && tstzrange('2025-06-01', '2025-06-02')` |
| Filtered subset of rows | Partial | `WHERE active = true` |
| Index-only scans (no heap lookup) | Covering (INCLUDE) | Frequently selected columns |
```sql
-- B-tree: default, good for equality and range
CREATE INDEX idx_orders_placed_at ON orders(placed_at DESC);
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- GIN: arrays and JSONB containment
CREATE INDEX idx_users_metadata ON users USING GIN (metadata);
CREATE INDEX idx_orders_items ON orders USING GIN (items jsonb_path_ops);
-- GIN: full-text search
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED;
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
-- Full-text search query
SELECT id, title, ts_rank(search_vector, query) AS rank
FROM articles, plainto_tsquery('english', 'database optimization') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
-- GiST: geometry and range types
CREATE INDEX idx_events_duration ON events USING GiST (
tstzrange(starts_at, ends_at)
);
-- Find overlapping events
SELECT * FROM events
WHERE tstzrange(starts_at, ends_at) && tstzrange('2025-06-01', '2025-06-02');
-- Partial index: only index rows you actually query
CREATE INDEX idx_orders_pending ON orders(placed_at)
WHERE status = 'pending';
-- Covering index: avoids heap lookup for common queries
CREATE INDEX idx_users_email_covering ON users(email)
INCLUDE (name, role);
-- This query can now be answered entirely from the index
SELECT name, role FROM users WHERE email = 'user@example.com';
```
**When to add an index:** Run `EXPLAIN ANALYZE` first. Add an index when you see sequential scans on large tables with selective WHERE clauses. Do not index columns with very low cardinality (e.g., a boolean on a small table) unless combined with other columns.
---
### 3. Query Optimization
#### Reading EXPLAIN ANALYZE
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.placed_at > now() - INTERVAL '30 days'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
```
**What to look for in the output:**
- **Seq Scan on large tables** -- add an index or rewrite the WHERE clause
- **Nested Loop with high row counts** -- consider a Hash Join (may need more `work_mem`)
- **actual rows far exceeding estimated rows** -- run `ANALYZE tablename` to update statistics
- **Buffers: shared read** with large values -- data is being read from outside PostgreSQL's buffer cache (OS cache or disk); check `shared_buffers` sizing
- **Sort Method: external merge** -- increase `work_mem` for this query
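The checklist above can also be applied mechanically. The following is a minimal sketch, assuming the plan was captured with `EXPLAIN (ANALYZE, FORMAT JSON)` and already parsed into Python objects. The function names and both thresholds are hypothetical choices for illustration, and the row counts are approximate because PostgreSQL reports `Actual Rows` per loop.

```python
from typing import Any, Iterator


def walk_plan(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield a plan node and all of its descendants, depth first."""
    yield node
    for child in node.get("Plans", []):
        yield from walk_plan(child)


def find_plan_warnings(
    explain_json: list[dict[str, Any]],
    seq_scan_min_rows: int = 10_000,   # hypothetical cutoff for a "large" scan
    misestimate_factor: float = 10.0,  # hypothetical cutoff for "far exceeding"
) -> list[str]:
    """Flag plan nodes that match the patterns in the checklist above."""
    warnings: list[str] = []
    for node in walk_plan(explain_json[0]["Plan"]):
        node_type = node.get("Node Type", "")
        actual = node.get("Actual Rows", 0)
        estimated = max(node.get("Plan Rows", 0), 1)  # avoid division by zero

        if node_type == "Seq Scan" and actual >= seq_scan_min_rows:
            warnings.append(f"Seq Scan on {node.get('Relation Name')} ({actual} rows)")
        if actual / estimated >= misestimate_factor:
            warnings.append(f"{node_type}: estimated {estimated} rows, got {actual}")
        if node.get("Sort Method", "").startswith("external"):
            warnings.append(f"{node_type}: sort spilled to disk ({node['Sort Method']})")
    return warnings


# Made-up sample plan, trimmed to the keys the function reads.
sample_plan = [{
    "Plan": {
        "Node Type": "Sort", "Sort Method": "external merge",
        "Plan Rows": 100, "Actual Rows": 100,
        "Plans": [{
            "Node Type": "Seq Scan", "Relation Name": "orders",
            "Plan Rows": 50, "Actual Rows": 20000,
        }],
    }
}]

for warning in find_plan_warnings(sample_plan):
    print(warning)
```

A helper like this is only a first filter; the flagged nodes still need to be read in the context of the full plan.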
#### Common Query Rewrites
```sql
-- BAD: correlated subquery runs once per row
SELECT u.name,
(SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
FROM users u;
-- GOOD: single pass with JOIN + GROUP BY
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
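-- BAD: OR across different columns often prevents efficient index use
-- (the planner may fall back to a Seq Scan; check EXPLAIN for a BitmapOr)
SELECT * FROM orders WHERE user_id = 5 OR status = 'pending';
-- GOOD: UNION ALL lets each branch use its own index
-- (the second branch excludes user 5 so no row is returned twice)
SELECT * FROM orders WHERE user_id = 5
UNION ALL
SELECT * FROM orders WHERE status = 'pending' AND user_id != 5;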
-- BAD: function call on indexed column prevents index use
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- GOOD: expression index or use citext
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- or better: define email as CITEXT type
-- Avoiding N+1: fetch users and their latest order in one query
SELECT DISTINCT ON (u.id)
u.id, u.name, o.id AS latest_order_id, o.total_cents, o.placed_at
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
ORDER BY u.id, o.placed_at DESC;
```
---
### 4. Migrations
Follow the up/down pattern and plan for zero-downtime deployments.
```sql
-- ============================================
-- Migration: 20250601_001_add_user_preferences
-- ============================================
-- UP
ALTER TABLE users ADD COLUMN preferences JSONB DEFAULT '{}';
-- Create the index CONCURRENTLY to avoid blocking writes
-- (CONCURRENTLY cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_users_preferences
ON users USING GIN (preferences);
-- DOWN
DROP INDEX IF EXISTS idx_users_preferences;
ALTER TABLE users DROP COLUMN IF EXISTS preferences;
```
**Safe vs unsafe operations:**
| Operation | Safe? | Notes |
|-----------|-------|-------|
| ADD COLUMN (nullable or with a non-volatile default) | Yes | Metadata-only in PG 11+; a volatile default (e.g. `clock_timestamp()`) still rewrites the table |
| ADD COLUMN NOT NULL without default | No | Fails if rows exist; add nullable first, backfill, then set NOT NULL |
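| DROP COLUMN | Mostly | Quick (metadata-only), but code that still references the column will break, including ORM models and queries that rely on `SELECT *`; deploy the code change first |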
| RENAME COLUMN | Dangerous | Breaks all queries referencing old name; use a transition period |
| ADD INDEX | Safe with CONCURRENTLY | Without CONCURRENTLY, locks writes for duration |
| ADD CONSTRAINT (CHECK/FK) | Careful | Use NOT VALID then VALIDATE CONSTRAINT in two steps |
| Change column type | Dangerous | Rewrites entire table; use a new column + migration instead |
```sql
-- Zero-downtime: add NOT NULL constraint safely
-- Step 1: add column as nullable
ALTER TABLE users ADD COLUMN phone TEXT;
-- Step 2: backfill in batches
UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN 1 AND 10000;
UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN 10001 AND 20000;
-- ... continue in batches
-- Step 3: add the constraint as NOT VALID (brief lock, existing rows are not scanned)
ALTER TABLE users ADD CONSTRAINT users_phone_not_null
  CHECK (phone IS NOT NULL) NOT VALID;
-- Step 4: validate (scans table but allows concurrent writes)
ALTER TABLE users VALIDATE CONSTRAINT users_phone_not_null;
-- Step 5: optionally convert to a proper NOT NULL
-- (PG 12+ uses the validated CHECK constraint to skip the full-table scan)
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT users_phone_not_null;
```
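The batched backfill in step 2 is usually driven from a script rather than typed by hand. The sketch below shows one way to generate the id ranges, assuming integer ids; `batch_ranges` and `BACKFILL_SQL` are hypothetical names, and the `$1`/`$2` placeholders follow asyncpg's style, so adjust them for your driver.

```python
from typing import Iterator


def batch_ranges(min_id: int, max_id: int, batch_size: int) -> Iterator[tuple[int, int]]:
    """Yield inclusive (start, end) id ranges that cover min_id..max_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


# Hypothetical statement: each range would run as one short UPDATE in its own transaction.
BACKFILL_SQL = "UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN $1 AND $2"

for start, end in batch_ranges(1, 25_000, 10_000):
    # Replace print with a real driver call, e.g. conn.execute(BACKFILL_SQL, start, end)
    print(BACKFILL_SQL, (start, end))
```

Keeping each batch in its own short transaction limits lock duration and gives autovacuum a chance to keep up between batches.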
---
### 5. JSON/JSONB
Use JSONB for semi-structured data that lives alongside relational columns.
**When to use JSONB:**
- User preferences, settings, or metadata with varying keys
- API response caching or event payloads
- Flexible attributes that differ per row
**When NOT to use JSONB:**
- Data you regularly JOIN on or use in WHERE clauses across tables -- normalize it
- Data that has a fixed, well-known schema -- use proper columns
```sql
-- Querying JSONB: operators
-- -> returns JSONB element (keeps type)
-- ->> returns TEXT value
-- @> containment (left contains right)
-- ? key exists
-- Get a nested value
SELECT
metadata->>'department' AS department,
metadata->'settings'->>'theme' AS theme
FROM users
WHERE metadata @> '{"role": "admin"}';
-- Check if a key exists
SELECT * FROM users WHERE metadata ? 'avatar_url';
-- Query inside JSONB arrays
SELECT * FROM orders
WHERE items @> '[{"sku": "WIDGET-001"}]';
-- Update a nested JSONB field
UPDATE users
SET metadata = jsonb_set(metadata, '{settings,notifications}', '"email"')
WHERE id = 42;
-- Remove a key
UPDATE users
SET metadata = metadata - 'deprecated_field'
WHERE metadata ? 'deprecated_field';
-- Aggregate JSONB: expand array elements into rows
SELECT o.id, item->>'sku' AS sku, (item->>'qty')::int AS qty
FROM orders o, jsonb_array_elements(o.items) AS item
WHERE o.status = 'pending';
-- Index strategies for JSONB
-- General containment queries: GIN with jsonb_ops (default)
CREATE INDEX idx_users_metadata_gin ON users USING GIN (metadata);
-- Containment-only queries (smaller, faster index): jsonb_path_ops
CREATE INDEX idx_orders_items_path ON orders USING GIN (items jsonb_path_ops);
-- Specific key lookups: expression index on extracted value
CREATE INDEX idx_users_department ON users ((metadata->>'department'));
```
---
### 6. CTEs and Window Functions
#### Common Table Expressions (CTEs)
```sql
-- Readable multi-step query with CTEs
WITH monthly_revenue AS (
SELECT
date_trunc('month', placed_at) AS month,
SUM(total_cents) AS revenue_cents
FROM orders
WHERE status = 'delivered'
GROUP BY 1
),
revenue_with_growth AS (
SELECT
month,
revenue_cents,
LAG(revenue_cents) OVER (ORDER BY month) AS prev_month,
ROUND(
100.0 * (revenue_cents - LAG(revenue_cents) OVER (ORDER BY month))
/ NULLIF(LAG(revenue_cents) OVER (ORDER BY month), 0),
1
) AS growth_pct
FROM monthly_revenue
)
SELECT * FROM revenue_with_growth ORDER BY month DESC;
-- Recursive CTE: org hierarchy tree
WITH RECURSIVE org_tree AS (
-- Base case: top-level orgs
SELECT id, name, parent_id, 0 AS depth, name::TEXT AS path
FROM organizations
WHERE parent_id IS NULL
UNION ALL
-- Recursive step
SELECT o.id, o.name, o.parent_id, t.depth + 1, t.path || ' > ' || o.name
FROM organizations o
JOIN org_tree t ON o.parent_id = t.id
)
SELECT * FROM org_tree ORDER BY path;
```
#### Window Functions
```sql
-- ROW_NUMBER: assign rank within a partition
SELECT
user_id,
id AS order_id,
total_cents,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY placed_at DESC) AS rn
FROM orders;
-- Get each user's most recent order
SELECT * FROM (
SELECT
o.*,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY placed_at DESC) AS rn
FROM orders o
) sub WHERE rn = 1;
-- LAG/LEAD: compare with previous/next row
SELECT
placed_at::date AS order_date,
total_cents,
LAG(total_cents) OVER (ORDER BY placed_at) AS prev_order_total,
total_cents - LAG(total_cents) OVER (ORDER BY placed_at) AS diff
FROM orders
WHERE user_id = 42;
-- Running total
SELECT
placed_at::date AS order_date,
total_cents,
SUM(total_cents) OVER (
ORDER BY placed_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM orders
WHERE user_id = 42;
-- NTILE: divide rows into equal buckets (e.g., quartiles)
SELECT
user_id,
SUM(total_cents) AS lifetime_spend,
NTILE(4) OVER (ORDER BY SUM(total_cents) DESC) AS spend_quartile
FROM orders
GROUP BY user_id;
```
---
### 7. Transaction Isolation
PostgreSQL accepts all four standard isolation levels, but `READ UNCOMMITTED` behaves the same as `READ COMMITTED`, which leaves three distinct levels:
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Use Case |
|-------|-----------|-------------------|-------------|----------|
| READ COMMITTED (default) | No | Possible | Possible | Most OLTP workloads |
| REPEATABLE READ | No | No | No (in PG) | Reports, consistent snapshots |
| SERIALIZABLE | No | No | No | Financial transactions, inventory |
```sql
-- Default: READ COMMITTED
-- Each statement sees the latest committed data
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- SERIALIZABLE: full isolation, detects write conflicts
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- Read current inventory
SELECT quantity FROM inventory WHERE sku = 'WIDGET-001';
-- Decrement only if stock remains (PG aborts the tx if a concurrent tx conflicts)
UPDATE inventory SET quantity = quantity - 1
  WHERE sku = 'WIDGET-001' AND quantity > 0;
COMMIT;
-- If another SERIALIZABLE tx modified the same row, one will get:
-- ERROR: could not serialize access due to concurrent update
-- Your application must retry on serialization failure (SQLSTATE 40001)
-- Advisory locks for application-level coordination
SELECT pg_advisory_xact_lock(hashtext('process-user-' || '42'));
-- Lock is held until transaction ends; no table-level contention
```
**Guidelines:**
- Use READ COMMITTED for general CRUD operations
- Use SERIALIZABLE when correctness requires that concurrent transactions behave as if run sequentially (e.g., balance transfers, seat reservations)
- Always implement retry logic for serialization failures
- Keep transactions as short as possible to reduce contention
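The retry guideline can be sketched as a small driver-agnostic wrapper. This is an illustration under stated assumptions, not a library API: `run_with_retry` and `is_serialization_failure` are hypothetical names, and the SQLSTATE lookup relies on the exception exposing `.sqlstate` (as asyncpg and psycopg 3 do) or `.pgcode` (as psycopg2 does).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

SERIALIZATION_FAILURE = "40001"  # SQLSTATE for serialization failures


def is_serialization_failure(exc: BaseException) -> bool:
    """Return True if the driver exception carries SQLSTATE 40001."""
    code = getattr(exc, "sqlstate", None) or getattr(exc, "pgcode", None)
    return code == SERIALIZATION_FAILURE


def run_with_retry(
    run_transaction: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
) -> T:
    """Call run_transaction, retrying only when it fails with SQLSTATE 40001."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except Exception as exc:
            if not is_serialization_failure(exc) or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so competing transactions desynchronize.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())
    raise AssertionError("unreachable")
```

`run_transaction` must open a fresh transaction on every call (BEGIN through COMMIT), because a transaction that failed with 40001 cannot be resumed.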
---
### 8. Connection Pooling
Each direct PostgreSQL connection is served by its own backend process (roughly 1-10 MB of RAM each), so connections are expensive. Use a pooler.
**PgBouncer configuration (pgbouncer.ini):**
```ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
; Pool mode: transaction is best for most web apps
pool_mode = transaction
; Sizing: start conservative, tune with monitoring
default_pool_size = 20
max_client_conn = 200
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
; Timeouts
server_idle_timeout = 300
client_idle_timeout = 60
query_timeout = 30
```
**Pool sizing formula:**
```
optimal_pool_size = ((2 * cpu_cores) + effective_disk_spindles)
```
For a 4-core SSD server: `(2 * 4) + 1 = 9` connections is a good starting point (the SSD is counted as one effective spindle). More connections do not mean more throughput -- too many cause contention.
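As a quick check of the arithmetic, the formula can be written as a small helper. `suggested_pool_size` is a hypothetical name, and the default of one effective spindle simply mirrors the SSD example above.

```python
def suggested_pool_size(cpu_cores: int, effective_disk_spindles: int = 1) -> int:
    """Starting pool size from the formula above; tune it with monitoring."""
    if cpu_cores < 1 or effective_disk_spindles < 0:
        raise ValueError("cpu_cores must be >= 1 and effective_disk_spindles >= 0")
    return (2 * cpu_cores) + effective_disk_spindles


print(suggested_pool_size(4))  # 9, the 4-core SSD example
```

Treat the result as the size of the pool between the pooler and PostgreSQL, not as the number of client connections the pooler accepts.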
**Pool modes:**
| Mode | Description | Caveats |
|------|-------------|---------|
| `transaction` | Connection returned after each transaction | Cannot use session-level features (LISTEN/NOTIFY, session-level `SET`, temp tables, and prepared statements unless PgBouncer 1.21+ is configured with `max_prepared_statements`) |
| `session` | Connection held for entire client session | Fewer pooling benefits; use only when session features needed |
| `statement` | Connection returned after each statement | No multi-statement transactions; rarely used |
**Application-level pooling (Python example with asyncpg):**
```python
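import asyncio

import asyncpg


async def main() -> None:
    # Port 6432 is PgBouncer. With pool_mode = transaction, disable asyncpg's
    # prepared-statement cache via statement_cache_size=0.
    pool = await asyncpg.create_pool(
        dsn="postgresql://user:pass@localhost:6432/myapp",
        min_size=5,
        max_size=20,
        max_inactive_connection_lifetime=300,
        command_timeout=30,
        statement_cache_size=0,
    )
    async with pool.acquire() as conn:
        rows = await conn.fetch("SELECT * FROM users WHERE active = true")
    await pool.close()


asyncio.run(main())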
```
---
## Best Practices
1. **Use parameterized queries everywhere.** Never concatenate user input into SQL strings. ORMs and query builders handle this, but verify in raw SQL contexts.
2. **Run ANALYZE after bulk data changes.** The query planner relies on statistics. After large imports or deletes, run `ANALYZE tablename` to update them.
3. **Prefer BIGINT for primary keys.** INTEGER (max ~2.1 billion) can be exhausted sooner than expected in high-write systems. BIGINT costs 4 extra bytes per row but avoids a painful migration later.
4. **Store money as integers (cents).** Floating-point arithmetic causes rounding errors. Use `BIGINT` for cents or `NUMERIC(19,4)` if sub-cent precision is needed.
5. **Add indexes for foreign keys.** PostgreSQL does not automatically index the child side of a foreign key. Without it, DELETE on the parent table triggers a sequential scan on the child.
6. **Use TIMESTAMPTZ, not TIMESTAMP.** `TIMESTAMP WITHOUT TIME ZONE` silently drops timezone info. Always use `TIMESTAMPTZ` and let the application control display timezone.
7. **Set statement_timeout for web requests.** Prevent runaway queries from holding connections: `SET statement_timeout = '5s';` at session start, or configure per-role in PostgreSQL.
8. **Monitor with pg_stat_statements.** Enable this extension to track query performance over time. The top queries by `total_exec_time` are your optimization targets.
```sql
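-- Find the most expensive queries by total execution time
-- (PG 13+ column names; older versions use total_time / mean_time)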
SELECT
calls,
round(total_exec_time::numeric, 1) AS total_ms,
round(mean_exec_time::numeric, 1) AS mean_ms,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```
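Practice 1 (parameterized queries) can be demonstrated without a PostgreSQL server. The sketch below swaps in Python's built-in `sqlite3` module purely so it runs anywhere; the table and the input string are made up. PostgreSQL drivers use different placeholders (`$1` in asyncpg, `%s` in psycopg), but the principle is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("alice@example.com",), ("bob@example.com",)],
)

# Attacker-controlled input that tries to widen the WHERE clause.
malicious = "nobody@example.com' OR '1'='1"

# UNSAFE: the input becomes part of the SQL text, so the OR clause is executed.
unsafe_sql = f"SELECT id FROM users WHERE email = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: the value is bound separately and compared as a plain string.
safe_rows = conn.execute(
    "SELECT id FROM users WHERE email = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # prints "2 0"
```

The concatenated query returns every user, while the parameterized query returns nothing because no user has that literal email address.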
## Common Pitfalls
1. **N+1 queries from ORM lazy loading.** Loading a list of users and then accessing `user.orders` in a loop generates one query per user. Use eager loading (`joinedload` in SQLAlchemy, `select_related` in Django) or batch the query with a JOIN.
2. **Locking the table during migrations.** `ALTER TABLE ... ADD COLUMN NOT NULL DEFAULT 'x'` is safe in PG 11+, but `CREATE INDEX` without `CONCURRENTLY` locks writes. Always use `CREATE INDEX CONCURRENTLY` in production migrations.
3. **Bloated tables from UPDATE-heavy workloads.** PostgreSQL MVCC creates dead tuples on every UPDATE. If autovacuum cannot keep up, table size and query times grow. Monitor `pg_stat_user_tables.n_dead_tup` and tune autovacuum settings for hot tables.
4. **Using OFFSET for pagination on large datasets.** `OFFSET 100000` forces PG to scan and discard 100,000 rows. Use keyset pagination instead:
```sql
-- BAD: slow for deep pages
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;
-- GOOD: keyset pagination (100000 stands for the last id returned by the previous page)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;
```
5. **Ignoring connection limits.** Each PostgreSQL connection consumes RAM. Opening hundreds of direct connections (e.g., one per serverless function invocation) will exhaust `max_connections`, after which new connections are refused, and can push the server into memory pressure. Always use PgBouncer or an application-level pool.
6. **Storing large blobs in the database.** Files over a few KB should go in object storage (S3, R2). Store the URL/key in PostgreSQL. Large `bytea` or `TEXT` columns bloat the table, slow backups, and waste shared_buffers cache.
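Keyset pagination (pitfall 4) is easier to follow as a loop that carries the last seen id forward. The following is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the `orders` data is made up, `fetch_page` and `iter_all_orders` are hypothetical helpers, and the starting cursor of `0` assumes ids are positive.

```python
import sqlite3
from typing import Iterator

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
    [(i, i * 100) for i in range(1, 51)],
)


def fetch_page(last_seen_id: int = 0, page_size: int = 20) -> list[tuple[int, int]]:
    """Return the next page of orders with id greater than last_seen_id."""
    return conn.execute(
        "SELECT id, total_cents FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()


def iter_all_orders(page_size: int = 20) -> Iterator[tuple[int, int]]:
    """Walk the whole table page by page without using OFFSET."""
    last_seen_id = 0
    while True:
        page = fetch_page(last_seen_id, page_size)
        if not page:
            return
        yield from page
        last_seen_id = page[-1][0]  # the cursor is the id of the last row on the page


print(len(list(iter_all_orders())))  # prints 50
```

Because every page starts from an indexed `id > ?` lookup, the cost of a page does not grow with how deep into the table it is. The trade-off is that clients can only move to the next page, not jump to an arbitrary page number.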
## Related Skills
- `mongodb` - Document-based database patterns for non-relational data
- `caching` - Caching strategies to reduce database load
- `logging` - Logging patterns for query debugging and monitoring
+279
@@ -0,0 +1,279 @@
# Databases — Redis Patterns
# Redis
## When to Use
- Caching database queries or API responses
- Session storage for web applications
- Rate limiting (distributed across instances)
- Job/task queues (BullMQ, Celery)
- Pub/sub messaging between services
- Distributed locks
## When NOT to Use
- **Primary data storage** — Redis is a cache/broker, not a database of record
- **Complex queries** — use PostgreSQL for relational queries
- **Large blobs** — use S3/R2 for file storage
- **In-memory caching only** — use `functools.lru_cache` or `Map` for single-process caches
---
## Python (redis-py / FastAPI)
### Connection
```python
# src/core/redis.py
import redis.asyncio as redis

pool = redis.ConnectionPool.from_url(
    "redis://localhost:6379/0",
    max_connections=20,
    decode_responses=True,
)


async def get_redis() -> redis.Redis:
    return redis.Redis(connection_pool=pool)
```
### Cache-aside pattern
```python
import json
from datetime import timedelta

from fastapi import HTTPException
from sqlalchemy.ext.asyncio import AsyncSession  # assumed: SQLAlchemy async session

# `User` is the application's ORM model; `get_redis` comes from src/core/redis.py.


async def get_user_cached(user_id: str, db: AsyncSession) -> User:
    r = await get_redis()
    cache_key = f"user:{user_id}"

    # Check cache
    cached = await r.get(cache_key)
    if cached is not None:
        return User(**json.loads(cached))

    # Cache miss: fetch from DB
    user = await db.get(User, user_id)
    if not user:
        raise HTTPException(status_code=404, detail="User not found")

    # Store in cache with TTL
    await r.setex(cache_key, timedelta(minutes=15), json.dumps(user.to_dict()))
    return user
```
### Cache invalidation
```python
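async def update_user(user_id: str, data: UpdateUserRequest, db: AsyncSession) -> User:
    user = await db.get(User, user_id)
    if not user:
        raise HTTPException(status_code=404, detail="User not found")

    # Pydantic v1 API; on Pydantic v2 use data.model_dump(exclude_unset=True)
    for key, value in data.dict(exclude_unset=True).items():
        setattr(user, key, value)
    await db.commit()

    # Invalidate the cache after the commit so readers repopulate from fresh data
    r = await get_redis()
    await r.delete(f"user:{user_id}")
    return user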
```
### Rate limiting
```python
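from fastapi import Request, HTTPException


async def rate_limit(request: Request, limit: int = 100, window: int = 900):
    r = await get_redis()
    # request.client can be None, so fall back to a shared key in that case
    client_ip = request.client.host if request.client else "unknown"
    key = f"rate:{client_ip}"

    # Fixed window: the first request creates the counter and starts the window
    current = await r.incr(key)
    if current == 1:
        await r.expire(key, window)
    if current > limit:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")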
```
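The `rate_limit` function above sends INCR and EXPIRE as two separate commands, so a crash between them can leave a counter that never expires. One common way to close that gap is a small Lua script, which Redis executes atomically. This is a sketch rather than the only option: `hit_counter` and `is_allowed` are hypothetical names, and `r` is assumed to be a `redis.asyncio.Redis`-compatible client.

```python
from typing import Any

# Lua runs atomically inside Redis, so INCR and EXPIRE cannot be separated.
RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""


async def hit_counter(r: Any, key: str, window_seconds: int) -> int:
    """Increment the counter for key, starting its window on the first hit."""
    # redis-py signature: eval(script, numkeys, *keys_and_args)
    return int(await r.eval(RATE_LIMIT_LUA, 1, key, window_seconds))


async def is_allowed(r: Any, key: str, limit: int = 100, window_seconds: int = 900) -> bool:
    """Return True while the caller is within the limit for the current window."""
    return await hit_counter(r, key, window_seconds) <= limit
```

In a FastAPI dependency this would be used as `if not await is_allowed(r, f"rate:{client_ip}"): raise HTTPException(status_code=429, ...)`, keeping the same key convention as the original function.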
### Session storage
```python
import secrets
from datetime import timedelta


async def create_session(user_id: str) -> str:
    r = await get_redis()
    session_id = secrets.token_urlsafe(32)
    await r.setex(f"session:{session_id}", timedelta(hours=24), user_id)
    return session_id


async def get_session(session_id: str) -> str | None:
    r = await get_redis()
    return await r.get(f"session:{session_id}")


async def delete_session(session_id: str):
    r = await get_redis()
    await r.delete(f"session:{session_id}")
```
---
## TypeScript (ioredis / NestJS / Express)
### Connection
```typescript
// src/core/redis.ts
import Redis from 'ioredis';
export const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379', {
maxRetriesPerRequest: 3,
lazyConnect: true,
});
```
### NestJS module
```typescript
// src/cache/cache.module.ts
import { Global, Module } from '@nestjs/common';
import { CacheService } from './cache.service';
@Global()
@Module({
providers: [CacheService],
exports: [CacheService],
})
export class CacheModule {}
```
```typescript
// src/cache/cache.service.ts
import { Injectable, OnModuleDestroy } from '@nestjs/common';
import Redis from 'ioredis';
@Injectable()
export class CacheService implements OnModuleDestroy {
private readonly redis = new Redis(process.env.REDIS_URL!);
async get<T>(key: string): Promise<T | null> {
const data = await this.redis.get(key);
return data ? JSON.parse(data) : null;
}
async set(key: string, value: unknown, ttlSeconds: number): Promise<void> {
await this.redis.setex(key, ttlSeconds, JSON.stringify(value));
}
async del(key: string): Promise<void> {
await this.redis.del(key);
}
async onModuleDestroy() {
await this.redis.quit();
}
}
```
### Cache-aside in service
```typescript
@Injectable()
export class UsersService {
constructor(
private readonly prisma: PrismaService,
private readonly cache: CacheService,
) {}
async findOne(id: string): Promise<User> {
// Check cache
const cached = await this.cache.get<User>(`user:${id}`);
if (cached) return cached;
// Cache miss
const user = await this.prisma.user.findUnique({ where: { id } });
if (!user) throw new NotFoundException(`User ${id} not found`);
// Store with 15min TTL
await this.cache.set(`user:${id}`, user, 900);
return user;
}
async update(id: string, dto: UpdateUserDto): Promise<User> {
const user = await this.prisma.user.update({ where: { id }, data: dto });
await this.cache.del(`user:${id}`); // Invalidate
return user;
}
}
```
---
## Pub/Sub
### Python
```python
import json


# Publisher
async def publish_event(channel: str, event: dict):
    r = await get_redis()
    await r.publish(channel, json.dumps(event))


# Subscriber
async def subscribe_events(channel: str):
    r = await get_redis()
    pubsub = r.pubsub()
    await pubsub.subscribe(channel)
    async for message in pubsub.listen():
        # Skip subscribe/unsubscribe confirmations; only yield published payloads
        if message['type'] == 'message':
            yield json.loads(message['data'])
```
### TypeScript
```typescript
// Publisher
const pub = new Redis(process.env.REDIS_URL!);
await pub.publish('orders', JSON.stringify({ type: 'created', orderId: '123' }));
// Subscriber (separate connection required)
const sub = new Redis(process.env.REDIS_URL!);
await sub.subscribe('orders'); // wait for the subscription to be confirmed
sub.on('message', (channel, message) => {
  const event = JSON.parse(message);
  console.log(`[${channel}]`, event);
});
```
---
## Key Naming Conventions
```
entity:id → user:abc123
entity:id:field → user:abc123:orders
rate:ip → rate:192.168.1.1
session:token → session:abc123def
lock:resource → lock:order-processing
queue:name → queue:email-notifications
```
---
## Common Pitfalls
1. **Not setting TTLs.** Every cache key should have an expiration. Unbounded caches exhaust memory.
2. **Cache stampede.** When a popular key expires, many requests hit the DB simultaneously. Use distributed locks or stale-while-revalidate.
3. **Using the same connection for pub/sub.** Subscribers can't run other commands. Use a dedicated connection.
4. **Storing large objects.** Redis is fast for small values. Keep values under 1MB; for larger data, store a pointer to S3.
5. **Not handling connection failures.** Redis connections drop. Use retry logic and connection pools.
6. **Forgetting to invalidate.** When data changes, delete the cache key. Stale cache is worse than no cache.
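The distributed-lock approach to cache stampedes (pitfall 2) can be sketched as follows. This is a minimal illustration under stated assumptions: `get_or_rebuild` and `rebuild` are hypothetical names, `r` is a `redis.asyncio.Redis`-compatible client, values are JSON-serializable dicts, and waiting callers poll without an upper bound on total wait time.

```python
import asyncio
import json
import secrets
from typing import Any, Awaitable, Callable

# Delete the lock only if we still own it (compare-and-delete, atomic in Lua).
RELEASE_LOCK_LUA = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
end
return 0
"""


async def get_or_rebuild(
    r: Any,
    key: str,
    rebuild: Callable[[], Awaitable[dict]],  # hypothetical loader, e.g. a DB query
    ttl_seconds: int = 900,
    lock_seconds: int = 10,
    poll_interval: float = 0.05,
) -> dict:
    """Cache-aside read where only one caller rebuilds a missing key."""
    lock_key = f"lock:{key}"
    while True:
        cached = await r.get(key)
        if cached is not None:
            return json.loads(cached)

        token = secrets.token_hex(16)
        # SET NX EX succeeds for exactly one caller until the lock expires.
        if await r.set(lock_key, token, nx=True, ex=lock_seconds):
            try:
                value = await rebuild()
                await r.setex(key, ttl_seconds, json.dumps(value))
                return value
            finally:
                await r.eval(RELEASE_LOCK_LUA, 1, lock_key, token)

        # Another caller holds the lock: wait briefly, then re-check the cache.
        await asyncio.sleep(poll_interval)
```

The lock carries its own expiry so that a crashed rebuilder cannot block the key forever, and the random token prevents one caller from releasing a lock that has since been acquired by another. Set `lock_seconds` comfortably above the expected rebuild time.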
---
## Related Skills
- `caching` — HTTP caching, CDN, memoization (framework-agnostic patterns)
- `background-jobs` — BullMQ/Celery use Redis as broker
- `fastapi` — Redis integration with FastAPI dependency injection
- `nestjs` — Redis caching module in NestJS
- `docker` — Running Redis in Docker Compose for development