mirror of
https://github.com/duthaho/claudekit.git
synced 2026-06-15 14:34:55 +03:00
feat: improved the Claude Kit as a plugin
@@ -0,0 +1,64 @@
|
||||
---
|
||||
name: databases
|
||||
description: >
|
||||
Use when working with PostgreSQL, MongoDB, or Redis — including schema design, queries, indexing, migrations, connection pooling, caching layers, or any database operation. Also activate for keywords like SQL, aggregation pipeline, BSON, ioredis, alembic, prisma migrate, django migrate, EXPLAIN ANALYZE, ORM configuration, or NoSQL data modeling.
|
||||
---
|
||||
|
||||
# Databases
|
||||
|
||||
## When to Use
|
||||
|
||||
- PostgreSQL database operations, SQL query optimization, schema design
|
||||
- JSONB document storage, full-text search, window functions, CTEs
|
||||
- MongoDB document modeling, aggregation pipelines, semi-structured data
|
||||
- Redis caching, session storage, rate limiting, pub/sub, job queues, distributed locks
|
||||
- Database migrations — adding/modifying tables, columns, indexes, constraints
|
||||
- Resolving migration conflicts, rolling back failed migrations
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
- Simple key-value caching within a single process — use `functools.lru_cache` or `Map`
|
||||
- File-based storage that doesn't need a database engine
|
||||
- Static data or configuration that belongs in environment variables
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Topic | Reference | Key tools |
|
||||
|-------|-----------|-----------|
|
||||
| PostgreSQL | `references/postgresql.md` | SQL, SQLAlchemy, Prisma, EXPLAIN ANALYZE, pg_stat_statements |
|
||||
| MongoDB | `references/mongodb.md` | Aggregation, Mongoose, Motor, document schemas, ESR indexing |
|
||||
| Redis | `references/redis.md` | Caching, pub/sub, ioredis, BullMQ, session storage, distributed locks |
|
||||
| Migrations | `references/migrations.md` | Alembic, Prisma Migrate, Django migrations, rollback strategies |
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use parameterized queries everywhere.** Never concatenate user input into SQL strings.
|
||||
2. **Design schema around access patterns.** Ask "how will I read this?" before "how does this relate?" Embed data fetched together (MongoDB); normalize data accessed independently (PostgreSQL).
|
||||
3. **Index foreign keys and query fields.** PostgreSQL doesn't auto-index FK child columns. MongoDB queries without indexes trigger full collection scans.
|
||||
4. **Choose explicit, durable defaults.** Prefer `TIMESTAMPTZ` over `TIMESTAMP` (PostgreSQL), use `w: "majority"` for writes that must survive failover (MongoDB), and set a TTL on every Redis cache key.
|
||||
5. **Monitor query performance.** `pg_stat_statements` (PostgreSQL), `db.setProfilingLevel(1)` (MongoDB), connection pool metrics (all).
|
||||
6. **Use bulk/batch operations.** `bulkWrite` (MongoDB), `COPY` (PostgreSQL), pipelines (Redis) for high-throughput writes.
|
||||
7. **Never edit deployed migrations.** Create a new migration instead of modifying one already applied.
|
||||
8. **Test rollback paths.** Always verify your downgrade/rollback strategy before deploying schema changes.
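
Practice 1 can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for a real server, and the `users` table and `find_user_by_email` helper are hypothetical names chosen for the example. SQLite uses `?` placeholders; other drivers use a different marker (psycopg uses `%s`), but the idea is the same: the value travels separately from the SQL text.

```python
import sqlite3

# In-memory SQLite stands in for a real database (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("alice@example.com",), ("bob@example.com",)],
)


def find_user_by_email(conn, email):
    """Look up a user with a bound parameter; the driver handles quoting."""
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


# A classic injection payload is treated as a literal string and matches nothing.
malicious = "' OR '1'='1"
safe_rows = find_user_by_email(conn, malicious)

# For contrast only: concatenation lets the payload rewrite the WHERE clause.
unsafe_sql = "SELECT id, email FROM users WHERE email = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

print(len(safe_rows), len(unsafe_rows))
```

The concatenated query returns every row in the table, while the parameterized one returns none.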
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
1. **N+1 queries from ORM lazy loading.** Use eager loading: `joinedload` (SQLAlchemy), `select_related` (Django), or `$lookup` (MongoDB, with caution).
|
||||
2. **Table locks during migrations.** Use `CREATE INDEX CONCURRENTLY` (PostgreSQL). Batch backfills for large tables.
|
||||
3. **Unbounded growth.** Dead tuples from UPDATE-heavy workloads (PostgreSQL). Arrays exceeding 16MB document limit (MongoDB). Redis keys without TTLs.
|
||||
4. **OFFSET pagination on large datasets.** Use keyset/cursor pagination instead.
|
||||
5. **Connection exhaustion.** Use connection pools (PgBouncer, application-level pools). Never open per-request connections.
|
||||
6. **Cache stampede.** When a popular Redis key expires, many requests hit the DB simultaneously. Use distributed locks or stale-while-revalidate.
|
||||
7. **Running `prisma migrate reset` in production.** This drops the database and re-applies every migration, so all data is lost.
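
Pitfall 4 (OFFSET pagination) can be made concrete with a small keyset sketch. It assumes an in-memory SQLite table named `orders` with a monotonically increasing `id`; the `fetch_page` helper and the page size are hypothetical. Instead of skipping rows with `OFFSET`, each request asks for rows after the last `id` it has already seen, so the database can seek directly into the primary-key index.

```python
import sqlite3

# Demo data: ten orders in an in-memory SQLite database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO orders (id, total) VALUES (?, ?)",
    [(i, i * 100) for i in range(1, 11)],
)


def fetch_page(conn, after_id=None, page_size=4):
    """Return one page of orders plus the id to pass as after_id next time."""
    if after_id is None:
        rows = conn.execute(
            "SELECT id, total FROM orders ORDER BY id LIMIT ?", (page_size,)
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, total FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (after_id, page_size),
        ).fetchall()
    # A short (or empty) page means there is nothing left to fetch.
    next_after_id = rows[-1][0] if len(rows) == page_size else None
    return rows, next_after_id


pages = []
after_id = None
while True:
    rows, after_id = fetch_page(conn, after_id)
    pages.append([row[0] for row in rows])
    if after_id is None:
        break

print(pages)
```

When the sort key is not unique (for example `created_at`), the same approach needs a tie-breaker column such as `(created_at, id)` in both the `ORDER BY` and the comparison.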
|
||||
|
||||
---
|
||||
|
||||
## Related Skills
|
||||
|
||||
- `backend-frameworks` — Framework-specific ORM integration
|
||||
- `error-handling` — Database error handling patterns
|
||||
- `logging` — Query logging and slow query detection
|
||||
@@ -0,0 +1,312 @@
|
||||
# Databases — Migration Patterns
|
||||
|
||||
|
||||
# Database Migrations
|
||||
|
||||
## When to Use
|
||||
|
||||
- Adding or modifying database tables/columns
|
||||
- Creating indexes or constraints
|
||||
- Running migrations in development, staging, or production
|
||||
- Resolving migration conflicts in a team
|
||||
- Rolling back a failed migration
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
- Query optimization without schema changes — use `postgresql` skill
|
||||
- Initial database design from scratch — use `postgresql` or `mongodb` skill
|
||||
- ORM configuration without migrations — use framework-specific skill
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| I need... | Go to |
|-----------|-------|
| Alembic (FastAPI/SQLAlchemy) | § Alembic below |
| Prisma (NestJS/Express) | § Prisma below |
| Django migrations | § Django below |
| Safe production patterns | § Production Safety below |
| Rollback strategies | § Rollbacks below |
|
||||
|
||||
---
|
||||
|
||||
## Alembic (Python / SQLAlchemy)
|
||||
|
||||
### Setup
|
||||
|
||||
```bash
|
||||
pip install alembic
|
||||
alembic init migrations
|
||||
```
|
||||
|
||||
```python
|
||||
# migrations/env.py — configure target metadata
|
||||
from src.models import Base
|
||||
target_metadata = Base.metadata
|
||||
```
|
||||
|
||||
### Create a migration
|
||||
|
||||
```bash
|
||||
# Auto-generate from model changes
|
||||
alembic revision --autogenerate -m "add orders table"
|
||||
|
||||
# Manual migration (for data migrations or complex changes)
|
||||
alembic revision -m "backfill order status"
|
||||
```
|
||||
|
||||
### Migration file
|
||||
|
||||
```python
# migrations/versions/003_add_orders_table.py
"""add orders table"""

from alembic import op
import sqlalchemy as sa

revision = '003'
down_revision = '002'

def upgrade() -> None:
    op.create_table(
        'orders',
        sa.Column('id', sa.UUID(), primary_key=True, server_default=sa.text('gen_random_uuid()')),
        sa.Column('user_id', sa.UUID(), sa.ForeignKey('users.id', ondelete='CASCADE'), nullable=False),
        sa.Column('total', sa.Numeric(10, 2), nullable=False),
        sa.Column('status', sa.String(20), nullable=False, server_default='pending'),
        sa.Column('created_at', sa.DateTime(timezone=True), server_default=sa.func.now()),
    )
    op.create_index('ix_orders_user_id', 'orders', ['user_id'])
    op.create_index('ix_orders_created_at', 'orders', ['created_at'])

def downgrade() -> None:
    op.drop_table('orders')
```
|
||||
|
||||
### Run migrations
|
||||
|
||||
```bash
|
||||
# Apply all pending
|
||||
alembic upgrade head
|
||||
|
||||
# Apply one step
|
||||
alembic upgrade +1
|
||||
|
||||
# Check current state
|
||||
alembic current
|
||||
|
||||
# Check whether the models have changes not yet captured in a migration (Alembic 1.9+)
|
||||
alembic check
|
||||
|
||||
# View migration history
|
||||
alembic history --verbose
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Prisma (TypeScript / NestJS / Express)
|
||||
|
||||
### Create a migration
|
||||
|
||||
```bash
|
||||
# Generate migration from schema changes
|
||||
npx prisma migrate dev --name add_orders_table
|
||||
|
||||
# Apply in production (no interactive prompts)
|
||||
npx prisma migrate deploy
|
||||
|
||||
# Check status
|
||||
npx prisma migrate status
|
||||
```
|
||||
|
||||
### Schema change
|
||||
|
||||
```prisma
|
||||
// prisma/schema.prisma
|
||||
model Order {
|
||||
id String @id @default(uuid())
|
||||
userId String
|
||||
user User @relation(fields: [userId], references: [id], onDelete: Cascade)
|
||||
total Decimal @db.Decimal(10, 2)
|
||||
status String @default("pending")
|
||||
createdAt DateTime @default(now())
|
||||
|
||||
@@index([userId])
|
||||
@@index([createdAt])
|
||||
}
|
||||
```
|
||||
|
||||
### Generated migration SQL
|
||||
|
||||
```sql
|
||||
-- prisma/migrations/20260417_add_orders_table/migration.sql
|
||||
CREATE TABLE "Order" (
    "id" TEXT NOT NULL,
|
||||
"userId" TEXT NOT NULL,
|
||||
"total" DECIMAL(10,2) NOT NULL,
|
||||
"status" TEXT NOT NULL DEFAULT 'pending',
|
||||
"createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
CONSTRAINT "Order_pkey" PRIMARY KEY ("id")
|
||||
);
|
||||
|
||||
CREATE INDEX "Order_userId_idx" ON "Order"("userId");
|
||||
CREATE INDEX "Order_createdAt_idx" ON "Order"("createdAt");
|
||||
|
||||
ALTER TABLE "Order" ADD CONSTRAINT "Order_userId_fkey"
|
||||
FOREIGN KEY ("userId") REFERENCES "User"("id") ON DELETE CASCADE;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Django
|
||||
|
||||
### Create and apply
|
||||
|
||||
```bash
|
||||
# Auto-generate from model changes
|
||||
python manage.py makemigrations app_name
|
||||
|
||||
# Apply
|
||||
python manage.py migrate
|
||||
|
||||
# Check for pending
|
||||
python manage.py showmigrations
|
||||
|
||||
# SQL preview (don't execute)
|
||||
python manage.py sqlmigrate app_name 0003
|
||||
```
|
||||
|
||||
### Data migration
|
||||
|
||||
```python
# orders/migrations/0004_backfill_order_status.py
from django.db import migrations

def backfill_status(apps, schema_editor):
    # Use the historical model, not a direct import, so the migration
    # keeps working after the model changes.
    Order = apps.get_model('orders', 'Order')
    Order.objects.filter(status='').update(status='pending')

class Migration(migrations.Migration):
    dependencies = [('orders', '0003_add_orders')]
    # noop as the reverse function makes the migration reversible without undoing the backfill
    operations = [migrations.RunPython(backfill_status, migrations.RunPython.noop)]
```
|
||||
|
||||
---
|
||||
|
||||
## Production Safety
|
||||
|
||||
### Golden rules
|
||||
|
||||
1. **Never drop columns in the same deploy as removing code references.** Remove code first, deploy, then drop column in next migration.
|
||||
2. **Add columns as nullable or with defaults.** Adding a `NOT NULL` column without a default fails on a table that already has rows, and backfilling a large table inside the same migration holds locks for the whole backfill.
|
||||
3. **Create indexes concurrently** (PostgreSQL):
|
||||
```sql
|
||||
CREATE INDEX CONCURRENTLY ix_orders_status ON orders(status);
|
||||
```
|
||||
4. **Test migrations against a production-size dataset** before deploying.
|
||||
5. **Always have a rollback plan** — either a `downgrade()` function or a manual SQL script.
|
||||
|
||||
### Safe column addition pattern
|
||||
|
||||
```python
|
||||
# Step 1: Add nullable column (fast: metadata-only change, lock held only briefly)
|
||||
op.add_column('users', sa.Column('phone', sa.String(20), nullable=True))
|
||||
|
||||
# Step 2: Backfill in batches (separate migration or script)
|
||||
# Don't do UPDATE users SET phone = '...' on millions of rows at once
|
||||
|
||||
# Step 3: Add NOT NULL constraint (after backfill confirms all rows filled)
|
||||
op.alter_column('users', 'phone', nullable=False)
|
||||
```
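
Step 2 above can be sketched as a loop that updates a bounded number of rows per transaction. This is a minimal illustration, not a production script: it assumes an in-memory SQLite database in place of PostgreSQL, and the `backfill_phone` helper, the `'unknown'` placeholder value, and the batch size are all hypothetical.

```python
import sqlite3

# 1,000 users with no phone value yet (assumed demo data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(1, 1001)])
conn.commit()


def backfill_phone(conn, placeholder="unknown", batch_size=200):
    """Fill NULL phone values in small batches, committing after each batch."""
    batches = 0
    while True:
        cur = conn.execute(
            """
            UPDATE users SET phone = ?
            WHERE id IN (
                SELECT id FROM users WHERE phone IS NULL ORDER BY id LIMIT ?
            )
            """,
            (placeholder, batch_size),
        )
        conn.commit()  # one short transaction per batch keeps locks brief
        if cur.rowcount == 0:
            return batches
        batches += 1


batches_run = backfill_phone(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE phone IS NULL"
).fetchone()[0]
print(batches_run, remaining)
```

Because each batch commits on its own, the script can be interrupted and re-run safely: it simply picks up the rows that are still `NULL`.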
|
||||
|
||||
### Safe column rename pattern
|
||||
|
||||
```
|
||||
Deploy 1: Add new column, write to both old and new
|
||||
Deploy 2: Backfill new column from old, read from new
|
||||
Deploy 3: Stop writing to old column
|
||||
Deploy 4: Drop old column
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Rollbacks
|
||||
|
||||
### Alembic
|
||||
|
||||
```bash
|
||||
# Rollback one step
|
||||
alembic downgrade -1
|
||||
|
||||
# Rollback to specific revision
|
||||
alembic downgrade 002
|
||||
|
||||
# Rollback to base (dangerous — drops everything)
|
||||
alembic downgrade base
|
||||
```
|
||||
|
||||
### Prisma
|
||||
|
||||
Prisma doesn't have built-in rollback. Options:
|
||||
- Apply a new migration that reverses the change
|
||||
- Manually run SQL: `npx prisma db execute --file rollback.sql`
|
||||
- Restore from database backup
|
||||
|
||||
### Django
|
||||
|
||||
```bash
|
||||
# Rollback to specific migration
|
||||
python manage.py migrate app_name 0002
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Team Workflow
|
||||
|
||||
### Resolving migration conflicts
|
||||
|
||||
When two developers create migrations from the same parent:
|
||||
|
||||
**Alembic:**
|
||||
```bash
|
||||
# Developer A and B both branched from revision 002
|
||||
# Alembic detects multiple heads
|
||||
alembic heads # shows 003a and 003b
|
||||
alembic merge -m "merge migrations" 003a 003b
|
||||
alembic upgrade head
|
||||
```
|
||||
|
||||
**Prisma:**
|
||||
```bash
|
||||
# Reset and re-apply (dev only)
|
||||
npx prisma migrate reset
|
||||
# Or resolve manually by editing the migration SQL
|
||||
```
|
||||
|
||||
**Django:**
|
||||
```bash
|
||||
# Django auto-detects and asks to merge
|
||||
python manage.py makemigrations --merge
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
1. **Running `prisma migrate reset` in production.** This drops the database and re-applies every migration, so all data is lost. Only use it in development.
|
||||
2. **Editing already-applied migrations.** Never modify a migration that's been deployed. Create a new migration instead.
|
||||
3. **Forgetting indexes.** Add indexes for foreign keys and frequently-queried columns in the same migration.
|
||||
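4. **Large table locks.** `ALTER TABLE ... SET NOT NULL` scans the whole table under an exclusive lock, and `ADD COLUMN ... DEFAULT` rewrites the table on PostgreSQL versions before 11 or when the default is volatile. Add the column as nullable and use batched backfills.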
|
||||
5. **Not testing downgrade.** Always test your rollback path before deploying.
|
||||
6. **Circular foreign keys.** Use `sa.ForeignKey` with `use_alter=True` in Alembic to handle circular deps.
|
||||
|
||||
---
|
||||
|
||||
## Related Skills
|
||||
|
||||
- `postgresql` — Database design, query optimization, indexing strategies
|
||||
- `fastapi` — SQLAlchemy async patterns with FastAPI
|
||||
- `nestjs` — Prisma integration with NestJS
|
||||
- `django` — Django ORM models and migrations
|
||||
- `docker` — Running migration containers in CI/CD
|
||||
@@ -0,0 +1,576 @@
|
||||
# Databases — MongoDB Patterns
|
||||
|
||||
|
||||
# MongoDB
|
||||
|
||||
## When to Use
|
||||
|
||||
- MongoDB database operations
|
||||
- Document-based data modeling
|
||||
- Aggregation pipelines
|
||||
- Semi-structured or polymorphic data that varies per record
|
||||
- Rapid prototyping where schema flexibility accelerates iteration
|
||||
- Event logging, IoT telemetry, or content management systems
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
- Relational-heavy data models with complex joins and foreign key constraints
|
||||
- SQL-only projects where the entire stack is built around relational databases
|
||||
- Simple key-value storage where Redis or a lightweight store is more appropriate
|
||||
- Financial systems requiring multi-table ACID transactions as the norm
|
||||
|
||||
---
|
||||
|
||||
## Core Patterns
|
||||
|
||||
### 1. Schema Design
|
||||
|
||||
The central decision in MongoDB modeling is **embed vs. reference**.
|
||||
|
||||
**Decision tree:**
|
||||
|
||||
```
|
||||
Does the child data belong to exactly one parent?
  YES --> Is the child array unbounded (could grow to thousands)?
            YES --> Reference (separate collection)
            NO  --> Embed
  NO  --> Is it a many-to-many relationship?
            YES --> Reference (with array of ObjectIds on one or both sides)
            NO  --> Reference
|
||||
```
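
The decision tree can also be written as a small function, which is handy for documenting modeling choices in code review. This is only a sketch: the `Relationship` dataclass and `choose_model` function are hypothetical names, and the three boolean inputs mirror the three questions in the tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    single_parent: bool          # child belongs to exactly one parent
    unbounded: bool = False      # child array could grow to thousands
    many_to_many: bool = False   # only consulted when single_parent is False


def choose_model(rel: Relationship) -> str:
    """Walk the embed-vs-reference decision tree and return the recommendation."""
    if rel.single_parent:
        if rel.unbounded:
            return "reference (separate collection)"
        return "embed"
    if rel.many_to_many:
        return "reference (array of ObjectIds on one or both sides)"
    return "reference"


# Examples taken from the patterns in this section.
address = choose_model(Relationship(single_parent=True))
orders = choose_model(Relationship(single_parent=True, unbounded=True))
tags = choose_model(Relationship(single_parent=False, many_to_many=True))
print(address, orders, tags, sep="\n")
```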
|
||||
|
||||
**Embedding pattern -- best for data that is read together:**
|
||||
|
||||
```javascript
|
||||
// User with embedded address and preferences
|
||||
// Good: one read fetches everything the profile page needs
|
||||
db.users.insertOne({
|
||||
email: "user@example.com",
|
||||
name: "Alice Chen",
|
||||
address: {
|
||||
street: "123 Main St",
|
||||
city: "Portland",
|
||||
state: "OR",
|
||||
zip: "97201"
|
||||
},
|
||||
preferences: {
|
||||
theme: "dark",
|
||||
language: "en",
|
||||
notifications: { email: true, push: false }
|
||||
},
|
||||
createdAt: new Date()
|
||||
});
|
||||
```
|
||||
|
||||
**Referencing pattern -- best for independent or unbounded data:**
|
||||
|
||||
```javascript
|
||||
// Orders reference the user by ID
|
||||
// Good: orders grow unboundedly, accessed independently
|
||||
db.orders.insertOne({
|
||||
userId: ObjectId("6651a..."),
|
||||
status: "shipped",
|
||||
totalCents: 4999,
|
||||
items: [
|
||||
{ sku: "WIDGET-001", name: "Blue Widget", qty: 2, priceCents: 1999 },
|
||||
{ sku: "GADGET-010", name: "Mini Gadget", qty: 1, priceCents: 1001 }
|
||||
],
|
||||
placedAt: new Date()
|
||||
});
|
||||
```
|
||||
|
||||
**Denormalization pattern -- duplicate data to avoid frequent lookups:**
|
||||
|
||||
```javascript
|
||||
// Store author name directly on the post (denormalized from users)
|
||||
// Trade-off: faster reads, but updates to user name require updating all posts
|
||||
db.posts.insertOne({
|
||||
title: "Getting Started with MongoDB",
|
||||
body: "...",
|
||||
author: {
|
||||
_id: ObjectId("6651a..."),
|
||||
name: "Alice Chen" // denormalized -- must be updated if name changes
|
||||
},
|
||||
tags: ["mongodb", "tutorial"],
|
||||
publishedAt: new Date()
|
||||
});
|
||||
```
|
||||
|
||||
**Polymorphic pattern -- different shapes in one collection:**
|
||||
|
||||
```javascript
|
||||
// Events collection stores different event types
|
||||
db.events.insertMany([
|
||||
{
|
||||
type: "page_view",
|
||||
userId: ObjectId("6651a..."),
|
||||
url: "/products/widget",
|
||||
timestamp: new Date()
|
||||
},
|
||||
{
|
||||
type: "purchase",
|
||||
userId: ObjectId("6651a..."),
|
||||
orderId: ObjectId("6651b..."),
|
||||
totalCents: 4999,
|
||||
timestamp: new Date()
|
||||
}
|
||||
]);
|
||||
// Use a discriminator field (type) and query by it
|
||||
```
|
||||
|
||||
**Schema validation -- enforce structure at the database level:**
|
||||
|
||||
```javascript
|
||||
db.createCollection("users", {
|
||||
validator: {
|
||||
$jsonSchema: {
|
||||
bsonType: "object",
|
||||
required: ["email", "name", "createdAt"],
|
||||
properties: {
|
||||
email: {
|
||||
bsonType: "string",
|
||||
pattern: "^.+@.+\\..+$",
|
||||
description: "Must be a valid email"
|
||||
},
|
||||
name: {
|
||||
bsonType: "string",
|
||||
minLength: 1
|
||||
},
|
||||
role: {
|
||||
enum: ["admin", "editor", "viewer"],
|
||||
description: "Must be a valid role"
|
||||
},
|
||||
createdAt: { bsonType: "date" }
|
||||
}
|
||||
}
|
||||
},
|
||||
validationLevel: "strict",
|
||||
validationAction: "error"
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 2. Aggregation Pipeline
|
||||
|
||||
Build complex data transformations as a sequence of stages.
|
||||
|
||||
```javascript
|
||||
// Revenue report: total and average spend per user, last 30 days
|
||||
db.orders.aggregate([
|
||||
// Stage 1: filter to recent delivered orders
|
||||
{ $match: {
|
||||
status: "delivered",
|
||||
placedAt: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) }
|
||||
}},
|
||||
|
||||
// Stage 2: group by user
|
||||
{ $group: {
|
||||
_id: "$userId",
|
||||
totalSpent: { $sum: "$totalCents" },
|
||||
orderCount: { $sum: 1 },
|
||||
avgOrderValue: { $avg: "$totalCents" }
|
||||
}},
|
||||
|
||||
// Stage 3: sort by spend
|
||||
{ $sort: { totalSpent: -1 } },
|
||||
|
||||
// Stage 4: limit to top 10
|
||||
{ $limit: 10 },
|
||||
|
||||
// Stage 5: join user details
|
||||
{ $lookup: {
|
||||
from: "users",
|
||||
localField: "_id",
|
||||
foreignField: "_id",
|
||||
as: "user"
|
||||
}},
|
||||
|
||||
// Stage 6: flatten the joined array
|
||||
{ $unwind: "$user" },
|
||||
|
||||
// Stage 7: reshape output
|
||||
{ $project: {
|
||||
_id: 0,
|
||||
userName: "$user.name",
|
||||
email: "$user.email",
|
||||
totalSpent: 1,
|
||||
orderCount: 1,
|
||||
avgOrderValue: { $round: ["$avgOrderValue", 0] }
|
||||
}}
|
||||
]);
|
||||
```
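
To make the stage semantics concrete, here is a plain-Python sketch of what the first four stages (`$match`, `$group`, `$sort`, `$limit`) compute. It runs on in-memory dictionaries rather than against MongoDB; the `top_spenders` function and the sample orders are hypothetical and exist only to show the data flow.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def top_spenders(orders, now, days=30, limit=10):
    """Emulate $match -> $group -> $sort -> $limit on a list of order dicts."""
    cutoff = now - timedelta(days=days)

    # $match: keep delivered orders placed within the window
    recent = [
        o for o in orders
        if o["status"] == "delivered" and o["placedAt"] >= cutoff
    ]

    # $group: accumulate a sum and a count per userId
    groups = defaultdict(lambda: {"totalSpent": 0, "orderCount": 0})
    for o in recent:
        group = groups[o["userId"]]
        group["totalSpent"] += o["totalCents"]
        group["orderCount"] += 1

    results = [
        {"_id": uid, **g, "avgOrderValue": g["totalSpent"] / g["orderCount"]}
        for uid, g in groups.items()
    ]

    # $sort (descending by spend), then $limit
    results.sort(key=lambda r: r["totalSpent"], reverse=True)
    return results[:limit]


now = datetime(2026, 1, 31, tzinfo=timezone.utc)
sample_orders = [
    {"userId": "u1", "status": "delivered", "totalCents": 1000, "placedAt": now - timedelta(days=10)},
    {"userId": "u1", "status": "delivered", "totalCents": 3000, "placedAt": now - timedelta(days=5)},
    {"userId": "u2", "status": "delivered", "totalCents": 2500, "placedAt": now - timedelta(days=1)},
    {"userId": "u2", "status": "pending", "totalCents": 9999, "placedAt": now - timedelta(days=1)},
    {"userId": "u3", "status": "delivered", "totalCents": 5000, "placedAt": now - timedelta(days=45)},
]
report = top_spenders(sample_orders, now)
print(report)
```

Filtering before grouping is the point of putting `$match` first: the later stages only ever see the documents that survive the filter.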
|
||||
|
||||
**$unwind -- flatten arrays into individual documents:**
|
||||
|
||||
```javascript
|
||||
// Expand order items to analyze product-level metrics
|
||||
db.orders.aggregate([
|
||||
{ $unwind: "$items" },
|
||||
{ $group: {
|
||||
_id: "$items.sku",
|
||||
totalQty: { $sum: "$items.qty" },
|
||||
totalRevenue: { $sum: { $multiply: ["$items.qty", "$items.priceCents"] } }
|
||||
}},
|
||||
{ $sort: { totalRevenue: -1 } }
|
||||
]);
|
||||
```
|
||||
|
||||
**$lookup with pipeline -- filtered/correlated joins:**
|
||||
|
||||
```javascript
|
||||
// For each user, get their 3 most recent orders
|
||||
db.users.aggregate([
|
||||
{ $lookup: {
|
||||
from: "orders",
|
||||
let: { uid: "$_id" },
|
||||
pipeline: [
|
||||
{ $match: { $expr: { $eq: ["$userId", "$$uid"] } } },
|
||||
{ $sort: { placedAt: -1 } },
|
||||
{ $limit: 3 },
|
||||
{ $project: { status: 1, totalCents: 1, placedAt: 1 } }
|
||||
],
|
||||
as: "recentOrders"
|
||||
}}
|
||||
]);
|
||||
```
|
||||
|
||||
**$facet -- run several sub-pipelines over the same input documents in one stage:**
|
||||
|
||||
```javascript
|
||||
// Dashboard: get summary stats and top products in one query
|
||||
db.orders.aggregate([
|
||||
{ $match: { status: "delivered" } },
|
||||
{ $facet: {
|
||||
summary: [
|
||||
{ $group: {
|
||||
_id: null,
|
||||
totalRevenue: { $sum: "$totalCents" },
|
||||
totalOrders: { $sum: 1 }
|
||||
}}
|
||||
],
|
||||
topProducts: [
|
||||
{ $unwind: "$items" },
|
||||
{ $group: { _id: "$items.sku", sold: { $sum: "$items.qty" } } },
|
||||
{ $sort: { sold: -1 } },
|
||||
{ $limit: 5 }
|
||||
],
|
||||
monthlyTrend: [
|
||||
{ $group: {
|
||||
_id: { $dateToString: { format: "%Y-%m", date: "$placedAt" } },
|
||||
revenue: { $sum: "$totalCents" }
|
||||
}},
|
||||
{ $sort: { _id: 1 } }
|
||||
]
|
||||
}}
|
||||
]);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3. Index Strategies
|
||||
|
||||
```javascript
|
||||
// Single field index -- most common
|
||||
db.users.createIndex({ email: 1 }, { unique: true });
|
||||
|
||||
// Compound index -- order matters, follows the ESR rule:
|
||||
// Equality fields first, Sort fields next, Range fields last
|
||||
db.orders.createIndex({ status: 1, placedAt: -1 });
|
||||
// Supports: find({status: "pending"}).sort({placedAt: -1})
|
||||
// Also supports: find({status: "pending"}) alone (prefix)
|
||||
|
||||
// Multikey index -- automatically indexes each array element
|
||||
db.posts.createIndex({ tags: 1 });
|
||||
// Supports: find({ tags: "mongodb" })
|
||||
|
||||
// Text index -- basic full-text search
|
||||
db.posts.createIndex(
|
||||
{ title: "text", body: "text" },
|
||||
{ weights: { title: 10, body: 1 }, name: "posts_text_search" }
|
||||
);
|
||||
// Usage:
|
||||
db.posts.find(
|
||||
{ $text: { $search: "mongodb aggregation" } },
|
||||
{ score: { $meta: "textScore" } }
|
||||
).sort({ score: { $meta: "textScore" } });
|
||||
|
||||
// TTL index -- auto-delete documents after expiry
|
||||
db.sessions.createIndex(
|
||||
{ expiresAt: 1 },
|
||||
{ expireAfterSeconds: 0 } // delete when expiresAt is in the past
|
||||
);
|
||||
// Documents must have a Date field; they are removed by a background task ~every 60s
|
||||
|
||||
// Partial index -- only index documents matching a filter
|
||||
db.orders.createIndex(
|
||||
{ placedAt: -1 },
|
||||
{ partialFilterExpression: { status: "pending" } }
|
||||
);
|
||||
// Smaller index; only used when the query includes the filter condition
|
||||
|
||||
// Wildcard index -- for querying arbitrary keys in a sub-document
|
||||
db.products.createIndex({ "attributes.$**": 1 });
|
||||
// Supports: find({ "attributes.color": "red" }) without knowing keys in advance
|
||||
|
||||
// Collation -- case-insensitive sorting and matching
|
||||
db.users.createIndex(
|
||||
{ name: 1 },
|
||||
{ collation: { locale: "en", strength: 2 } }
|
||||
);
|
||||
```
|
||||
|
||||
**The ESR rule for compound indexes:** order fields by **E**quality, **S**ort, **R**ange. This produces the most efficient index scans.
|
||||
|
||||
```javascript
|
||||
// Query: find active orders for a user, sorted by date, in a date range
|
||||
// Equality: userId, status
|
||||
// Sort: placedAt
|
||||
// Range: placedAt (but sort and range on same field -- sort wins)
|
||||
db.orders.createIndex({ userId: 1, status: 1, placedAt: -1 });
|
||||
```
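
As a sketch of how the ESR ordering can be applied mechanically, the helper below builds a key list from the three groups of fields. `esr_index_keys` is a hypothetical function, not part of any driver; it returns `(field, direction)` pairs, which is the list shape PyMongo's `create_index` accepts. A field that appears in more than one group keeps its first (earlier) position, matching the "sort wins" note above.

```python
def esr_index_keys(equality, sort, range_fields=()):
    """Order compound-index fields as Equality, then Sort, then Range.

    equality:     field names matched with exact values
    sort:         (field, direction) pairs, direction is 1 or -1
    range_fields: field names used with $gt / $lt style filters
    """
    keys = []
    seen = set()

    def add(field, direction):
        # Skip fields already placed by an earlier group.
        if field not in seen:
            seen.add(field)
            keys.append((field, direction))

    for field in equality:
        add(field, 1)
    for field, direction in sort:
        add(field, direction)
    for field in range_fields:
        add(field, 1)
    return keys


# The query from the example above: equality on userId and status,
# sort and range both on placedAt.
orders_index = esr_index_keys(
    equality=["userId", "status"],
    sort=[("placedAt", -1)],
    range_fields=["placedAt"],
)
print(orders_index)
```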
|
||||
|
||||
---
|
||||
|
||||
### 4. Transactions
|
||||
|
||||
Multi-document transactions work across collections (requires replica set or sharded cluster).
|
||||
|
||||
```javascript
const session = client.startSession();

try {
  session.startTransaction({
    readConcern: { level: "snapshot" },
    writeConcern: { w: "majority" },
    readPreference: "primary"
  });

  const accounts = client.db("bank").collection("accounts");

  // Transfer $50 from account A to account B
  const fromAccount = await accounts.findOne(
    { _id: "account-A" },
    { session }
  );

  if (fromAccount.balanceCents < 5000) {
    // Just throw: the catch block below aborts the transaction.
    // Aborting here as well would call abortTransaction() twice.
    throw new Error("Insufficient funds");
  }

  await accounts.updateOne(
    { _id: "account-A" },
    { $inc: { balanceCents: -5000 } },
    { session }
  );

  await accounts.updateOne(
    { _id: "account-B" },
    { $inc: { balanceCents: 5000 } },
    { session }
  );

  // Record the transfer in a separate collection -- still in the same tx
  await client.db("bank").collection("transfers").insertOne({
    from: "account-A",
    to: "account-B",
    amountCents: 5000,
    timestamp: new Date()
  }, { session });

  await session.commitTransaction();
} catch (error) {
  await session.abortTransaction();
  throw error;
} finally {
  await session.endSession();
}
```
|
||||
|
||||
**Guidelines:**
|
||||
- Keep transactions short -- they hold locks and consume resources
|
||||
- Design your schema to minimize the need for multi-document transactions
|
||||
- Transactions are aborted after 60 seconds by default (server parameter `transactionLifetimeLimitSeconds`)
|
||||
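- Retryable writes (`retryWrites=true` in the connection string) retry a transaction commit once on a transient error; to retry a whole transaction, use the driver's `session.withTransaction()` helper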
|
||||
|
||||
---
|
||||
|
||||
### 5. Change Streams
|
||||
|
||||
Watch for real-time changes to collections, databases, or the entire deployment.
|
||||
|
||||
```javascript
|
||||
// Watch a single collection for inserts and updates
|
||||
const pipeline = [
|
||||
{ $match: {
|
||||
operationType: { $in: ["insert", "update"] },
|
||||
"fullDocument.status": "urgent"
|
||||
}}
|
||||
];
|
||||
|
||||
const changeStream = db.collection("tickets").watch(pipeline, {
|
||||
fullDocument: "updateLookup" // include the full document on updates
|
||||
});
|
||||
|
||||
changeStream.on("change", (change) => {
|
||||
console.log("Change detected:", change.operationType);
|
||||
console.log("Document:", change.fullDocument);
|
||||
console.log("Resume token:", change._id); // the event's _id is its resume token
|
||||
|
||||
// Process the change (e.g., send notification, update cache)
|
||||
notifyTeam(change.fullDocument);
|
||||
});
|
||||
|
||||
// Handle errors and resume from last known position
|
||||
changeStream.on("error", (error) => {
|
||||
console.error("Change stream error:", error);
|
||||
// Reconnect using the stored resume token
|
||||
});
|
||||
```
|
||||
|
||||
**Resumable pattern for production:**
|
||||
|
||||
```javascript
|
||||
let resumeToken = await loadResumeTokenFromStorage();
|
||||
|
||||
async function watchWithResume(collection) {
|
||||
const options = { fullDocument: "updateLookup" };
|
||||
if (resumeToken) {
|
||||
options.resumeAfter = resumeToken;
|
||||
}
|
||||
|
||||
const stream = collection.watch([], options);
|
||||
|
||||
stream.on("change", async (change) => {
|
||||
// Process change
|
||||
await handleChange(change);
|
||||
|
||||
// Persist resume token so we can recover after restart
|
||||
resumeToken = change._id;
|
||||
await saveResumeTokenToStorage(resumeToken);
|
||||
});
|
||||
|
||||
stream.on("error", async () => {
|
||||
// Wait and reconnect
|
||||
await new Promise(r => setTimeout(r, 5000));
|
||||
watchWithResume(collection);
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
**Use cases:** real-time dashboards, cache invalidation, event-driven architectures, syncing data to search indexes (e.g., Elasticsearch).
|
||||
|
||||
---
|
||||
|
||||
### 6. Performance
|
||||
|
||||
#### Reading explain() output
|
||||
|
||||
```javascript
|
||||
// Run explain to see the query plan
|
||||
db.orders.find({
|
||||
userId: ObjectId("6651a..."),
|
||||
status: "pending"
|
||||
}).sort({ placedAt: -1 }).explain("executionStats");
|
||||
```
|
||||
|
||||
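**Key fields in the `explain("executionStats")` output** (`winningPlan` and `rejectedPlans` sit under `queryPlanner`; the counters sit under `executionStats`):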
|
||||
|
||||
| Field | What to look for |
|
||||
|-------|-----------------|
|
||||
| `winningPlan.stage` | `IXSCAN` good, `COLLSCAN` bad (full collection scan) |
|
||||
| `totalKeysExamined` | Should be close to `nReturned` (no wasted index scans) |
|
||||
| `totalDocsExamined` | Should be close to `nReturned` (no wasted document reads) |
|
||||
| `executionTimeMillis` | Overall query time |
|
||||
| `rejectedPlans` | Shows alternatives the optimizer considered |
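
The checks in this table can be automated. The sketch below assumes the explain document has already been fetched as a Python dict and has the classic-engine layout (`queryPlanner.winningPlan` with nested `inputStage` / `inputStages`, plus `executionStats`); newer server versions may nest the plan differently. `plan_stages` and `summarize_explain` are hypothetical helpers, and the two sample documents are hand-written for illustration, not real server output.

```python
def plan_stages(plan):
    """Yield every stage name in a winningPlan tree (classic-engine layout assumed)."""
    yield plan["stage"]
    if "inputStage" in plan:
        yield from plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        yield from plan_stages(child)


def summarize_explain(explain):
    """Reduce an explain document to the signals listed in the table above."""
    stats = explain["executionStats"]
    returned = max(stats["nReturned"], 1)  # avoid dividing by zero
    stages = list(plan_stages(explain["queryPlanner"]["winningPlan"]))
    return {
        "collscan": "COLLSCAN" in stages,
        "keys_per_result": stats["totalKeysExamined"] / returned,
        "docs_per_result": stats["totalDocsExamined"] / returned,
    }


# Hand-written sample documents (illustrative only).
indexed = {
    "queryPlanner": {"winningPlan": {"stage": "FETCH", "inputStage": {"stage": "IXSCAN"}}},
    "executionStats": {"nReturned": 20, "totalKeysExamined": 20, "totalDocsExamined": 20},
}
unindexed = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}},
    "executionStats": {"nReturned": 5, "totalKeysExamined": 0, "totalDocsExamined": 10000},
}

good = summarize_explain(indexed)
bad = summarize_explain(unindexed)
print(good)
print(bad)
```

Ratios near 1 mean the index is selective; a large `docs_per_result` means the server read many documents it then discarded.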
|
||||
|
||||
**Covered queries -- answered entirely from the index:**
|
||||
|
||||
```javascript
|
||||
// Create an index that covers the query
|
||||
db.orders.createIndex({ userId: 1, status: 1, totalCents: 1 });
|
||||
|
||||
// This query only needs fields in the index -- no document fetch
|
||||
db.orders.find(
|
||||
{ userId: ObjectId("6651a..."), status: "delivered" },
|
||||
{ _id: 0, totalCents: 1 } // projection must exclude _id and only include indexed fields
|
||||
);
|
||||
// explain() will show: "totalDocsExamined": 0
|
||||
```
|
||||
|
||||
**Projection optimization -- fetch only what you need:**
|
||||
|
||||
```javascript
|
||||
// BAD: fetches entire document including large body field
const fullPosts = await db.collection("posts").find({ "author._id": userId }).toArray();

// GOOD: only fetch fields needed for the list view
const listPosts = await db.collection("posts").find(
  { "author._id": userId },
  { projection: { title: 1, publishedAt: 1, tags: 1 } }
).toArray();
|
||||
```
|
||||
|
||||
**Bulk operations for write-heavy workloads:**
|
||||
|
||||
```javascript
|
||||
const bulk = db.products.initializeUnorderedBulkOp();
|
||||
|
||||
for (const update of priceUpdates) {
|
||||
bulk.find({ sku: update.sku })
|
||||
.updateOne({ $set: { priceCents: update.newPrice, updatedAt: new Date() } });
|
||||
}
|
||||
|
||||
const result = await bulk.execute();
|
||||
console.log(`Modified: ${result.nModified}, Errors: ${result.getWriteErrorCount()}`);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Design schema around query patterns, not data relationships.** Ask "how will I read this data?" before "how does this data relate?" Embed data that is always fetched together; reference data accessed independently.
|
||||
|
||||
2. **Use the ESR rule for compound indexes.** Order index fields by Equality, Sort, Range. This maximizes the index's usefulness and minimizes keys examined.
|
||||
|
||||
3. **Set read/write concerns appropriately.** Use `w: "majority"` and `readConcern: "majority"` for data that must survive failovers. Use `w: 1` for non-critical writes where speed matters more than durability.
|
||||
|
||||
4. **Use projection to limit returned fields.** Transferring large documents over the network when you only need two fields wastes bandwidth and memory. Always project.
|
||||
|
||||
5. **Avoid unbounded array growth.** An embedded array that can grow to thousands of elements bloats the document (16 MB max) and degrades performance. Move to a separate collection with a reference when the array exceeds ~100 elements.
|
||||
|
||||
6. **Use bulk operations for batch writes.** Individual `insertOne` or `updateOne` calls in a loop are slow because each one pays a full network round trip. Batch them with `bulkWrite` or `initializeUnorderedBulkOp`; the gain depends on batch size and network latency.
|
||||
|
||||
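7. **Enable retryable writes.** Recent drivers turn on `retryWrites=true` by default; keep it enabled, or add it to the connection string on older drivers. The driver then retries a failed write once after a transient network error or primary election, which covers most cases without application-level retry logic.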
|
||||
|
||||
8. **Monitor with database profiler and serverStatus.** Use `db.setProfilingLevel(1, { slowms: 100 })` to log slow queries. Check `db.serverStatus().opcounters` and `db.serverStatus().connections` for overall health.
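
The ESR rule from practice 2 can be sketched as a small helper that assembles compound index keys in Equality, Sort, Range order. This is a minimal illustration, not a driver API: `esr_index_keys` and the `views` field are hypothetical names, and the `(field, direction)` tuples only mirror the shape most drivers accept for index specifications.

```python
from typing import Iterable, List, Tuple

ASCENDING, DESCENDING = 1, -1  # the integer directions MongoDB uses in index specs

IndexKey = Tuple[str, int]


def esr_index_keys(
    equality: Iterable[str],
    sort: Iterable[IndexKey],
    range_fields: Iterable[str],
) -> List[IndexKey]:
    """Order compound index keys as Equality, then Sort, then Range."""
    keys: List[IndexKey] = []
    seen = set()

    def add(field: str, direction: int) -> None:
        # A field belongs in the index once; the first (highest-priority) role wins.
        if field not in seen:
            seen.add(field)
            keys.append((field, direction))

    for field in equality:
        add(field, ASCENDING)
    for field, direction in sort:
        add(field, direction)
    for field in range_fields:
        add(field, ASCENDING)
    return keys


# Query shape: author == X, views > N, sorted by publishedAt descending
keys = esr_index_keys(
    equality=["author"],
    sort=[("publishedAt", DESCENDING)],
    range_fields=["views"],  # hypothetical field used only for illustration
)
print(keys)
```

The resulting list could be passed to an index-creation call; verify the final choice with `explain()` on the real query.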
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
1. **Treating MongoDB like a relational database.** Normalizing everything into separate collections and using `$lookup` for every query defeats the purpose. If you need heavy joins, PostgreSQL is likely a better fit. Design for embedding first.
|
||||
|
||||
2. **Missing indexes on query fields.** Every `find()`, `$match`, and `sort()` should be backed by an index. Use `db.collection.getIndexes()` and `explain()` to verify. A `COLLSCAN` on a large collection is almost always a bug.
|
||||
|
||||
3. **Ignoring the 16 MB document size limit.** Embedding unbounded arrays (comments, logs, events) will eventually hit this wall, crashing writes. Use the bucket pattern (fixed-size sub-documents) or reference a separate collection.
|
||||
|
||||
4. **Not using readPreference for read-heavy workloads.** By default all reads go to the primary. For analytics or non-critical reads, use `readPreference: "secondaryPreferred"` to distribute load across replicas.
|
||||
|
||||
5. **Forgetting that the positional `$` operator updates only the first matching array element.** `$set` with `$` changes the first element that matches the query, not every match. Use `$[]` to update all elements, or `$[<identifier>]` with `arrayFilters` to update only the elements that meet a condition:
|
||||
|
||||
```javascript
|
||||
// Update price for a specific item in all orders
|
||||
db.orders.updateMany(
|
||||
{ "items.sku": "WIDGET-001" },
|
||||
{ $set: { "items.$[item].priceCents": 2499 } },
|
||||
{ arrayFilters: [{ "item.sku": "WIDGET-001" }] }
|
||||
);
|
||||
```
|
||||
|
||||
6. **Running aggregation pipelines without early $match.** Always filter as early as possible in the pipeline. A `$group` or `$unwind` before `$match` processes the entire collection unnecessarily. Put `$match` first to leverage indexes and reduce documents flowing through subsequent stages.
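
The bucket pattern mentioned in pitfall 3 can be sketched in plain Python: instead of one document holding an unbounded array, events are split into fixed-size bucket documents that reference their parent. The function name and the `parentId`, `seq`, `count` and `events` fields are hypothetical choices for illustration, and the default of 100 simply echoes the ~100-element guideline above.

```python
from typing import Any, Dict, Iterable, List


def bucketize(parent_id: str, events: Iterable[Any], bucket_size: int = 100) -> List[Dict[str, Any]]:
    """Split events into bucket documents of at most bucket_size elements each."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")

    buckets: List[Dict[str, Any]] = []
    for index, event in enumerate(events):
        if index % bucket_size == 0:
            # Start a new bucket document; seq keeps buckets ordered per parent.
            buckets.append({"parentId": parent_id, "seq": len(buckets), "count": 0, "events": []})
        buckets[-1]["events"].append(event)
        buckets[-1]["count"] += 1
    return buckets


docs = bucketize("post-1", [{"n": i} for i in range(250)], bucket_size=100)
print([doc["count"] for doc in docs])
```

Each bucket stays far below the 16 MB limit, and an index on the parent reference plus the sequence number would keep reads of the most recent bucket cheap.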
|
||||
|
||||
## Related Skills
|
||||
|
||||
- `postgresql` - Relational database patterns for structured data with complex relationships
|
||||
- `caching` - Caching strategies to reduce database load
|
||||
- `logging` - Logging patterns for query debugging and monitoring
|
||||
@@ -0,0 +1,609 @@
|
||||
# Databases — PostgreSQL Patterns
|
||||
|
||||
|
||||
# PostgreSQL
|
||||
|
||||
## When to Use
|
||||
|
||||
- PostgreSQL database operations
|
||||
- SQL query optimization
|
||||
- Schema design and migrations
|
||||
- JSONB document storage within a relational model
|
||||
- Full-text search without a dedicated search engine
|
||||
- Complex analytical queries with window functions and CTEs
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
- NoSQL-only projects where no relational database is involved
|
||||
- Stores such as Redis or an in-memory SQLite database used purely for caching or ephemeral storage
|
||||
- File-based storage scenarios that do not require a database engine
|
||||
|
||||
---
|
||||
|
||||
## Core Patterns
|
||||
|
||||
### 1. Schema Design
|
||||
|
||||
Design tables with explicit constraints, proper types, and clear relationships.
|
||||
|
||||
```sql
|
||||
-- Enums for constrained value sets
|
||||
CREATE TYPE user_role AS ENUM ('admin', 'editor', 'viewer');
|
||||
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered', 'cancelled');
|
||||
|
||||
-- Composite types for reusable structures
|
||||
CREATE TYPE address AS (
|
||||
street TEXT,
|
||||
city TEXT,
|
||||
state TEXT,
|
||||
zip VARCHAR(10)
|
||||
);
|
||||
|
||||
-- Users table with constraints
|
||||
CREATE TABLE users (
|
||||
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
|
||||
email TEXT NOT NULL UNIQUE,
|
||||
name TEXT NOT NULL CHECK (char_length(name) >= 1),
|
||||
role user_role NOT NULL DEFAULT 'viewer',
|
||||
metadata JSONB DEFAULT '{}',
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
|
||||
);
|
||||
|
||||
-- Organizations with self-referencing hierarchy
|
||||
CREATE TABLE organizations (
|
||||
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
|
||||
name TEXT NOT NULL,
|
||||
parent_id BIGINT REFERENCES organizations(id) ON DELETE SET NULL,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
|
||||
);
|
||||
|
||||
-- Membership join table with composite primary key
|
||||
CREATE TABLE org_memberships (
|
||||
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
|
||||
org_id BIGINT NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
|
||||
role user_role NOT NULL DEFAULT 'viewer',
|
||||
joined_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||||
PRIMARY KEY (user_id, org_id)
|
||||
);
|
||||
|
||||
-- Orders with foreign keys, check constraints, and enum status
|
||||
CREATE TABLE orders (
|
||||
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
|
||||
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
|
||||
status order_status NOT NULL DEFAULT 'pending',
|
||||
total_cents BIGINT NOT NULL CHECK (total_cents >= 0),
|
||||
shipping address,
|
||||
items JSONB NOT NULL DEFAULT '[]',
|
||||
placed_at TIMESTAMPTZ NOT NULL DEFAULT now()
|
||||
);
|
||||
|
||||
-- Auto-update updated_at with a trigger
|
||||
CREATE OR REPLACE FUNCTION set_updated_at()
|
||||
RETURNS TRIGGER AS $$
|
||||
BEGIN
|
||||
NEW.updated_at = now();
|
||||
RETURN NEW;
|
||||
END;
|
||||
$$ LANGUAGE plpgsql;
|
||||
|
||||
CREATE TRIGGER trg_users_updated_at
|
||||
BEFORE UPDATE ON users
|
||||
FOR EACH ROW EXECUTE FUNCTION set_updated_at();
|
||||
```
|
||||
|
||||
**Key principles:**
|
||||
- Use `BIGINT GENERATED ALWAYS AS IDENTITY` over `SERIAL` for new projects
|
||||
- Use `TIMESTAMPTZ` (not `TIMESTAMP`) to store times with timezone awareness
|
||||
- Prefer `TEXT` over `VARCHAR(n)` unless a hard length limit is business-critical
|
||||
- Add `ON DELETE` actions on every foreign key (CASCADE, RESTRICT, or SET NULL)
|
||||
- Use `CHECK` constraints for business rules that live at the data level
|
||||
|
||||
---
|
||||
|
||||
### 2. Index Strategy
|
||||
|
||||
Choose the right index type based on your query patterns.
|
||||
|
||||
**Decision guide:**
|
||||
|
||||
| Query Pattern | Index Type | Example |
|
||||
|---------------|-----------|---------|
|
||||
| Equality (`=`) and range (`<`, `>`, `BETWEEN`) | B-tree (default) | `WHERE created_at > '2025-01-01'` |
|
||||
| Array containment (`@>`), JSONB queries | GIN | `WHERE tags @> '{postgres}'` |
|
||||
| Full-text search (`@@`) | GIN | `WHERE to_tsvector(body) @@ query` |
|
||||
| Geometry, nearest-neighbor, range overlap | GiST | `ORDER BY location <-> point '(40.7,-74.0)' LIMIT 10` |
|
||||
| Filtered subset of rows | Partial | `WHERE active = true` |
|
||||
| Index-only scans (no heap lookup) | Covering (INCLUDE) | Frequently selected columns |
|
||||
|
||||
```sql
|
||||
-- B-tree: default, good for equality and range
|
||||
CREATE INDEX idx_orders_placed_at ON orders(placed_at DESC);
|
||||
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
|
||||
|
||||
-- GIN: arrays and JSONB containment
|
||||
CREATE INDEX idx_users_metadata ON users USING GIN (metadata);
|
||||
CREATE INDEX idx_orders_items ON orders USING GIN (items jsonb_path_ops);
|
||||
|
||||
-- GIN: full-text search
|
||||
ALTER TABLE articles ADD COLUMN search_vector tsvector
|
||||
GENERATED ALWAYS AS (
|
||||
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
|
||||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
|
||||
) STORED;
|
||||
|
||||
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
|
||||
|
||||
-- Full-text search query
|
||||
SELECT id, title, ts_rank(search_vector, query) AS rank
|
||||
FROM articles, plainto_tsquery('english', 'database optimization') AS query
|
||||
WHERE search_vector @@ query
|
||||
ORDER BY rank DESC
|
||||
LIMIT 20;
|
||||
|
||||
-- GiST: geometry and range types
|
||||
CREATE INDEX idx_events_duration ON events USING GiST (
|
||||
tstzrange(starts_at, ends_at)
|
||||
);
|
||||
|
||||
-- Find overlapping events
|
||||
SELECT * FROM events
|
||||
WHERE tstzrange(starts_at, ends_at) && tstzrange('2025-06-01', '2025-06-02');
|
||||
|
||||
-- Partial index: only index rows you actually query
|
||||
CREATE INDEX idx_orders_pending ON orders(placed_at)
|
||||
WHERE status = 'pending';
|
||||
|
||||
-- Covering index: avoids heap lookup for common queries
|
||||
CREATE INDEX idx_users_email_covering ON users(email)
|
||||
INCLUDE (name, role);
|
||||
|
||||
-- This query can now be answered entirely from the index
|
||||
SELECT name, role FROM users WHERE email = 'user@example.com';
|
||||
```
|
||||
|
||||
**When to add an index:** Run `EXPLAIN ANALYZE` first. Add an index when you see sequential scans on large tables with selective WHERE clauses. Do not index columns with very low cardinality (e.g., a boolean on a small table) unless combined with other columns.
|
||||
|
||||
---
|
||||
|
||||
### 3. Query Optimization
|
||||
|
||||
#### Reading EXPLAIN ANALYZE
|
||||
|
||||
```sql
|
||||
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
|
||||
SELECT u.name, COUNT(o.id) AS order_count
|
||||
FROM users u
|
||||
JOIN orders o ON o.user_id = u.id
|
||||
WHERE o.placed_at > now() - INTERVAL '30 days'
|
||||
GROUP BY u.id, u.name
|
||||
ORDER BY order_count DESC
|
||||
LIMIT 10;
|
||||
```
|
||||
|
||||
**What to look for in the output:**
|
||||
- **Seq Scan on large tables** -- add an index or rewrite the WHERE clause
|
||||
- **Nested Loop with high row counts** -- consider a Hash Join (may need more `work_mem`)
|
||||
- **actual rows far exceeding estimated rows** -- run `ANALYZE tablename` to update statistics
|
||||
- **Buffers: shared read** large numbers -- data not cached, check `shared_buffers` sizing
|
||||
- **Sort Method: external merge** -- increase `work_mem` for this query
|
||||
|
||||
#### Common Query Rewrites
|
||||
|
||||
```sql
|
||||
-- BAD: correlated subquery runs once per row
|
||||
SELECT u.name,
|
||||
(SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
|
||||
FROM users u;
|
||||
|
||||
-- GOOD: single pass with JOIN + GROUP BY
|
||||
SELECT u.name, COUNT(o.id) AS order_count
|
||||
FROM users u
|
||||
LEFT JOIN orders o ON o.user_id = u.id
|
||||
GROUP BY u.id, u.name;
|
||||
|
||||
-- BAD: OR across different columns often forces a sequential scan or a costly BitmapOr
|
||||
SELECT * FROM orders WHERE user_id = 5 OR status = 'pending';
|
||||
|
||||
-- GOOD: UNION ALL lets each branch use its own index
|
||||
SELECT * FROM orders WHERE user_id = 5
|
||||
UNION ALL
|
||||
SELECT * FROM orders WHERE status = 'pending' AND user_id != 5;
|
||||
|
||||
-- BAD: function call on indexed column prevents index use
|
||||
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
|
||||
|
||||
-- GOOD: expression index or use citext
|
||||
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
|
||||
-- or better: define email as CITEXT type
|
||||
|
||||
-- Avoiding N+1: fetch users and their latest order in one query
|
||||
SELECT DISTINCT ON (u.id)
|
||||
u.id, u.name, o.id AS latest_order_id, o.total_cents, o.placed_at
|
||||
FROM users u
|
||||
LEFT JOIN orders o ON o.user_id = u.id
|
||||
ORDER BY u.id, o.placed_at DESC;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Migrations
|
||||
|
||||
Follow the up/down pattern and plan for zero-downtime deployments.
|
||||
|
||||
```sql
|
||||
-- ============================================
|
||||
-- Migration: 20250601_001_add_user_preferences
|
||||
-- ============================================
|
||||
|
||||
-- UP
|
||||
ALTER TABLE users ADD COLUMN preferences JSONB DEFAULT '{}';
|
||||
|
||||
-- Create index CONCURRENTLY to avoid blocking writes (cannot run inside a transaction block)
|
||||
CREATE INDEX CONCURRENTLY idx_users_preferences
|
||||
ON users USING GIN (preferences);
|
||||
|
||||
-- DOWN
|
||||
DROP INDEX IF EXISTS idx_users_preferences;
|
||||
ALTER TABLE users DROP COLUMN IF EXISTS preferences;
|
||||
```
|
||||
|
||||
**Safe vs unsafe operations:**
|
||||
|
||||
| Operation | Safe? | Notes |
|
||||
|-----------|-------|-------|
|
||||
| ADD COLUMN (nullable, or with a constant default) | Yes | Metadata-only in PG 11+; a volatile default (e.g. `clock_timestamp()`) still rewrites the table |
|
||||
| ADD COLUMN NOT NULL without default | No | Fails if rows exist; add nullable first, backfill, then set NOT NULL |
|
||||
| DROP COLUMN | Mostly | Quick, but ORM queries may break if they SELECT * |
|
||||
| RENAME COLUMN | Dangerous | Breaks all queries referencing old name; use a transition period |
|
||||
| ADD INDEX | Safe with CONCURRENTLY | Without CONCURRENTLY, locks writes for duration |
|
||||
| ADD CONSTRAINT (CHECK/FK) | Careful | Use NOT VALID then VALIDATE CONSTRAINT in two steps |
|
||||
| Change column type | Dangerous | Usually rewrites the entire table under an exclusive lock; add a new column, backfill, and switch over instead |
|
||||
|
||||
```sql
|
||||
-- Zero-downtime: add NOT NULL constraint safely
|
||||
-- Step 1: add column as nullable
|
||||
ALTER TABLE users ADD COLUMN phone TEXT;
|
||||
|
||||
-- Step 2: backfill in batches
|
||||
UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN 1 AND 10000;
|
||||
UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN 10001 AND 20000;
|
||||
-- ... continue in batches
|
||||
|
||||
-- Step 3: add the constraint as NOT VALID (skips the table scan, so the lock is brief)
|
||||
ALTER TABLE users ADD CONSTRAINT users_phone_not_null
|
||||
CHECK (phone IS NOT NULL) NOT VALID;
|
||||
|
||||
-- Step 4: validate (scans table but allows concurrent writes)
|
||||
ALTER TABLE users VALIDATE CONSTRAINT users_phone_not_null;
|
||||
|
||||
-- Step 5: optionally convert to a proper NOT NULL (PG 12+ reuses the validated CHECK and skips the scan)
|
||||
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
|
||||
ALTER TABLE users DROP CONSTRAINT users_phone_not_null;
|
||||
```
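
The "continue in batches" step can be generated rather than written by hand. The sketch below only computes the id ranges and pairs them with the backfill statement; `id_batches` and `backfill_statements` are hypothetical helper names, the `%s` placeholders assume a psycopg-style driver, and each batch is meant to run and commit in its own short transaction.

```python
from typing import Iterator, List, Tuple

BACKFILL_SQL = "UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN %s AND %s"


def id_batches(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) id ranges that cover min_id..max_id."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def backfill_statements(min_id: int, max_id: int, batch_size: int) -> List[Tuple[str, Tuple[int, int]]]:
    """Pair the backfill SQL with the parameters for each batch."""
    return [(BACKFILL_SQL, batch) for batch in id_batches(min_id, max_id, batch_size)]


for sql, params in backfill_statements(1, 25_000, 10_000):
    # In a real migration: execute (sql, params), commit, then pause briefly.
    print(params)
```

Small batches keep row locks short and give autovacuum a chance to clean up between commits.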
|
||||
|
||||
---
|
||||
|
||||
### 5. JSON/JSONB
|
||||
|
||||
Use JSONB for semi-structured data that lives alongside relational columns.
|
||||
|
||||
**When to use JSONB:**
|
||||
- User preferences, settings, or metadata with varying keys
|
||||
- API response caching or event payloads
|
||||
- Flexible attributes that differ per row
|
||||
|
||||
**When NOT to use JSONB:**
|
||||
- Data you regularly JOIN on or use in WHERE clauses across tables -- normalize it
|
||||
- Data that has a fixed, well-known schema -- use proper columns
|
||||
|
||||
```sql
|
||||
-- Querying JSONB: operators
|
||||
-- -> returns JSONB element (keeps type)
|
||||
-- ->> returns TEXT value
|
||||
-- @> containment (left contains right)
|
||||
-- ? key exists
|
||||
|
||||
-- Get a nested value
|
||||
SELECT
|
||||
metadata->>'department' AS department,
|
||||
metadata->'settings'->>'theme' AS theme
|
||||
FROM users
|
||||
WHERE metadata @> '{"role": "admin"}';
|
||||
|
||||
-- Check if a key exists
|
||||
SELECT * FROM users WHERE metadata ? 'avatar_url';
|
||||
|
||||
-- Query inside JSONB arrays
|
||||
SELECT * FROM orders
|
||||
WHERE items @> '[{"sku": "WIDGET-001"}]';
|
||||
|
||||
-- Update a nested JSONB field
|
||||
UPDATE users
|
||||
SET metadata = jsonb_set(metadata, '{settings,notifications}', '"email"')
|
||||
WHERE id = 42;
|
||||
|
||||
-- Remove a key
|
||||
UPDATE users
|
||||
SET metadata = metadata - 'deprecated_field'
|
||||
WHERE metadata ? 'deprecated_field';
|
||||
|
||||
-- Aggregate JSONB: expand array elements into rows
|
||||
SELECT o.id, item->>'sku' AS sku, (item->>'qty')::int AS qty
|
||||
FROM orders o, jsonb_array_elements(o.items) AS item
|
||||
WHERE o.status = 'pending';
|
||||
|
||||
-- Index strategies for JSONB
|
||||
-- General containment queries: GIN with jsonb_ops (default)
|
||||
CREATE INDEX idx_users_metadata_gin ON users USING GIN (metadata);
|
||||
|
||||
-- Containment-only queries (smaller, faster index): jsonb_path_ops
|
||||
CREATE INDEX idx_orders_items_path ON orders USING GIN (items jsonb_path_ops);
|
||||
|
||||
-- Specific key lookups: expression index on extracted value
|
||||
CREATE INDEX idx_users_department ON users ((metadata->>'department'));
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 6. CTEs and Window Functions
|
||||
|
||||
#### Common Table Expressions (CTEs)
|
||||
|
||||
```sql
|
||||
-- Readable multi-step query with CTEs
|
||||
WITH monthly_revenue AS (
|
||||
SELECT
|
||||
date_trunc('month', placed_at) AS month,
|
||||
SUM(total_cents) AS revenue_cents
|
||||
FROM orders
|
||||
WHERE status = 'delivered'
|
||||
GROUP BY 1
|
||||
),
|
||||
revenue_with_growth AS (
|
||||
SELECT
|
||||
month,
|
||||
revenue_cents,
|
||||
LAG(revenue_cents) OVER (ORDER BY month) AS prev_month,
|
||||
ROUND(
|
||||
100.0 * (revenue_cents - LAG(revenue_cents) OVER (ORDER BY month))
|
||||
/ NULLIF(LAG(revenue_cents) OVER (ORDER BY month), 0),
|
||||
1
|
||||
) AS growth_pct
|
||||
FROM monthly_revenue
|
||||
)
|
||||
SELECT * FROM revenue_with_growth ORDER BY month DESC;
|
||||
|
||||
-- Recursive CTE: org hierarchy tree
|
||||
WITH RECURSIVE org_tree AS (
|
||||
-- Base case: top-level orgs
|
||||
SELECT id, name, parent_id, 0 AS depth, name::TEXT AS path
|
||||
FROM organizations
|
||||
WHERE parent_id IS NULL
|
||||
|
||||
UNION ALL
|
||||
|
||||
-- Recursive step
|
||||
SELECT o.id, o.name, o.parent_id, t.depth + 1, t.path || ' > ' || o.name
|
||||
FROM organizations o
|
||||
JOIN org_tree t ON o.parent_id = t.id
|
||||
)
|
||||
SELECT * FROM org_tree ORDER BY path;
|
||||
```
|
||||
|
||||
#### Window Functions
|
||||
|
||||
```sql
|
||||
-- ROW_NUMBER: assign rank within a partition
|
||||
SELECT
|
||||
user_id,
|
||||
id AS order_id,
|
||||
total_cents,
|
||||
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY placed_at DESC) AS rn
|
||||
FROM orders;
|
||||
|
||||
-- Get each user's most recent order
|
||||
SELECT * FROM (
|
||||
SELECT
|
||||
o.*,
|
||||
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY placed_at DESC) AS rn
|
||||
FROM orders o
|
||||
) sub WHERE rn = 1;
|
||||
|
||||
-- LAG/LEAD: compare with previous/next row
|
||||
SELECT
|
||||
placed_at::date AS order_date,
|
||||
total_cents,
|
||||
LAG(total_cents) OVER (ORDER BY placed_at) AS prev_order_total,
|
||||
total_cents - LAG(total_cents) OVER (ORDER BY placed_at) AS diff
|
||||
FROM orders
|
||||
WHERE user_id = 42;
|
||||
|
||||
-- Running total
|
||||
SELECT
|
||||
placed_at::date AS order_date,
|
||||
total_cents,
|
||||
SUM(total_cents) OVER (
|
||||
ORDER BY placed_at
|
||||
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
|
||||
) AS running_total
|
||||
FROM orders
|
||||
WHERE user_id = 42;
|
||||
|
||||
-- NTILE: divide rows into equal buckets (e.g., quartiles)
|
||||
SELECT
|
||||
user_id,
|
||||
SUM(total_cents) AS lifetime_spend,
|
||||
NTILE(4) OVER (ORDER BY SUM(total_cents) DESC) AS spend_quartile
|
||||
FROM orders
|
||||
GROUP BY user_id;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 7. Transaction Isolation
|
||||
|
||||
PostgreSQL accepts all four SQL-standard isolation levels, but `READ UNCOMMITTED` behaves exactly like `READ COMMITTED`, so three are distinct in practice:
|
||||
|
||||
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Use Case |
|
||||
|-------|-----------|-------------------|-------------|----------|
|
||||
| READ COMMITTED (default) | No | Possible | Possible | Most OLTP workloads |
|
||||
| REPEATABLE READ | No | No | No (in PG) | Reports, consistent snapshots |
|
||||
| SERIALIZABLE | No | No | No | Financial transactions, inventory |
|
||||
|
||||
```sql
|
||||
-- Default: READ COMMITTED
|
||||
-- Each statement sees the latest committed data
|
||||
BEGIN;
|
||||
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
|
||||
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
|
||||
COMMIT;
|
||||
|
||||
-- SERIALIZABLE: full isolation, detects write conflicts
|
||||
BEGIN ISOLATION LEVEL SERIALIZABLE;
|
||||
-- Read current inventory
|
||||
SELECT quantity FROM inventory WHERE sku = 'WIDGET-001';
|
||||
-- Decrement only if stock remains (PG aborts one tx if a concurrent tx conflicts)
UPDATE inventory SET quantity = quantity - 1 WHERE sku = 'WIDGET-001' AND quantity > 0;
|
||||
COMMIT;
|
||||
-- If another SERIALIZABLE tx modified the same row, one will get:
|
||||
-- ERROR: could not serialize access due to concurrent update
|
||||
-- Your application must retry on serialization failure (SQLSTATE 40001)
|
||||
|
||||
-- Advisory locks for application-level coordination
|
||||
SELECT pg_advisory_xact_lock(hashtext('process-user-' || '42'));
|
||||
-- Lock is held until transaction ends; no table-level contention
|
||||
```
|
||||
|
||||
**Guidelines:**
|
||||
- Use READ COMMITTED for general CRUD operations
|
||||
- Use SERIALIZABLE when correctness requires that concurrent transactions behave as if run sequentially (e.g., balance transfers, seat reservations)
|
||||
- Always implement retry logic for serialization failures
|
||||
- Keep transactions as short as possible to reduce contention
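
The retry guideline can be sketched as a driver-agnostic wrapper. It assumes the driver's exception exposes the SQLSTATE as a `sqlstate` attribute (some drivers use a different attribute name, so adapt the check); `run_with_retry` and `RetriesExhausted` are hypothetical names. The callable passed in must start a fresh transaction on every attempt.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

SERIALIZATION_FAILURE = "40001"  # SQLSTATE for serialization failures


class RetriesExhausted(RuntimeError):
    """Raised when every attempt ended in a serialization failure."""


def run_with_retry(txn: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.05) -> T:
    """Run txn, retrying only on serialization failures with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except Exception as exc:
            # Assumption: the driver attaches the SQLSTATE as `exc.sqlstate`.
            if getattr(exc, "sqlstate", None) != SERIALIZATION_FAILURE:
                raise  # anything else is a real error, not a retryable conflict
            if attempt == max_attempts:
                raise RetriesExhausted(f"gave up after {max_attempts} attempts") from exc
            # Exponential backoff with full jitter so competing clients spread out.
            time.sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

Capping the attempts matters: a transaction that keeps failing usually signals a hot row, which is better fixed by shortening the transaction than by retrying indefinitely.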
|
||||
|
||||
---
|
||||
|
||||
### 8. Connection Pooling
|
||||
|
||||
Direct PostgreSQL connections are expensive (~1-10 MB RAM each). Use a pooler.
|
||||
|
||||
**PgBouncer configuration (pgbouncer.ini):**
|
||||
|
||||
```ini
|
||||
[databases]
|
||||
myapp = host=127.0.0.1 port=5432 dbname=myapp
|
||||
|
||||
[pgbouncer]
|
||||
listen_addr = 127.0.0.1
|
||||
listen_port = 6432
|
||||
auth_type = scram-sha-256
|
||||
auth_file = /etc/pgbouncer/userlist.txt
|
||||
|
||||
; Pool mode: transaction is best for most web apps
|
||||
pool_mode = transaction
|
||||
|
||||
; Sizing: start conservative, tune with monitoring
|
||||
default_pool_size = 20
|
||||
max_client_conn = 200
|
||||
min_pool_size = 5
|
||||
reserve_pool_size = 5
|
||||
reserve_pool_timeout = 3
|
||||
|
||||
; Timeouts
|
||||
server_idle_timeout = 300
|
||||
client_idle_timeout = 60
|
||||
query_timeout = 30
|
||||
```
|
||||
|
||||
**Pool sizing formula:**
|
||||
|
||||
```
|
||||
optimal_pool_size = ((2 * cpu_cores) + effective_disk_spindles)
|
||||
```
|
||||
|
||||
For a 4-core SSD server: `(2 * 4) + 1 = 9` connections is a good starting point. More connections does not mean more throughput -- too many causes contention.
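
As a worked instance of the formula, the helper below computes the starting pool size and the total number of server connections when several application instances each hold their own pool. Both function names are hypothetical, and treating an SSD as one effective spindle follows the example above rather than any hard rule.

```python
def starting_pool_size(cpu_cores: int, effective_spindles: int = 1) -> int:
    """Starting point from the formula: (2 * cpu_cores) + effective_disk_spindles."""
    if cpu_cores < 1 or effective_spindles < 0:
        raise ValueError("cpu_cores must be >= 1 and effective_spindles >= 0")
    return 2 * cpu_cores + effective_spindles


def total_server_connections(app_instances: int, pool_size_per_instance: int) -> int:
    """Connections PostgreSQL sees when each app instance keeps its own pool."""
    return app_instances * pool_size_per_instance


size = starting_pool_size(cpu_cores=4)  # 4-core SSD server
print(size, total_server_connections(app_instances=6, pool_size_per_instance=size))
```

The second number is the one to compare against `max_connections`: per-instance pools multiply, which is one reason to put a shared pooler such as PgBouncer in front of the database.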
|
||||
|
||||
**Pool modes:**
|
||||
|
||||
| Mode | Description | Caveats |
|
||||
|------|-------------|---------|
|
||||
| `transaction` | Connection returned after each transaction | Cannot use session-level features (LISTEN/NOTIFY, session-level `SET`, temp tables); protocol-level prepared statements need PgBouncer 1.21+ with `max_prepared_statements` set |
|
||||
| `session` | Connection held for entire client session | Fewer pooling benefits; use only when session features needed |
|
||||
| `statement` | Connection returned after each statement | No multi-statement transactions; rarely used |
|
||||
|
||||
**Application-level pooling (Python example with asyncpg):**
|
||||
|
||||
```python
import asyncio

import asyncpg


async def main() -> None:
    pool = await asyncpg.create_pool(
        dsn="postgresql://user:pass@localhost:6432/myapp",
        min_size=5,
        max_size=20,
        max_inactive_connection_lifetime=300,
        command_timeout=30,
        # Port 6432 is PgBouncer in transaction mode: disable asyncpg's
        # prepared-statement cache, which expects a stable server connection.
        statement_cache_size=0,
    )

    async with pool.acquire() as conn:
        rows = await conn.fetch("SELECT * FROM users WHERE active = true")

    await pool.close()


asyncio.run(main())
```
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use parameterized queries everywhere.** Never concatenate user input into SQL strings. ORMs and query builders handle this, but verify in raw SQL contexts.
|
||||
|
||||
2. **Run ANALYZE after bulk data changes.** The query planner relies on statistics. After large imports or deletes, run `ANALYZE tablename` to update them.
|
||||
|
||||
3. **Prefer BIGINT for primary keys.** INTEGER (max ~2.1 billion) can be exhausted sooner than expected in high-write systems. BIGINT costs 4 extra bytes per row but avoids a painful migration later.
|
||||
|
||||
4. **Store money as integers (cents).** Floating-point arithmetic causes rounding errors. Use `BIGINT` for cents or `NUMERIC(19,4)` if sub-cent precision is needed.
|
||||
|
||||
5. **Add indexes for foreign keys.** PostgreSQL does not automatically index the child side of a foreign key. Without it, DELETE on the parent table triggers a sequential scan on the child.
|
||||
|
||||
6. **Use TIMESTAMPTZ, not TIMESTAMP.** `TIMESTAMP WITHOUT TIME ZONE` silently drops timezone info. Always use `TIMESTAMPTZ` and let the application control display timezone.
|
||||
|
||||
7. **Set statement_timeout for web requests.** Prevent runaway queries from holding connections: `SET statement_timeout = '5s';` at session start, or configure per-role in PostgreSQL.
|
||||
|
||||
8. **Monitor with pg_stat_statements.** Enable this extension to track query performance over time. The top queries by `total_exec_time` are your optimization targets.
|
||||
|
||||
```sql
|
||||
-- Find slowest queries
|
||||
SELECT
|
||||
calls,
|
||||
round(total_exec_time::numeric, 1) AS total_ms,
|
||||
round(mean_exec_time::numeric, 1) AS mean_ms,
|
||||
query
|
||||
FROM pg_stat_statements
|
||||
ORDER BY total_exec_time DESC
|
||||
LIMIT 10;
|
||||
```
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
1. **N+1 queries from ORM lazy loading.** Loading a list of users and then accessing `user.orders` in a loop generates one query per user. Use eager loading (`joinedload` in SQLAlchemy, `select_related` in Django) or batch the query with a JOIN.
|
||||
|
||||
2. **Locking the table during migrations.** `ALTER TABLE ... ADD COLUMN NOT NULL DEFAULT 'x'` is safe in PG 11+, but `CREATE INDEX` without `CONCURRENTLY` locks writes. Always use `CREATE INDEX CONCURRENTLY` in production migrations.
|
||||
|
||||
3. **Bloated tables from UPDATE-heavy workloads.** PostgreSQL MVCC creates dead tuples on every UPDATE. If autovacuum cannot keep up, table size and query times grow. Monitor `pg_stat_user_tables.n_dead_tup` and tune autovacuum settings for hot tables.
|
||||
|
||||
4. **Using OFFSET for pagination on large datasets.** `OFFSET 100000` forces PG to scan and discard 100,000 rows. Use keyset pagination instead:
|
||||
|
||||
```sql
|
||||
-- BAD: slow for deep pages
|
||||
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;
|
||||
|
||||
-- GOOD: keyset pagination (100000 stands for the last id seen on the previous page)
|
||||
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;
|
||||
```
|
||||
|
||||
5. **Ignoring connection limits.** Each PostgreSQL connection consumes RAM. Opening hundreds of direct connections (e.g., one per serverless function invocation) will exhaust `max_connections`, after which new connections are rejected, and the memory pressure can destabilize the server. Always use PgBouncer or an application-level pool.
|
||||
|
||||
6. **Storing large blobs in the database.** Files over a few KB should go in object storage (S3, R2). Store the URL/key in PostgreSQL. Large `bytea` or `TEXT` columns bloat the table, slow backups, and waste shared_buffers cache.
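
The keyset pagination from pitfall 4 can be exercised end to end with a small runnable sketch. It uses Python's built-in `sqlite3` module as a stand-in so it runs without a server; the SQL shape (`WHERE id > ? ORDER BY id LIMIT ?`) carries over to PostgreSQL with the driver's own placeholder style. `fetch_page` is a hypothetical helper, and the table is a cut-down version of `orders`.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[int, int]


def fetch_page(conn: sqlite3.Connection, after_id: int, page_size: int) -> Tuple[List[Row], Optional[int]]:
    """Return one page of orders with id > after_id, plus the cursor for the next page."""
    rows = conn.execute(
        "SELECT id, total_cents FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
    [(i, i * 100) for i in range(1, 46)],
)

cursor: Optional[int] = 0  # ids start at 1, so 0 means "from the beginning"
while cursor is not None:
    page, cursor = fetch_page(conn, cursor, page_size=20)
    print(len(page), "rows; next cursor:", cursor)
```

The client passes back the last id it saw instead of a page number, so each page costs an index seek rather than a scan over all skipped rows. When sorting by a non-unique column, add the primary key as a tie-breaker to both the `ORDER BY` and the cursor.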
|
||||
|
||||
## Related Skills
|
||||
|
||||
- `mongodb` - Document-based database patterns for non-relational data
|
||||
- `caching` - Caching strategies to reduce database load
|
||||
- `logging` - Logging patterns for query debugging and monitoring
|
||||
@@ -0,0 +1,279 @@
|
||||
# Databases — Redis Patterns
|
||||
|
||||
|
||||
# Redis
|
||||
|
||||
## When to Use
|
||||
|
||||
- Caching database queries or API responses
|
||||
- Session storage for web applications
|
||||
- Rate limiting (distributed across instances)
|
||||
- Job/task queues (BullMQ, Celery)
|
||||
- Pub/sub messaging between services
|
||||
- Distributed locks
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
- **Primary data storage** — Redis is a cache/broker, not a database of record
|
||||
- **Complex queries** — use PostgreSQL for relational queries
|
||||
- **Large blobs** — use S3/R2 for file storage
|
||||
- **In-memory caching only** — use `functools.lru_cache` or `Map` for single-process caches
|
||||
|
||||
---
|
||||
|
||||
## Python (redis-py / FastAPI)
|
||||
|
||||
### Connection
|
||||
|
||||
```python
# src/core/redis.py
import redis.asyncio as redis

pool = redis.ConnectionPool.from_url(
    "redis://localhost:6379/0",
    max_connections=20,
    decode_responses=True,
)


async def get_redis() -> redis.Redis:
    return redis.Redis(connection_pool=pool)
```
|
||||
|
||||
### Cache-aside pattern
|
||||
|
||||
```python
import json
from datetime import timedelta

from fastapi import HTTPException
from sqlalchemy.ext.asyncio import AsyncSession

# `User` is the application's ORM model; `get_redis` comes from src/core/redis.py


async def get_user_cached(user_id: str, db: AsyncSession) -> User:
    r = await get_redis()
    cache_key = f"user:{user_id}"

    # Check cache
    cached = await r.get(cache_key)
    if cached:
        return User(**json.loads(cached))

    # Cache miss: fetch from DB
    user = await db.get(User, user_id)
    if not user:
        raise HTTPException(status_code=404, detail="User not found")

    # Store in cache with TTL
    await r.setex(cache_key, timedelta(minutes=15), json.dumps(user.to_dict()))
    return user
```
|
||||
|
||||
### Cache invalidation
|
||||
|
||||
```python
async def update_user(user_id: str, data: UpdateUserRequest, db: AsyncSession) -> User:
    user = await db.get(User, user_id)
    if not user:
        raise HTTPException(status_code=404, detail="User not found")
    for key, value in data.dict(exclude_unset=True).items():
        setattr(user, key, value)
    await db.commit()

    # Invalidate the cached copy so the next read repopulates it
    r = await get_redis()
    await r.delete(f"user:{user_id}")

    return user
```
|
||||
|
||||
### Rate limiting
|
||||
|
||||
```python
from fastapi import Request, HTTPException


async def rate_limit(request: Request, limit: int = 100, window: int = 900):
    r = await get_redis()
    key = f"rate:{request.client.host}"
    current = await r.incr(key)
    if current == 1:
        # First hit in this window: start the expiry clock.
        # INCR and EXPIRE are separate commands, so a crash between them
        # leaves the key without a TTL; use a Lua script or MULTI/EXEC
        # if that gap matters for your workload.
        await r.expire(key, window)
    if current > limit:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
```
|
||||
|
||||
### Session storage
|
||||
|
||||
```python
import secrets
from datetime import timedelta


async def create_session(user_id: str) -> str:
    r = await get_redis()
    session_id = secrets.token_urlsafe(32)
    await r.setex(f"session:{session_id}", timedelta(hours=24), user_id)
    return session_id


async def get_session(session_id: str) -> str | None:
    r = await get_redis()
    return await r.get(f"session:{session_id}")


async def delete_session(session_id: str):
    r = await get_redis()
    await r.delete(f"session:{session_id}")
```
|
||||
|
||||
---
|
||||
|
||||
## TypeScript (ioredis / NestJS / Express)
|
||||
|
||||
### Connection
|
||||
|
||||
```typescript
|
||||
// src/core/redis.ts
|
||||
import Redis from 'ioredis';
|
||||
|
||||
export const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379', {
|
||||
maxRetriesPerRequest: 3,
|
||||
lazyConnect: true,
|
||||
});
|
||||
```
|
||||
|
||||

### NestJS module

```typescript
// src/cache/cache.module.ts
import { Global, Module } from '@nestjs/common';
import { CacheService } from './cache.service';

@Global()
@Module({
  providers: [CacheService],
  exports: [CacheService],
})
export class CacheModule {}
```

```typescript
// src/cache/cache.service.ts
import { Injectable, OnModuleDestroy } from '@nestjs/common';
import Redis from 'ioredis';

@Injectable()
export class CacheService implements OnModuleDestroy {
  private readonly redis = new Redis(process.env.REDIS_URL!);

  async get<T>(key: string): Promise<T | null> {
    const data = await this.redis.get(key);
    return data ? JSON.parse(data) : null;
  }

  async set(key: string, value: unknown, ttlSeconds: number): Promise<void> {
    await this.redis.setex(key, ttlSeconds, JSON.stringify(value));
  }

  async del(key: string): Promise<void> {
    await this.redis.del(key);
  }

  async onModuleDestroy() {
    await this.redis.quit();
  }
}
```

### Cache-aside in service

```typescript
@Injectable()
export class UsersService {
  constructor(
    private readonly prisma: PrismaService,
    private readonly cache: CacheService,
  ) {}

  async findOne(id: string): Promise<User> {
    // Check cache
    const cached = await this.cache.get<User>(`user:${id}`);
    if (cached) return cached;

    // Cache miss
    const user = await this.prisma.user.findUnique({ where: { id } });
    if (!user) throw new NotFoundException(`User ${id} not found`);

    // Store with 15min TTL
    await this.cache.set(`user:${id}`, user, 900);
    return user;
  }

  async update(id: string, dto: UpdateUserDto): Promise<User> {
    const user = await this.prisma.user.update({ where: { id }, data: dto });
    await this.cache.del(`user:${id}`); // Invalidate
    return user;
  }
}
```

---

## Pub/Sub

### Python

```python
import json

# Publisher
async def publish_event(channel: str, event: dict):
    r = await get_redis()
    await r.publish(channel, json.dumps(event))

# Subscriber
async def subscribe_events(channel: str):
    r = await get_redis()
    pubsub = r.pubsub()
    await pubsub.subscribe(channel)
    try:
        async for message in pubsub.listen():
            # Skip subscribe/unsubscribe confirmations; yield only payloads
            if message['type'] == 'message':
                yield json.loads(message['data'])
    finally:
        # Release the subscription when the consumer stops iterating
        await pubsub.unsubscribe(channel)
```

### TypeScript

```typescript
import Redis from 'ioredis';

// Publisher
const pub = new Redis(process.env.REDIS_URL!);
await pub.publish('orders', JSON.stringify({ type: 'created', orderId: '123' }));

// Subscriber (separate connection required)
const sub = new Redis(process.env.REDIS_URL!);
// Register the handler before subscribing so no early message is missed
sub.on('message', (channel, message) => {
  const event = JSON.parse(message);
  console.log(`[${channel}]`, event);
});
await sub.subscribe('orders');
```

---

## Key Naming Conventions

```
entity:id        → user:abc123
entity:id:field  → user:abc123:orders
rate:ip          → rate:192.168.1.1
session:token    → session:abc123def
lock:resource    → lock:order-processing
queue:name       → queue:email-notifications
```
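
The conventions above can be kept in one place with a small key builder, so key formats are not scattered through the codebase as string literals. This is a minimal sketch: `make_key` is a hypothetical helper, not part of redis-py, and the only rule it enforces (no empty segments) is an assumption about what is worth validating.

```python
def make_key(*parts: str) -> str:
    """Join key segments with ':' following the entity:id[:field] convention.

    Hypothetical helper: it rejects empty segments so a missing ID cannot
    silently produce a key such as "user:" that collides across entities.
    """
    if not parts:
        raise ValueError("at least one key segment is required")
    for part in parts:
        if not part:
            raise ValueError("key segments must be non-empty")
    return ":".join(parts)


if __name__ == "__main__":
    print(make_key("user", "abc123"))            # user:abc123
    print(make_key("user", "abc123", "orders"))  # user:abc123:orders
    print(make_key("lock", "order-processing"))  # lock:order-processing
```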

---

## Common Pitfalls

1. **Not setting TTLs.** Every cache key should have an expiration. Unbounded caches grow until they exhaust memory.
2. **Cache stampede.** When a popular key expires, many requests miss at once and hit the DB simultaneously. Use a distributed lock so that only one request recomputes the value, or use stale-while-revalidate.
3. **Using the same connection for pub/sub.** A connection in subscriber mode can't run regular commands such as `GET` or `SET`. Use a dedicated connection for subscribing.
4. **Storing large objects.** Redis is fastest with small values. Keep values under 1MB; for larger data, store the object in S3 and keep only a pointer to it in Redis.
5. **Not handling connection failures.** Redis connections drop. Use retry logic and connection pools.
6. **Forgetting to invalidate.** When data changes, delete the cache key. Otherwise the stale entry keeps serving outdated data until its TTL expires.
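
The lock-based fix for the cache stampede in pitfall 2 can be sketched as follows. This is one possible shape under stated assumptions, not a hardened implementation: `get_or_compute`, its parameters, and the `lock:{key}` prefix are hypothetical, and `r` is assumed to be a `redis.asyncio.Redis` client created with `decode_responses=True`. It uses only `get`, `set` with `nx`/`ex`, `setex`, and `delete`.

```python
import asyncio
import json
import secrets
from typing import Any, Awaitable, Callable


async def get_or_compute(
    r: Any,  # assumed: redis.asyncio.Redis with decode_responses=True
    key: str,
    loader: Callable[[], Awaitable[Any]],
    ttl: int = 900,
    lock_ttl: int = 10,
    retries: int = 50,
    retry_delay: float = 0.1,
) -> Any:
    """Cache-aside read where only the lock holder recomputes a missing key."""
    lock_key = f"lock:{key}"
    for _ in range(retries):
        cached = await r.get(key)
        if cached is not None:
            return json.loads(cached)

        token = secrets.token_urlsafe(16)
        # SET with NX and EX succeeds only for the first caller; the expiry
        # frees the lock if the holder crashes before releasing it.
        if await r.set(lock_key, token, nx=True, ex=lock_ttl):
            try:
                value = await loader()
                await r.setex(key, ttl, json.dumps(value))
                return value
            finally:
                # Non-atomic check-and-delete, kept simple for this sketch;
                # a Lua script would make the release atomic.
                if await r.get(lock_key) == token:
                    await r.delete(lock_key)

        # Another caller holds the lock: wait briefly, then re-check the cache.
        await asyncio.sleep(retry_delay)

    # Waited too long for the lock holder: load directly instead of failing.
    return await loader()
```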

---

## Related Skills

- `caching` — HTTP caching, CDN, memoization (framework-agnostic patterns)
- `background-jobs` — BullMQ/Celery use Redis as broker
- `fastapi` — Redis integration with FastAPI dependency injection
- `nestjs` — Redis caching module in NestJS
- `docker` — Running Redis in Docker Compose for development