creator/claudekit

mirror of https://github.com/duthaho/claudekit.git synced 2026-06-10 20:24:57 +03:00

duthaho d1a6d2a2bc feat: improved the Claude Kit as a plugin

2026-04-19 14:10:38 +07:00

Databases — PostgreSQL Patterns

PostgreSQL

When to Use

PostgreSQL database operations
SQL query optimization
Schema design and migrations
JSONB document storage within a relational model
Full-text search without a dedicated search engine
Complex analytical queries with window functions and CTEs

When NOT to Use

NoSQL-only projects where no relational database is involved
In-memory or embedded stores such as Redis or SQLite used purely for caching or ephemeral storage
File-based storage scenarios that do not require a database engine

Core Patterns

1. Schema Design

Design tables with explicit constraints, proper types, and clear relationships.

-- Enums for constrained value sets
CREATE TYPE user_role AS ENUM ('admin', 'editor', 'viewer');
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered', 'cancelled');

-- Composite types for reusable structures
CREATE TYPE address AS (
    street TEXT,
    city   TEXT,
    state  TEXT,
    zip    VARCHAR(10)
);

-- Users table with constraints
CREATE TABLE users (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    name        TEXT NOT NULL CHECK (char_length(name) >= 1),
    role        user_role NOT NULL DEFAULT 'viewer',
    metadata    JSONB DEFAULT '{}',
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Organizations with self-referencing hierarchy
CREATE TABLE organizations (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name        TEXT NOT NULL,
    parent_id   BIGINT REFERENCES organizations(id) ON DELETE SET NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Membership join table with composite primary key
CREATE TABLE org_memberships (
    user_id     BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    org_id      BIGINT NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    role        user_role NOT NULL DEFAULT 'viewer',
    joined_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (user_id, org_id)
);

-- Orders with foreign keys, check constraints, and enum status
CREATE TABLE orders (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id     BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
    status      order_status NOT NULL DEFAULT 'pending',
    total_cents BIGINT NOT NULL CHECK (total_cents >= 0),
    shipping    address,
    items       JSONB NOT NULL DEFAULT '[]',
    placed_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Auto-update updated_at with a trigger
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_users_updated_at
    BEFORE UPDATE ON users
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();

Key principles:

Use BIGINT GENERATED ALWAYS AS IDENTITY over SERIAL for new projects
Use TIMESTAMPTZ (not TIMESTAMP) to store times with timezone awareness
Prefer TEXT over VARCHAR(n) unless a hard length limit is business-critical
Add ON DELETE actions on every foreign key (CASCADE, RESTRICT, or SET NULL)
Use CHECK constraints for business rules that live at the data level

2. Index Strategy

Choose the right index type based on your query patterns.

Decision guide:

Query Pattern	Index Type	Example
Equality (`=`) and range (`<`, `>`, `BETWEEN`)	B-tree (default)	`WHERE created_at > '2025-01-01'`
Array containment (`@>`), JSONB queries	GIN	`WHERE tags @> '{postgres}'`
Full-text search (`@@`)	GIN	`WHERE to_tsvector(body) @@ query`
Geometry, range overlap, nearest-neighbor ordering	GiST	`ORDER BY location <-> point '(40.7,-74.0)' LIMIT 10`
Filtered subset of rows	Partial	`WHERE active = true`
Index-only scans (no heap lookup)	Covering (INCLUDE)	Frequently selected columns

-- B-tree: default, good for equality and range
CREATE INDEX idx_orders_placed_at ON orders(placed_at DESC);
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- GIN: arrays and JSONB containment
CREATE INDEX idx_users_metadata ON users USING GIN (metadata);
CREATE INDEX idx_orders_items ON orders USING GIN (items jsonb_path_ops);

-- GIN: full-text search
ALTER TABLE articles ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(body, '')), 'B')
    ) STORED;

CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

-- Full-text search query
SELECT id, title, ts_rank(search_vector, query) AS rank
FROM articles, plainto_tsquery('english', 'database optimization') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

-- GiST: geometry and range types
CREATE INDEX idx_events_duration ON events USING GiST (
    tstzrange(starts_at, ends_at)
);

-- Find overlapping events
SELECT * FROM events
WHERE tstzrange(starts_at, ends_at) && tstzrange('2025-06-01', '2025-06-02');

-- Partial index: only index rows you actually query
CREATE INDEX idx_orders_pending ON orders(placed_at)
    WHERE status = 'pending';

-- Covering index: avoids heap lookup for common queries
CREATE INDEX idx_users_email_covering ON users(email)
    INCLUDE (name, role);

-- This query can now be answered entirely from the index
SELECT name, role FROM users WHERE email = 'user@example.com';

When to add an index: Run EXPLAIN ANALYZE first. Add an index when you see sequential scans on large tables with selective WHERE clauses. Do not index columns with very low cardinality (e.g., a boolean on a small table) unless combined with other columns.

3. Query Optimization

Reading EXPLAIN ANALYZE

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.placed_at > now() - INTERVAL '30 days'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;

What to look for in the output:

Seq Scan on large tables -- add an index or rewrite the WHERE clause
Nested Loop with high row counts -- consider a Hash Join (may need more work_mem)
actual rows far exceeding estimated rows -- run ANALYZE tablename to update statistics
Buffers: shared read with large numbers -- pages came from disk or the OS cache rather than shared_buffers; check shared_buffers sizing
Sort Method: external merge -- increase work_mem for this query
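
The checklist above can be partly automated. The following is a minimal sketch, assuming the plan is available as EXPLAIN (ANALYZE) text output; the function name `scan_plan`, the thresholds, and the sample plan are illustrative assumptions, not part of PostgreSQL or of this kit.

```python
import re

# "rows=" inside the planner estimate and inside the measured part of a node line
EST_RE = re.compile(r"\(cost=\S+ rows=(\d+) width=\d+\)")
ACT_RE = re.compile(r"\(actual time=\S+ rows=(\d+(?:\.\d+)?) loops=\d+\)")


def scan_plan(plan_text, large_rows=10_000, misestimate_factor=10):
    """Return (kind, line) pairs for plan lines that deserve a closer look."""
    findings = []
    for raw in plan_text.splitlines():
        line = raw.strip()
        est, act = EST_RE.search(line), ACT_RE.search(line)
        actual_rows = float(act.group(1)) if act else None
        # Sequential scan that actually returned many rows
        if "Seq Scan" in line and actual_rows is not None and actual_rows >= large_rows:
            findings.append(("seq_scan", line))
        # Actual row count far above the planner's estimate: statistics may be stale
        if est and act and actual_rows > misestimate_factor * max(int(est.group(1)), 1):
            findings.append(("misestimate", line))
        # Sort spilled to disk because it did not fit in work_mem
        if "Sort Method: external merge" in line:
            findings.append(("external_sort", line))
    return findings


# Hand-written sample in the shape of EXPLAIN (ANALYZE) text output, not real measurements
SAMPLE_PLAN = """\
Sort  (cost=1900.00..1900.30 rows=120 width=40) (actual time=15.10..18.40 rows=48000 loops=1)
  Sort Key: placed_at
  Sort Method: external merge  Disk: 2048kB
  ->  Seq Scan on orders o  (cost=0.00..1834.00 rows=120 width=40) (actual time=0.01..9.87 rows=48000 loops=1)
"""

findings = scan_plan(SAMPLE_PLAN)
for kind, line in findings:
    print(f"{kind:14} {line}")
```

These are heuristics only: a sequential scan is often the right plan for small tables or unselective filters, so treat each finding as a prompt to investigate, not a verdict.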

Common Query Rewrites

-- BAD: correlated subquery runs once per row
SELECT u.name,
    (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
FROM users u;

-- GOOD: single pass with JOIN + GROUP BY
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;

-- BAD: OR across different columns often forces a Seq Scan or a slower BitmapOr (check with EXPLAIN)
SELECT * FROM orders WHERE user_id = 5 OR status = 'pending';

-- GOOD: UNION ALL lets each branch use its own index
SELECT * FROM orders WHERE user_id = 5
UNION ALL
SELECT * FROM orders WHERE status = 'pending' AND user_id != 5;

-- BAD: function call on indexed column prevents index use
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- GOOD: expression index or use citext
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- or better: define email as CITEXT type

-- Avoiding N+1: fetch users and their latest order in one query
SELECT DISTINCT ON (u.id)
    u.id, u.name, o.id AS latest_order_id, o.total_cents, o.placed_at
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
ORDER BY u.id, o.placed_at DESC;
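
To see why the per-row pattern is costly from the application side, here is a minimal sketch that counts round trips. It assumes a simplified two-table schema and uses Python's built-in sqlite3 module only so that it runs without a server; the helper `run` and both function names are hypothetical. The query shapes carry over to PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users(id));
    INSERT INTO users  (id, name) VALUES (1, 'ada'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders (user_id)  VALUES (1), (1), (2);
""")

query_count = 0


def run(sql, params=()):
    """Execute one statement and count it as one database round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def counts_n_plus_one():
    # One query for the user list, then one more query per user
    result = {}
    for user_id, name in run("SELECT id, name FROM users ORDER BY id"):
        result[name] = run(
            "SELECT COUNT(*) FROM orders WHERE user_id = ?", (user_id,)
        )[0][0]
    return result


def counts_single_query():
    # Same answer from a single JOIN + GROUP BY
    return dict(run("""
        SELECT u.name, COUNT(o.id)
        FROM users u
        LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id, u.name
    """))


query_count = 0
slow = counts_n_plus_one()
n_plus_one_queries = query_count

query_count = 0
fast = counts_single_query()
single_queries = query_count

print(slow == fast, n_plus_one_queries, single_queries)
```

Both functions return the same counts, but the first issues one statement per user on top of the initial list query, so its cost grows with the number of users.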

4. Migrations

Follow the up/down pattern and plan for zero-downtime deployments.

-- ============================================
-- Migration: 20250601_001_add_user_preferences
-- ============================================

-- UP
ALTER TABLE users ADD COLUMN preferences JSONB DEFAULT '{}';

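-- Create index CONCURRENTLY to avoid blocking writes
-- (cannot run inside a transaction block, so disable the migration tool's wrapping transaction)
CREATE INDEX CONCURRENTLY idx_users_preferences
    ON users USING GIN (preferences);

-- DOWN
DROP INDEX CONCURRENTLY IF EXISTS idx_users_preferences;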
ALTER TABLE users DROP COLUMN IF EXISTS preferences;

Safe vs unsafe operations:

Operation	Safe?	Notes
ADD COLUMN (nullable or with non-volatile default)	Yes	Instant in PG 11+; a volatile default (e.g., clock_timestamp()) still rewrites the table
ADD COLUMN NOT NULL without default	No	Fails if rows exist; add nullable first, backfill, then set NOT NULL
DROP COLUMN	Mostly	Quick, but ORM queries may break if they SELECT *
RENAME COLUMN	Dangerous	Breaks all queries referencing old name; use a transition period
CREATE INDEX	Safe with CONCURRENTLY	Without CONCURRENTLY, blocks writes for the duration of the build
ADD CONSTRAINT (CHECK/FK)	Careful	Use NOT VALID then VALIDATE CONSTRAINT in two steps
Change column type	Dangerous	Usually rewrites the entire table under an exclusive lock; use a new column + migration instead

-- Zero-downtime: add NOT NULL constraint safely
-- Step 1: add column as nullable
ALTER TABLE users ADD COLUMN phone TEXT;

-- Step 2: backfill in batches
UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN 1 AND 10000;
UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN 10001 AND 20000;
-- ... continue in batches

-- Step 3: add constraint without full table lock
ALTER TABLE users ADD CONSTRAINT users_phone_not_null
    CHECK (phone IS NOT NULL) NOT VALID;

-- Step 4: validate (scans table but allows concurrent writes)
ALTER TABLE users VALIDATE CONSTRAINT users_phone_not_null;

-- Step 5: optionally convert to proper NOT NULL
-- (PG 12+ uses the validated CHECK constraint to skip the full table scan)
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT users_phone_not_null;
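
The "continue in batches" step is usually driven from a script. A minimal sketch, assuming dense integer ids and a driver that accepts `$1`/`$2` placeholders (as asyncpg does); `batch_ranges` and `backfill_statements` are hypothetical helper names, and each returned statement is meant to run in its own short transaction.

```python
def batch_ranges(min_id, max_id, batch_size):
    """Yield inclusive (low, high) id ranges that cover min_id..max_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    low = min_id
    while low <= max_id:
        high = min(low + batch_size - 1, max_id)
        yield low, high
        low = high + 1


# Same UPDATE as in Step 2, with the batch bounds passed as parameters
BACKFILL_SQL = (
    "UPDATE users SET phone = '' "
    "WHERE phone IS NULL AND id BETWEEN $1 AND $2"
)


def backfill_statements(min_id, max_id, batch_size=10_000):
    """Pair the parameterized UPDATE with the bounds of each batch."""
    return [(BACKFILL_SQL, bounds) for bounds in batch_ranges(min_id, max_id, batch_size)]


for sql, (low, high) in backfill_statements(1, 25_000):
    print(f"{low:>6} .. {high:<6} {sql}")
```

Small batches keep row locks short-lived and give autovacuum a chance to clean up between statements; the right batch size depends on row width and write load.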

5. JSON/JSONB

Use JSONB for semi-structured data that lives alongside relational columns.

When to use JSONB:

User preferences, settings, or metadata with varying keys
API response caching or event payloads
Flexible attributes that differ per row

When NOT to use JSONB:

Data you regularly JOIN on or use in WHERE clauses across tables -- normalize it
Data that has a fixed, well-known schema -- use proper columns

-- Querying JSONB: operators
-- ->  returns JSONB element (keeps type)
-- ->> returns TEXT value
-- @>  containment (left contains right)
-- ?   key exists

-- Get a nested value
SELECT
    metadata->>'department' AS department,
    metadata->'settings'->>'theme' AS theme
FROM users
WHERE metadata @> '{"role": "admin"}';

-- Check if a key exists
SELECT * FROM users WHERE metadata ? 'avatar_url';

-- Query inside JSONB arrays
SELECT * FROM orders
WHERE items @> '[{"sku": "WIDGET-001"}]';

-- Update a nested JSONB field
-- (jsonb_set does not create missing parent keys, so 'settings' must already exist)
UPDATE users
SET metadata = jsonb_set(metadata, '{settings,notifications}', '"email"')
WHERE id = 42;

-- Remove a key
UPDATE users
SET metadata = metadata - 'deprecated_field'
WHERE metadata ? 'deprecated_field';

-- Aggregate JSONB: expand array elements into rows
SELECT o.id, item->>'sku' AS sku, (item->>'qty')::int AS qty
FROM orders o, jsonb_array_elements(o.items) AS item
WHERE o.status = 'pending';

-- Index strategies for JSONB
-- Containment and key-existence queries (@>, ?, ?|, ?&): GIN with jsonb_ops (default)
CREATE INDEX idx_users_metadata_gin ON users USING GIN (metadata);

-- Containment-only queries (smaller, faster index): jsonb_path_ops
CREATE INDEX idx_orders_items_path ON orders USING GIN (items jsonb_path_ops);

-- Specific key lookups: expression index on extracted value
CREATE INDEX idx_users_department ON users ((metadata->>'department'));
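
The `@>` operator is easier to reason about with a model of its matching rules. The sketch below is a simplified Python approximation written for illustration: it is not PostgreSQL's implementation, the function name `contains` is hypothetical, and it ignores the special case where an array may contain a bare scalar.

```python
def contains(left, right):
    """Approximate `left @> right` for values decoded from JSON."""
    if isinstance(left, dict) and isinstance(right, dict):
        # Every key in right must exist in left with a contained value
        return all(k in left and contains(left[k], v) for k, v in right.items())
    if isinstance(left, list) and isinstance(right, list):
        # Every element of right must be contained in some element of left;
        # order and duplicates do not matter
        return all(any(contains(l, r) for l in left) for r in right)
    if isinstance(left, (dict, list)) or isinstance(right, (dict, list)):
        # Structures must match: an object never contains an array, and so on
        return False
    if isinstance(left, bool) or isinstance(right, bool):
        # JSON true/false are not numbers, unlike Python's bool
        return left is right
    return left == right


items = [{"sku": "WIDGET-001", "qty": 2}, {"sku": "GADGET-7", "qty": 1}]
metadata = {"role": "admin", "settings": {"theme": "dark"}}

print(contains(items, [{"sku": "WIDGET-001"}]))           # the orders.items query
print(contains(metadata, {"role": "admin"}))              # the users.metadata query
print(contains(metadata, {"settings": {"theme": "light"}}))
```

The first call mirrors `items @> '[{"sku": "WIDGET-001"}]'`: the right-hand side only needs to be a partial match of one array element, which is why extra keys such as `qty` do not prevent a match.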

6. CTEs and Window Functions

Common Table Expressions (CTEs)

-- Readable multi-step query with CTEs
WITH monthly_revenue AS (
    SELECT
        date_trunc('month', placed_at) AS month,
        SUM(total_cents) AS revenue_cents
    FROM orders
    WHERE status = 'delivered'
    GROUP BY 1
),
revenue_with_growth AS (
    SELECT
        month,
        revenue_cents,
        LAG(revenue_cents) OVER (ORDER BY month) AS prev_month,
        ROUND(
            100.0 * (revenue_cents - LAG(revenue_cents) OVER (ORDER BY month))
            / NULLIF(LAG(revenue_cents) OVER (ORDER BY month), 0),
            1
        ) AS growth_pct
    FROM monthly_revenue
)
SELECT * FROM revenue_with_growth ORDER BY month DESC;

-- Recursive CTE: org hierarchy tree
WITH RECURSIVE org_tree AS (
    -- Base case: top-level orgs
    SELECT id, name, parent_id, 0 AS depth, name::TEXT AS path
    FROM organizations
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive step
    SELECT o.id, o.name, o.parent_id, t.depth + 1, t.path || ' > ' || o.name
    FROM organizations o
    JOIN org_tree t ON o.parent_id = t.id
)
SELECT * FROM org_tree ORDER BY path;
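
To watch the recursive query produce rows, the sketch below runs an adapted copy against a tiny made-up hierarchy. It uses Python's built-in sqlite3 module purely so the example runs without a server, which is why the PostgreSQL-specific `name::TEXT` cast is dropped; the organization names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES organizations(id)
    );
    INSERT INTO organizations (id, name, parent_id) VALUES
        (1, 'Acme', NULL),
        (2, 'Engineering', 1),
        (3, 'Sales', 1),
        (4, 'Platform', 2);
""")

ORG_TREE_SQL = """
WITH RECURSIVE org_tree AS (
    -- Base case: top-level orgs
    SELECT id, name, parent_id, 0 AS depth, name AS path
    FROM organizations
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive step: attach children to rows found so far
    SELECT o.id, o.name, o.parent_id, t.depth + 1, t.path || ' > ' || o.name
    FROM organizations o
    JOIN org_tree t ON o.parent_id = t.id
)
SELECT depth, path FROM org_tree ORDER BY path
"""

tree = conn.execute(ORG_TREE_SQL).fetchall()
for depth, path in tree:
    print("  " * depth + path)
```

Sorting by the accumulated `path` keeps each child directly under its parent, which is what makes the output readable as a tree.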

Window Functions

-- ROW_NUMBER: assign a sequential number within each partition
SELECT
    user_id,
    id AS order_id,
    total_cents,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY placed_at DESC) AS rn
FROM orders;

-- Get each user's most recent order
SELECT * FROM (
    SELECT
        o.*,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY placed_at DESC) AS rn
    FROM orders o
) sub WHERE rn = 1;

-- LAG/LEAD: compare with previous/next row
SELECT
    placed_at::date AS order_date,
    total_cents,
    LAG(total_cents) OVER (ORDER BY placed_at) AS prev_order_total,
    total_cents - LAG(total_cents) OVER (ORDER BY placed_at) AS diff
FROM orders
WHERE user_id = 42;

-- Running total
SELECT
    placed_at::date AS order_date,
    total_cents,
    SUM(total_cents) OVER (
        ORDER BY placed_at
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
WHERE user_id = 42;

-- NTILE: divide rows into equal buckets (e.g., quartiles)
SELECT
    user_id,
    SUM(total_cents) AS lifetime_spend,
    NTILE(4) OVER (ORDER BY SUM(total_cents) DESC) AS spend_quartile
FROM orders
GROUP BY user_id;

7. Transaction Isolation

PostgreSQL accepts all four SQL-standard isolation levels, but READ UNCOMMITTED behaves like READ COMMITTED, so there are three distinct levels:

Level	Dirty Read	Non-Repeatable Read	Phantom Read	Use Case
READ COMMITTED (default)	No	Possible	Possible	Most OLTP workloads
REPEATABLE READ	No	No	No (in PG)	Reports, consistent snapshots
SERIALIZABLE	No	No	No	Financial transactions, inventory

-- Default: READ COMMITTED
-- Each statement sees the latest committed data
BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- SERIALIZABLE: full isolation, detects write conflicts
BEGIN ISOLATION LEVEL SERIALIZABLE;
    -- Read current inventory
    SELECT quantity FROM inventory WHERE sku = 'WIDGET-001';
    -- Decrement once the application has checked the quantity read above
    -- (PG aborts one transaction if a concurrent one conflicts)
    UPDATE inventory SET quantity = quantity - 1 WHERE sku = 'WIDGET-001';
COMMIT;
-- If another SERIALIZABLE tx modified the same row, one will get:
-- ERROR: could not serialize access due to concurrent update
-- Your application must retry on serialization failure (SQLSTATE 40001)

-- Advisory locks for application-level coordination
SELECT pg_advisory_xact_lock(hashtext('process-user-' || '42'));
-- Lock is held until transaction ends; no table-level contention

Guidelines:

Use READ COMMITTED for general CRUD operations
Use SERIALIZABLE when correctness requires that concurrent transactions behave as if run sequentially (e.g., balance transfers, seat reservations)
Always implement retry logic for serialization failures
Keep transactions as short as possible to reduce contention
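
The retry guideline can be sketched as a small wrapper. This is a minimal illustration, not a driver API: `DatabaseError` is a hypothetical stand-in for whatever exception your driver raises with a SQLSTATE attached, and `run_with_retry`, the attempt limit, and the backoff values are assumptions to tune.

```python
import random
import time

SERIALIZATION_FAILURE = "40001"


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that exposes SQLSTATE."""

    def __init__(self, message, sqlstate):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_retry(txn_fn, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call txn_fn until it succeeds, retrying only on SQLSTATE 40001."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except DatabaseError as exc:
            # Re-raise anything that is not a serialization failure,
            # and give up once the attempt budget is spent
            if exc.sqlstate != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so competing retries spread out
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.5))


# Simulated transaction: fails twice with 40001, then commits
attempts = 0


def flaky_transfer():
    global attempts
    attempts += 1
    if attempts < 3:
        raise DatabaseError("could not serialize access", SERIALIZATION_FAILURE)
    return "committed"


# sleep is injected as a no-op here so the demonstration finishes instantly
result = run_with_retry(flaky_transfer, sleep=lambda _seconds: None)
print(result, attempts)
```

The function passed in must rerun the whole transaction from BEGIN, including its reads, because the values it read on the failed attempt may no longer be valid.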

8. Connection Pooling

Each direct PostgreSQL connection is a separate backend process and is expensive (~1-10 MB RAM each). Use a pooler.

PgBouncer configuration (pgbouncer.ini):

[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

; Pool mode: transaction is best for most web apps
pool_mode = transaction

; Sizing: start conservative, tune with monitoring
default_pool_size = 20
max_client_conn = 200
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3

; Timeouts
server_idle_timeout = 300
client_idle_timeout = 60
query_timeout = 30

Pool sizing formula:

optimal_pool_size = ((2 * cpu_cores) + effective_disk_spindles)

For a 4-core SSD server: (2 * 4) + 1 = 9 connections is a good starting point. More connections do not mean more throughput -- too many cause contention.
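
The same arithmetic as a small helper, as a sketch; the function name and the default of one effective spindle for SSD storage are assumptions taken from the example above, and the result is only a starting point to refine with monitoring.

```python
def optimal_pool_size(cpu_cores, effective_disk_spindles=1):
    """Starting-point pool size: (2 * cpu_cores) + effective_disk_spindles."""
    if cpu_cores < 1 or effective_disk_spindles < 0:
        raise ValueError("cpu_cores must be >= 1 and spindles must be >= 0")
    return 2 * cpu_cores + effective_disk_spindles


for cores in (2, 4, 8, 16):
    print(f"{cores:>2} cores -> {optimal_pool_size(cores)} connections")
```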

Pool modes:

Mode	Description	Caveats
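`transaction`	Connection returned after each transaction	Cannot use session-level features (LISTEN/NOTIFY, SQL-level PREPARE, temp tables, session-level SET); protocol-level prepared statements need PgBouncer 1.21+ with max_prepared_statements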
`session`	Connection held for entire client session	Fewer pooling benefits; use only when session features needed
`statement`	Connection returned after each statement	No multi-statement transactions; rarely used

Application-level pooling (Python example with asyncpg):

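import asyncio

import asyncpg


async def main():
    # Port 6432 is the PgBouncer listener configured above
    pool = await asyncpg.create_pool(
        dsn="postgresql://user:pass@localhost:6432/myapp",
        min_size=5,
        max_size=20,
        max_inactive_connection_lifetime=300,
        command_timeout=30,
        # asyncpg caches prepared statements per connection; disable the
        # cache when PgBouncer runs in transaction mode
        statement_cache_size=0,
    )
    try:
        async with pool.acquire() as conn:
            rows = await conn.fetch(
                "SELECT id, name FROM users WHERE email = $1", "user@example.com"
            )
    finally:
        await pool.close()

asyncio.run(main())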

Best Practices

Use parameterized queries everywhere. Never concatenate user input into SQL strings. ORMs and query builders handle this, but verify in raw SQL contexts.
Run ANALYZE after bulk data changes. The query planner relies on statistics. After large imports or deletes, run ANALYZE tablename to update them.
Prefer BIGINT for primary keys. INTEGER (max ~2.1 billion) can be exhausted sooner than expected in high-write systems. BIGINT costs 4 extra bytes per row but avoids a painful migration later.
Store money as integers (cents). Floating-point arithmetic causes rounding errors. Use BIGINT for cents or NUMERIC(19,4) if sub-cent precision is needed.
Add indexes for foreign keys. PostgreSQL does not automatically index the child side of a foreign key. Without it, DELETE on the parent table triggers a sequential scan on the child.
Use TIMESTAMPTZ, not TIMESTAMP. TIMESTAMP WITHOUT TIME ZONE silently drops timezone info. Always use TIMESTAMPTZ and let the application control display timezone.
Set statement_timeout for web requests. Prevent runaway queries from holding connections: SET statement_timeout = '5s'; at session start, or configure per-role in PostgreSQL.
Monitor with pg_stat_statements. Enable this extension to track query performance over time. The top queries by total_exec_time (named total_time before PG 13) are your optimization targets.

-- Find slowest queries
SELECT
    calls,
    round(total_exec_time::numeric, 1) AS total_ms,
    round(mean_exec_time::numeric, 1) AS mean_ms,
    query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
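
The first practice in the list, parameterized queries, is worth seeing fail and succeed side by side. This sketch uses Python's built-in sqlite3 module only so that it runs without a server; sqlite3 uses `?` placeholders where asyncpg uses `$1`, and the table contents and the hostile input string are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)

# Input crafted to turn the WHERE clause into a condition that is always true
hostile_input = "nobody@example.com' OR '1'='1"

# UNSAFE: the input becomes part of the SQL text and changes the query's meaning
unsafe_sql = f"SELECT email FROM users WHERE email = '{hostile_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: the input is sent as a value and is only ever compared to the column
safe_rows = conn.execute(
    "SELECT email FROM users WHERE email = ?", (hostile_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The concatenated query returns every row in the table, while the parameterized one returns none because no user has that literal email address.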

Common Pitfalls

N+1 queries from ORM lazy loading. Loading a list of users and then accessing user.orders in a loop generates one query per user. Use eager loading (joinedload in SQLAlchemy, select_related in Django) or batch the query with a JOIN.
Locking the table during migrations. ALTER TABLE ... ADD COLUMN NOT NULL DEFAULT 'x' is safe in PG 11+, but CREATE INDEX without CONCURRENTLY locks writes. Always use CREATE INDEX CONCURRENTLY in production migrations.
Bloated tables from UPDATE-heavy workloads. PostgreSQL MVCC creates dead tuples on every UPDATE. If autovacuum cannot keep up, table size and query times grow. Monitor pg_stat_user_tables.n_dead_tup and tune autovacuum settings for hot tables.
Using OFFSET for pagination on large datasets. OFFSET 100000 forces PG to scan and discard 100,000 rows. Use keyset pagination instead:

-- BAD: slow for deep pages
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

-- GOOD: keyset pagination (100000 stands for the last id returned by the previous page)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;

Ignoring connection limits. Each PostgreSQL connection consumes RAM. Opening hundreds of direct connections (e.g., one per serverless function invocation) will exhaust max_connections, after which new connections are refused, and can starve the server of memory. Always use PgBouncer or an application-level pool.
Storing large blobs in the database. Files over a few KB should go in object storage (S3, R2). Store the URL/key in PostgreSQL. Large bytea or TEXT columns bloat the table, slow backups, and waste shared_buffers cache.
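
For the keyset pagination pitfall above, the application has to carry a cursor from one page to the next. A minimal sketch of that loop, using Python's built-in sqlite3 module only so that it runs without a server; `fetch_page` and `iter_pages` are hypothetical helper names and the ids are made up with deliberate gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)")
# ids 10, 20, ..., 70: gaps are fine because the cursor is a value, not a position
conn.executemany(
    "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
    [(i * 10, i * 100) for i in range(1, 8)],
)


def fetch_page(last_seen_id, page_size):
    """Return one page of rows plus the cursor to pass for the next page."""
    rows = conn.execute(
        "SELECT id, total_cents FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor


def iter_pages(page_size):
    """Yield lists of ids, page by page, until the table is exhausted."""
    cursor = 0  # assumed to be below the smallest id
    while True:
        rows, next_cursor = fetch_page(cursor, page_size)
        if not rows:
            return
        yield [row[0] for row in rows]
        cursor = next_cursor


pages = list(iter_pages(3))
print(pages)
```

Each page is an index range scan starting at the cursor, so page 5,000 costs about the same as page 1. The trade-off is that clients can only move forward from a known cursor and cannot jump to an arbitrary page number.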

mongodb - Document-based database patterns for non-relational data
caching - Caching strategies to reduce database load
logging - Logging patterns for query debugging and monitoring
