
PostgreSQL Index Decision Tree

Quick reference for choosing the right index type.

Decision Tree

```text
What are you querying?
|
+-- Equality (=) or Range (<, >, BETWEEN, ORDER BY)?
|   |
|   +-- On a single scalar column?
|   |   --> B-tree (default)
|   |
|   +-- On a timestamp/date column of a large, append-only table?
|   |   --> BRIN (much smaller than B-tree; cannot serve ORDER BY)
|   |
|   +-- Need the index to also return columns without table lookup?
|       --> Covering Index (B-tree with INCLUDE)
|
+-- Array containment (@>, &&) or JSONB queries?
|   --> GIN
|
+-- Full-text search (tsvector, @@)?
|   --> GIN
|
+-- Geometric/spatial data (points, polygons, PostGIS)?
|   --> GiST
|
+-- Range types (int4range, tsrange, overlaps)?
|   --> GiST
|
+-- Nearest-neighbor / distance queries (KNN)?
|   --> GiST (or SP-GiST for partitioned space)
|
+-- Only a subset of rows match your WHERE clause?
|   --> Partial Index (any type + WHERE filter)
|
+-- Trigram similarity (LIKE '%pattern%', pg_trgm)?
|   --> GIN with pg_trgm (or GiST: cheaper to update, slower to search)
|
+-- Hash equality only (= but never range)?
    --> Hash index (rarely better than B-tree in practice)
```
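Before committing to a branch of the tree, it can help to confirm which index methods actually ship an operator class for your column's type. A minimal catalog query, using `jsonb` purely as an example type (swap in your own):

```sql
-- List index access methods that have an operator class for a given type.
-- 'jsonb' is only an example; replace it with the column type you care about.
SELECT am.amname       AS index_method,
       opc.opcname     AS operator_class,
       opc.opcdefault  AS is_default
FROM pg_opclass AS opc
JOIN pg_am      AS am ON am.oid = opc.opcmethod
WHERE opc.opcintype = 'jsonb'::regtype
ORDER BY am.amname, opc.opcname;
```

For `jsonb` you should see GIN rows for both `jsonb_ops` (the default) and `jsonb_path_ops`; types from extensions such as pg_trgm or PostGIS only appear once the extension is installed.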

Index Type Comparison

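| Type | Best For | Operators | Size | Write Cost | Notes |
|---|---|---|---|---|---|
| B-tree | Equality, range, sorting | `= < > <= >= BETWEEN IN IS NULL` | Medium | Low | Default. Covers the large majority of cases. |
| GIN | Multi-valued data (arrays, JSONB, tsvector) | `@> && @@ ? ?& ?\|` | Large | High | Slow to update; `fastupdate` (on by default) buffers new entries in a pending list. |
| GiST | Spatial, ranges, nearest-neighbor | `<< >> && @> <@ <->` | Medium | Medium | Lossy for some types. Supports KNN. |
| SP-GiST | Partitioned search spaces | Similar to GiST (depends on operator class) | Medium | Medium | Good for non-balanced structures (quad-trees, k-d trees, radix tries), e.g. phone numbers, IP addresses. |
| BRIN | Large sequential/append-only tables | `= < > <= >=` | Tiny | Very Low | Often hundreds of times smaller than B-tree. Only effective when physical row order correlates with column values. |
| Hash | Equality only | `=` | Medium | Low | WAL-logged (crash-safe) since PostgreSQL 10. Rarely outperforms B-tree. |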
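The BRIN caveat about physical order can be checked before creating the index: `pg_stats.correlation`, filled in by `ANALYZE`, measures how closely physical row order tracks a column's sort order. A sketch on a hypothetical `brin_demo_logs` table loaded in timestamp order:

```sql
-- Hypothetical demo table: rows are inserted in created_at order.
DROP TABLE IF EXISTS brin_demo_logs;
CREATE TABLE brin_demo_logs (
    created_at timestamptz NOT NULL
);
INSERT INTO brin_demo_logs (created_at)
SELECT timestamptz '2025-01-01' + g * interval '1 second'
FROM generate_series(1, 100000) AS g;

-- Refresh statistics so pg_stats reflects the new rows.
ANALYZE brin_demo_logs;

-- Correlation near +1 or -1: physical order tracks column order,
-- so BRIN min/max summaries can skip most page ranges.
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = current_schema()
  AND tablename  = 'brin_demo_logs'
  AND attname    = 'created_at';
```

A correlation close to 0 suggests a BRIN index would have to read most page ranges anyway, in which case a B-tree is usually the safer choice.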

Common Patterns

Covering Index (Index-Only Scans)

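Avoid heap lookups by storing extra, non-searchable columns in the index with INCLUDE (PostgreSQL 11+). Index-only scans also depend on the visibility map, so they work best on tables that are vacuumed regularly:

```sql
-- Query: SELECT email, name FROM users WHERE email = ?
CREATE INDEX idx_users_email_covering
    ON users (email) INCLUDE (name);
```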
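To see how PostgreSQL distinguishes key columns from INCLUDE columns, you can inspect `pg_index` for a covering index. This sketch uses a hypothetical `covering_demo_users` table:

```sql
-- Hypothetical table for illustration.
DROP TABLE IF EXISTS covering_demo_users;
CREATE TABLE covering_demo_users (
    id    bigint PRIMARY KEY,
    email text NOT NULL,
    name  text
);
CREATE INDEX covering_demo_users_email_idx
    ON covering_demo_users (email) INCLUDE (name);

-- indnkeyatts counts searchable key columns; indnatts also counts
-- INCLUDE columns, which are stored in the index but not used for searching.
SELECT indexrelid::regclass AS index_name,
       indnkeyatts          AS key_columns,
       indnatts             AS total_columns
FROM pg_index
WHERE indexrelid = 'covering_demo_users_email_idx'::regclass;
```

Once the table has been vacuumed, `EXPLAIN (ANALYZE)` on `SELECT email, name ... WHERE email = ...` can show an Index Only Scan. Its `Heap Fetches` counter shows how many rows still needed a heap visit.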

Partial Index (Filtered)

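Index only the rows you actually query. The planner uses a partial index only when the query's WHERE clause implies the index predicate:

```sql
-- Only index active orders (skip 95% of rows)
CREATE INDEX idx_orders_active
    ON orders (created_at)
    WHERE status = 'active';
```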
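The size benefit is easy to measure. This sketch builds a hypothetical `partial_demo_orders` table in which roughly 1 row in 20 is `'active'`. It then compares the partial index against a full index on the same column:

```sql
-- Hypothetical demo data: about 5% of rows are 'active'.
DROP TABLE IF EXISTS partial_demo_orders;
CREATE TABLE partial_demo_orders (
    id         bigint PRIMARY KEY,
    status     text NOT NULL,
    created_at timestamptz NOT NULL
);
INSERT INTO partial_demo_orders (id, status, created_at)
SELECT g,
       CASE WHEN g % 20 = 0 THEN 'active' ELSE 'shipped' END,
       timestamptz '2025-01-01' + g * interval '1 minute'
FROM generate_series(1, 100000) AS g;

CREATE INDEX partial_demo_orders_active_idx
    ON partial_demo_orders (created_at)
    WHERE status = 'active';

-- The stored predicate the planner checks queries against.
SELECT pg_get_expr(indpred, indrelid) AS predicate
FROM pg_index
WHERE indexrelid = 'partial_demo_orders_active_idx'::regclass;

-- A full index on the same column, for comparison only.
CREATE INDEX partial_demo_orders_all_idx
    ON partial_demo_orders (created_at);

SELECT pg_size_pretty(pg_relation_size('partial_demo_orders_active_idx')) AS partial_size,
       pg_size_pretty(pg_relation_size('partial_demo_orders_all_idx'))    AS full_size;
```

A query such as `WHERE status = 'active' AND created_at > ...` can use the partial index. `WHERE created_at > ...` on its own cannot, because it does not imply the predicate.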

Composite Index (Multi-Column)

Column order matters: put equality columns first and range columns last, so the B-tree can scan one contiguous slice of the index:

```sql
-- Query: WHERE tenant_id = ? AND created_at > ?
CREATE INDEX idx_events_tenant_date
    ON events (tenant_id, created_at);
```
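The two query shapes below illustrate why order matters. The sketch uses a hypothetical `composite_demo_events` table, and the `EXPLAIN` output will depend on your data and statistics:

```sql
-- Hypothetical table for illustration.
DROP TABLE IF EXISTS composite_demo_events;
CREATE TABLE composite_demo_events (
    tenant_id  int NOT NULL,
    created_at timestamptz NOT NULL,
    payload    text
);
CREATE INDEX composite_demo_events_tenant_date_idx
    ON composite_demo_events (tenant_id, created_at);

-- Equality on the leading column plus a range on the second:
-- the B-tree descends to tenant 42 and reads one contiguous slice.
EXPLAIN SELECT * FROM composite_demo_events
WHERE tenant_id = 42 AND created_at > now() - interval '7 days';

-- Range on the second column alone: matching entries are scattered
-- across every tenant, so the index is a poor fit for this query.
EXPLAIN SELECT * FROM composite_demo_events
WHERE created_at > now() - interval '7 days';
```

If the second query shape is common, a separate index on `(created_at)` is usually the fix. PostgreSQL 18's B-tree skip scan may also help when the leading column has few distinct values.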

Expression Index

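Index a computed value. Queries must use the same expression (for example `WHERE lower(email) = ...`) for the planner to consider the index:

```sql
CREATE INDEX idx_users_lower_email
    ON users (lower(email));
```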
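A small sketch of the matching rule, using a hypothetical `expr_demo_users` table: only the first query repeats the indexed expression.

```sql
-- Hypothetical table for illustration.
DROP TABLE IF EXISTS expr_demo_users;
CREATE TABLE expr_demo_users (
    id    bigint PRIMARY KEY,
    email text NOT NULL
);
INSERT INTO expr_demo_users VALUES
    (1, 'Alice@Example.com'),
    (2, 'bob@example.com');

CREATE INDEX expr_demo_users_lower_email_idx
    ON expr_demo_users (lower(email));

-- Matches the indexed expression, so the index is eligible
-- (and the lookup is case-insensitive):
SELECT id FROM expr_demo_users
WHERE lower(email) = lower('ALICE@example.COM');

-- Bare column comparison: this index cannot be used, and the
-- comparison is case-sensitive, so it finds no row here.
SELECT id FROM expr_demo_users
WHERE email = 'alice@example.com';
```

`ANALYZE` also gathers statistics on indexed expressions, which helps the planner estimate `lower(email) = ...` predicates.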

GIN for JSONB

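```sql
-- jsonb_path_ops: smaller and faster for containment than the default
-- jsonb_ops, but it does not support the key-existence operators ? ?& ?|
CREATE INDEX idx_metadata_gin
    ON products USING gin (metadata jsonb_path_ops);

-- Supports: metadata @> '{"color": "red"}'
```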
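To check exactly which operators each JSONB operator class can accelerate, you can query the catalogs. This sketch lists the GIN operators for the default `jsonb_ops` and for `jsonb_path_ops`:

```sql
-- GIN operators available per JSONB operator family.
SELECT opf.opfname                AS operator_family,
       amop.amopopr::regoperator  AS op_signature
FROM pg_amop     AS amop
JOIN pg_opfamily AS opf ON opf.oid = amop.amopfamily
JOIN pg_am       AS am  ON am.oid  = opf.opfmethod
WHERE am.amname = 'gin'
  AND opf.opfname IN ('jsonb_ops', 'jsonb_path_ops')
ORDER BY opf.opfname, op_signature;
```

`jsonb_path_ops` lists `@>`, `@?` and `@@` but not `?`, `?&` or `?|`. A query like `metadata ? 'color'` therefore needs a `jsonb_ops` index.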

GiST for Range Overlap

```sql
CREATE INDEX idx_reservations_during
    ON reservations USING gist (during);

-- Supports: WHERE during && '[2025-01-01, 2025-01-31]'::daterange
```
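Here is a self-contained version of the overlap query, on a hypothetical `gist_demo_reservations` table with three bookings:

```sql
-- Hypothetical table for illustration.
DROP TABLE IF EXISTS gist_demo_reservations;
CREATE TABLE gist_demo_reservations (
    id     int PRIMARY KEY,
    during daterange NOT NULL
);
INSERT INTO gist_demo_reservations VALUES
    (1, '[2025-01-10,2025-01-15)'),
    (2, '[2025-02-01,2025-02-05)'),
    (3, '[2024-12-30,2025-01-02)');

CREATE INDEX gist_demo_reservations_during_idx
    ON gist_demo_reservations USING gist (during);

-- && means "overlaps": rows 1 and 3 share at least one day with January 2025.
SELECT id
FROM gist_demo_reservations
WHERE during && '[2025-01-01,2025-01-31]'::daterange
ORDER BY id;
```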

BRIN for Time-Series

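```sql
-- Table has millions of rows inserted in timestamp order.
-- pages_per_range defaults to 128; smaller values give finer-grained
-- summaries (better filtering) at the cost of a somewhat larger index.
CREATE INDEX idx_logs_ts_brin
    ON logs USING brin (created_at)
    WITH (pages_per_range = 32);
```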
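BRIN summaries for newly appended page ranges are not always created at insert time. By default they wait for the next VACUUM. The `autosummarize` storage parameter (PostgreSQL 10+) asks autovacuum to summarize them sooner, and `brin_summarize_new_values()` does it on demand. A sketch on a hypothetical `brin_demo_events` table:

```sql
-- Hypothetical append-only table.
DROP TABLE IF EXISTS brin_demo_events;
CREATE TABLE brin_demo_events (
    created_at timestamptz NOT NULL,
    message    text
);
INSERT INTO brin_demo_events
SELECT timestamptz '2025-01-01' + g * interval '1 second', 'event ' || g
FROM generate_series(1, 100000) AS g;

CREATE INDEX brin_demo_events_ts_idx
    ON brin_demo_events USING brin (created_at)
    WITH (pages_per_range = 32, autosummarize = on);

-- Append more rows, still in timestamp order (new pages at the end of the heap).
INSERT INTO brin_demo_events
SELECT timestamptz '2025-01-03' + g * interval '1 second', 'late event ' || g
FROM generate_series(1, 50000) AS g;

-- Summarize any page ranges not yet covered by the index;
-- returns the number of new range summaries inserted.
SELECT brin_summarize_new_values('brin_demo_events_ts_idx');
```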

Sizing Rules of Thumb

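Order-of-magnitude figures for a single-column index. Actual sizes depend on key width, fillfactor, and bloat, and GIN size in particular varies with the number of keys per row.

| Table Rows | B-tree Size | BRIN Size | GIN Size |
|---|---|---|---|
| 1M | ~20 MB | ~50 KB | ~30 MB |
| 10M | ~200 MB | ~500 KB | ~300 MB |
| 100M | ~2 GB | ~5 MB | ~3 GB |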
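Because these figures are rough, it is worth measuring on your own data. This sketch builds a B-tree and a BRIN index on the same hypothetical one-million-row `size_demo` table (a single timestamp column in insertion order) and compares their on-disk sizes:

```sql
-- Hypothetical table: one timestamp column, 1M rows in ascending order.
DROP TABLE IF EXISTS size_demo;
CREATE TABLE size_demo (created_at timestamptz NOT NULL);
INSERT INTO size_demo
SELECT timestamptz '2025-01-01' + g * interval '1 second'
FROM generate_series(1, 1000000) AS g;

CREATE INDEX size_demo_btree_idx ON size_demo (created_at);
CREATE INDEX size_demo_brin_idx  ON size_demo USING brin (created_at);

SELECT pg_size_pretty(pg_relation_size('size_demo'))           AS table_size,
       pg_size_pretty(pg_relation_size('size_demo_btree_idx')) AS btree_size,
       pg_size_pretty(pg_relation_size('size_demo_brin_idx'))  AS brin_size;
```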

Diagnostic Queries

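```sql
-- Check if an index is being used
-- (ANALYZE actually runs the query; wrap data-modifying statements
-- in a transaction you roll back)
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- Find unused indexes (counts are since the last statistics reset;
-- unique/primary-key indexes enforce constraints, so they are excluded)
SELECT s.schemaname, s.relname, s.indexrelname, s.idx_scan,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;

-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_name'));

-- Index bloat estimate (B-tree only; needs the pgstattuple extension)
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstatindex('idx_name');
```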
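To get a baseline for the bloat estimate, you can run `pgstatindex` on a freshly built B-tree. A sketch with a hypothetical `bloat_demo` table; the columns shown are a subset of what `pgstatindex` returns:

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Hypothetical table; the index is built after loading, so it starts compact.
DROP TABLE IF EXISTS bloat_demo;
CREATE TABLE bloat_demo (id bigint NOT NULL);
INSERT INTO bloat_demo SELECT g FROM generate_series(1, 100000) AS g;
CREATE INDEX bloat_demo_id_idx ON bloat_demo (id);

-- A fresh B-tree packs leaf pages up to its fillfactor (90 by default),
-- so avg_leaf_density starts high; heavy UPDATE/DELETE churn lowers it.
SELECT index_size, leaf_pages, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('bloat_demo_id_idx');
```

Re-running the query after sustained write churn and comparing `avg_leaf_density` with the fresh value gives a rough bloat signal. `REINDEX CONCURRENTLY` (PostgreSQL 12+) rebuilds the index without blocking writes.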

Anti-Patterns

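| Mistake | Why It Hurts |
|---|---|
| Indexing every column | Slows writes, wastes disk and cache, and gives the planner more candidates to evaluate |
| Wrong column order in composite | Index cannot be used efficiently for the query (often a full index scan or a sequential scan instead) |
| GIN on tiny tables | Overhead exceeds benefit |
| B-tree on low-cardinality columns | Planner usually prefers a sequential scan anyway |
| Missing CONCURRENTLY on production | Plain CREATE INDEX blocks writes (INSERT/UPDATE/DELETE) to the table for the whole build |
| Forgetting ANALYZE after bulk load | Planner uses stale statistics |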

Safe Index Creation

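```sql
-- Index creation that does not block reads or writes. It is slower,
-- cannot run inside a transaction block, and leaves an INVALID index
-- behind if it fails. table_name/column_name are placeholders.
CREATE INDEX CONCURRENTLY idx_name ON table_name (column_name);

-- Always run ANALYZE after bulk operations
ANALYZE table_name;
```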
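A failed or cancelled `CREATE INDEX CONCURRENTLY` leaves an invalid index behind. That index is ignored by queries but still adds write overhead. A catalog query to find such leftovers:

```sql
-- Indexes marked invalid, e.g. by a failed concurrent build.
SELECT indexrelid::regclass AS index_name,
       indrelid::regclass   AS table_name
FROM pg_index
WHERE NOT indisvalid;

-- Clean-up for a leftover index (idx_name is a placeholder);
-- DROP INDEX CONCURRENTLY also avoids blocking writes:
-- DROP INDEX CONCURRENTLY idx_name;
```

After dropping the leftover, retry the `CREATE INDEX CONCURRENTLY`.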