Deep EVM #27: PostgreSQL Performance at Scale — Indexes, VACUUM, and Query Optimization
Engineering Team
The Performance Toolkit
PostgreSQL is remarkably fast out of the box, but at scale — tens of millions of rows, thousands of queries per second — you need to understand its internals to maintain performance. This article covers the three pillars of PostgreSQL performance: indexes, VACUUM, and query optimization.
Index Types and When to Use Them
B-Tree (Default)
The default index type. Supports equality and range queries. If you do not specify a type, you get B-Tree:
CREATE INDEX idx_transactions_block
ON transactions (block_number);
-- Supports:
-- WHERE block_number = 18000000
-- WHERE block_number > 18000000
-- WHERE block_number BETWEEN 18000000 AND 18100000
-- ORDER BY block_number
Partial Indexes
Index only rows matching a condition. Dramatically smaller than full indexes:
-- Only index pending transactions (0.1% of table)
CREATE INDEX idx_transactions_pending
ON transactions (created_at)
WHERE status = 'pending';
-- Index only recent high-value transfers
CREATE INDEX idx_high_value_recent
ON transactions (from_addr, to_addr)
WHERE value_wei > 1000000000000000000 -- > 1 ETH
AND block_number > 18000000;
A partial index on 0.1% of a 34M-row table is 170KB instead of 800MB. Queries that match the WHERE condition use the tiny index; queries that do not match fall back to other indexes or sequential scan.
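One way to see this decision in action (a sketch, assuming the idx_transactions_pending index above exists) is to compare plans for predicates that do and do not match the index condition:

```sql
-- Matches the index predicate: the planner can use the tiny partial index
EXPLAIN SELECT * FROM transactions
WHERE status = 'pending' AND created_at > now() - interval '1 hour';
-- -> Index Scan using idx_transactions_pending

-- Does not match the predicate: the partial index is not applicable
EXPLAIN SELECT * FROM transactions
WHERE status = 'confirmed' AND created_at > now() - interval '1 hour';
-- -> falls back to another index or a sequential scan
```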
Covering Indexes (INCLUDE)
Include non-key columns in the index to enable index-only scans:
-- Without INCLUDE: index lookup + heap fetch for each row
CREATE INDEX idx_tx_from ON transactions (from_addr);
-- Query: SELECT from_addr, value_wei FROM transactions WHERE from_addr = $1
-- Plan: Index Scan -> Heap Fetch (slow for many rows)
-- With INCLUDE: index-only scan (no heap fetch)
CREATE INDEX idx_tx_from_covering
ON transactions (from_addr)
INCLUDE (value_wei, block_number, tx_hash);
-- Plan: Index Only Scan (fast, no heap access)
The trade-off: covering indexes are larger because they store extra columns. Use them for frequently queried column combinations.
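To quantify that trade-off on your own data, you can compare the on-disk sizes of the plain and covering variants (a sketch using the index names from the examples above):

```sql
-- Compare index sizes; expect the covering index to be noticeably larger
SELECT relname AS index_name,
       pg_size_pretty(pg_relation_size(oid)) AS on_disk_size
FROM pg_class
WHERE relname IN ('idx_tx_from', 'idx_tx_from_covering');
```

Note that index-only scans also depend on an up-to-date visibility map, so a well-vacuumed table is a prerequisite for actually getting the Index Only Scan plan.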
BRIN Indexes (Block Range INdex)
For time-series or monotonically increasing data, BRIN indexes are tiny and effective:
-- BRIN index: ~100KB for 34M rows (vs ~800MB B-Tree)
CREATE INDEX idx_transactions_block_brin
ON transactions USING BRIN (block_number)
WITH (pages_per_range = 128);
BRIN works by storing min/max values per range of physical pages. If your data is inserted in order (block_number increases monotonically), BRIN can skip entire page ranges efficiently. If data is randomly ordered, BRIN is useless.
-- Perfect for BRIN: data inserted in block_number order
SELECT * FROM transactions WHERE block_number BETWEEN 18000000 AND 18100000;
-- BRIN eliminates 95% of pages in ~0.1ms
-- Bad for BRIN: random access pattern
SELECT * FROM transactions WHERE from_addr = $1;
-- BRIN cannot help here
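Whether your data really is "inserted in order" is measurable: the planner statistics track the correlation between a column's values and the physical row order. A quick check (table and column names from the examples above):

```sql
-- correlation near 1.0 (or -1.0): BRIN prunes page ranges well
-- correlation near 0: rows are scattered; BRIN degenerates to a full scan
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'transactions'
  AND attname = 'block_number';
```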
GIN Indexes (Generalized Inverted Index)
For JSONB columns and full-text search:
-- Index JSONB fields for containment queries
CREATE INDEX idx_events_data_gin
ON events USING GIN (data jsonb_path_ops);
-- Supports:
-- WHERE data @> '{"event": "Transfer"}'
-- WHERE data @> '{"args": {"from": "0x..."}}'
-- Full-text search
CREATE INDEX idx_articles_fts
ON articles USING GIN (to_tsvector('english', title || ' ' || content));
VACUUM: The Silent Performance Killer
PostgreSQL uses MVCC (Multi-Version Concurrency Control). When you UPDATE or DELETE a row, the old version is not removed — it is marked as dead. VACUUM reclaims space from dead tuples.
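A minimal sketch of what this means in practice (the id column here is hypothetical):

```sql
-- One UPDATE = one new row version + one dead tuple left behind
UPDATE transactions SET status = 'confirmed' WHERE id = 42;

-- The old version stays physically present until VACUUM reclaims it
SELECT n_dead_tup FROM pg_stat_user_tables
WHERE relname = 'transactions';
```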
Why VACUUM Matters
Without VACUUM:
- Dead tuples accumulate, bloating the table
- Index scans must skip dead tuples (slower)
- Transaction ID wraparound can halt the entire database
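The wraparound risk in particular is worth monitoring directly. Each database's oldest unfrozen transaction ID can be checked like this:

```sql
-- age() approaching 2^31 means wraparound; autovacuum forces an
-- anti-wraparound freeze near autovacuum_freeze_max_age (default 200M)
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```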
Autovacuum Configuration
Default autovacuum settings are conservative. For high-write tables, tune aggressively:
-- Per-table autovacuum tuning
ALTER TABLE transactions SET (
autovacuum_vacuum_threshold = 10000,
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_analyze_threshold = 5000,
autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_delay = 2
);
Explanation:
- autovacuum_vacuum_threshold = 10000: start VACUUM after 10,000 dead tuples (default: 50)
- autovacuum_vacuum_scale_factor = 0.01: plus 1% of table size (default: 20%!)
- autovacuum_analyze_threshold = 5000 / autovacuum_analyze_scale_factor = 0.005: refresh planner statistics even more eagerly than VACUUM runs
- autovacuum_vacuum_cost_delay = 2: sleep only 2ms between cost-limited work batches (the PostgreSQL 12+ default; versions before 12 defaulted to 20ms)
For a 34M-row table, the default scale factor of 20% means autovacuum waits until 6.8 million dead tuples accumulate. That is catastrophic. Set it to 1% (340,000 dead tuples) or lower.
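The trigger condition behind these numbers is worth writing out:

```sql
-- Autovacuum starts a worker on a table when:
--   n_dead_tup > autovacuum_vacuum_threshold
--               + autovacuum_vacuum_scale_factor * n_live_tup
--
-- Defaults on a 34M-row table: 50 + 0.20 * 34,000,000 ≈ 6.8M dead tuples
-- Tuned as above:          10,000 + 0.01 * 34,000,000 =  350,000 dead tuples
```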
Monitoring Vacuum
-- Check dead tuple counts
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS dead_pct,
last_vacuum,
last_autovacuum,
last_analyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
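When a vacuum is actually running, its progress is visible as well (PostgreSQL 9.6+):

```sql
-- One row per backend currently vacuuming
SELECT pid,
       relid::regclass AS table_name,
       phase,
       heap_blks_scanned,
       heap_blks_total
FROM pg_stat_progress_vacuum;
```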
Table Bloat
Even after VACUUM, space is generally not returned to the OS: plain VACUUM can only truncate empty pages at the very end of the table, and everything else is merely marked as reusable for future writes. To actually shrink the table:
-- VACUUM FULL: rewrites entire table (locks it exclusively)
VACUUM FULL transactions; -- DO NOT run in production
-- Better: pg_repack (no exclusive lock)
-- Install extension first
pg_repack --table transactions --no-kill-backend --no-superuser-check
pg_repack creates a new copy of the table, copies the data across, then swaps the two, holding an exclusive lock only for the brief swap itself. Essential for production systems.
EXPLAIN ANALYZE: Reading Query Plans
The most important PostgreSQL debugging tool:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT t.*, a.balance_wei
FROM transactions t
JOIN addresses a ON t.from_addr = a.address
WHERE t.block_number BETWEEN 18000000 AND 18100000
ORDER BY t.value_wei DESC
LIMIT 100;
Reading the output:
Limit (actual time=12.3..12.5 rows=100 loops=1)
-> Sort (actual time=12.3..12.4 rows=100 loops=1)
Sort Key: t.value_wei DESC
Sort Method: top-N heapsort Memory: 42kB
-> Nested Loop (actual time=0.2..10.1 rows=50234 loops=1)
-> Index Scan using idx_transactions_block on transactions t
(actual time=0.1..3.2 rows=50234 loops=1)
Index Cond: block_number >= 18000000 AND block_number <= 18100000
Buffers: shared hit=1234
-> Index Scan using addresses_pkey on addresses a
(actual time=0.001..0.001 rows=1 loops=50234)
Index Cond: address = t.from_addr
Buffers: shared hit=150702
Planning Time: 0.5ms
Execution Time: 12.8ms
Key metrics to watch:
- actual time: Real time in milliseconds (startup..total), reported per loop
- rows: Actual rows produced per loop (compare with the planner's estimate)
- Buffers: shared hit: Pages read from cache (good)
- Buffers: shared read: Pages read from disk (slow)
- loops: How many times this node executed; multiply per-loop time and rows by loops for the node's totals
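Each buffer is one 8KB page, so these counters translate directly into data volume: in the plan above, shared hit=150702 on the inner index scan is roughly 150,702 × 8KB ≈ 1.2GB of page accesses, all served from shared_buffers. To also see how much wall-clock time disk reads cost, enable I/O timing:

```sql
-- With this on, EXPLAIN (ANALYZE, BUFFERS) adds an
-- "I/O Timings: read=... write=..." line per node
SET track_io_timing = on;
```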
Common Anti-Patterns
1. Sequential Scan on Large Table
-- Missing index: Seq Scan on 34M rows
SELECT * FROM transactions WHERE tx_hash = $1;
-- Fix: CREATE INDEX idx_transactions_hash ON transactions (tx_hash);
2. Nested Loop with No Index
-- N+1 query pattern: 50K index lookups
-- Fix: use covering index or batch the lookup
3. Excessive Sort Memory
-- Sort Method: external merge Disk: 1234MB
-- Fix: increase work_mem for this query
SET work_mem = '256MB';
-- Or add an index that provides pre-sorted output
4. Inaccurate Row Estimates
-- rows=50234 (estimated) vs rows=1 (actual)
-- Stale statistics. Fix:
ANALYZE transactions;
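If estimates remain far off even after ANALYZE, the column may need a larger statistics sample (column name from the earlier examples):

```sql
-- Default statistics target is 100; raise it for skewed,
-- high-cardinality columns
ALTER TABLE transactions ALTER COLUMN from_addr SET STATISTICS 1000;
ANALYZE transactions;
```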
Query Optimization Patterns
Use EXISTS Instead of IN for Subqueries
-- Can be slow: the whole subquery result is evaluated (and typically hashed) up front
SELECT * FROM transactions
WHERE from_addr IN (
SELECT address FROM flagged_addresses
);
-- Fast: stops at the first match per outer row
SELECT * FROM transactions t
WHERE EXISTS (
SELECT 1 FROM flagged_addresses f
WHERE f.address = t.from_addr
);
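The negated forms are where the difference really bites: NOT IN returns zero rows as soon as the subquery produces a single NULL, while NOT EXISTS behaves intuitively and usually gets an efficient anti-join plan:

```sql
-- Prefer NOT EXISTS over NOT IN
SELECT * FROM transactions t
WHERE NOT EXISTS (
    SELECT 1 FROM flagged_addresses f
    WHERE f.address = t.from_addr
);
```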
Use CTEs Wisely
In PostgreSQL 12+, CTEs are inlined by default (optimization fence removed). But if you need to materialize:
-- Force materialization (useful when CTE is used multiple times)
WITH expensive_cte AS MATERIALIZED (
SELECT from_addr, SUM(value_wei) AS total
FROM transactions
WHERE block_number > 18000000
GROUP BY from_addr
)
SELECT * FROM expensive_cte WHERE total > 1000000000000000000;
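The opposite hint exists as well: NOT MATERIALIZED forces inlining when the planner would otherwise materialize, for example because the CTE is referenced more than once:

```sql
-- Inline the CTE so outer predicates can be pushed down into it
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM transactions
    WHERE block_number > 18000000
)
SELECT count(*) FROM recent
WHERE status = 'pending';
```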
Batch Operations
-- Slow: 10,000 individual queries
FOREACH addr IN ARRAY addresses LOOP
    SELECT balance INTO bal FROM balances WHERE address = addr;
END LOOP;
-- Fast: single query with ANY
SELECT address, balance FROM balances
WHERE address = ANY($1::bytea[]);
Configuration for Performance
Key postgresql.conf settings for high-performance workloads:
# Memory
shared_buffers = 8GB # 25% of RAM
effective_cache_size = 24GB # 75% of RAM
work_mem = 64MB # Per-sort/hash operation
maintenance_work_mem = 2GB # For VACUUM, CREATE INDEX
# WAL
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
# Planner
random_page_cost = 1.1 # SSD (default 4.0 is for HDD)
effective_io_concurrency = 200 # SSD
# Parallel Queries
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
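After applying settings like these, sanity-check them against the workload. The buffer cache hit ratio is a quick first signal for shared_buffers sizing:

```sql
-- Aim for > 0.99 on a hot OLTP working set
SELECT sum(blks_hit)::numeric
       / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database;
```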
Conclusion
PostgreSQL performance at scale rests on three pillars: the right indexes for your query patterns (partial, covering, BRIN for time-series), aggressive autovacuum tuning to prevent dead tuple accumulation, and systematic query optimization using EXPLAIN ANALYZE. Do not guess — measure. Every slow query has a story in its execution plan, and every bloated table has a VACUUM configuration that needs attention.