DevOps · Mar 28, 2026

Deep EVM #27: PostgreSQL Performance at Scale — Indexes, VACUUM, and Query Optimization

Open Soft Team

Engineering Team

The Performance Toolkit

PostgreSQL is remarkably fast out of the box, but at scale — tens of millions of rows, thousands of queries per second — you need to understand its internals to maintain performance. This article covers the three pillars of PostgreSQL performance: indexes, VACUUM, and query optimization.

Index Types and When to Use Them

B-Tree (Default)

The default index type. Supports equality and range queries. If you do not specify a type, you get B-Tree:

CREATE INDEX idx_transactions_block
    ON transactions (block_number);

-- Supports:
-- WHERE block_number = 18000000
-- WHERE block_number > 18000000
-- WHERE block_number BETWEEN 18000000 AND 18100000
-- ORDER BY block_number
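Column order also matters for multi-column B-Tree indexes: the index is only used efficiently when the query constrains a leading prefix of its columns. A sketch, reusing the table from the examples above:

```sql
-- Composite index: most selective / most commonly filtered column first
CREATE INDEX idx_tx_block_from
    ON transactions (block_number, from_addr);

-- Can use the index (constrains the leading column):
-- WHERE block_number = 18000000 AND from_addr = $1
-- WHERE block_number > 18000000

-- Cannot use the index efficiently (skips the leading column):
-- WHERE from_addr = $1
```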

Partial Indexes

Index only rows matching a condition. Dramatically smaller than full indexes:

-- Only index pending transactions (0.1% of table)
CREATE INDEX idx_transactions_pending
    ON transactions (created_at)
    WHERE status = 'pending';

-- Index only recent high-value transfers
CREATE INDEX idx_high_value_recent
    ON transactions (from_addr, to_addr)
    WHERE value_wei > 1000000000000000000  -- > 1 ETH
    AND block_number > 18000000;

A partial index on 0.1% of a 34M-row table is 170KB instead of 800MB. Queries that match the WHERE condition use the tiny index; queries that do not match fall back to other indexes or sequential scan.
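Note that the planner only picks a partial index when it can prove the query's WHERE clause implies the index predicate. A quick check with EXPLAIN, reusing the idx_transactions_pending example above:

```sql
-- Implies the index predicate: idx_transactions_pending is usable
EXPLAIN SELECT * FROM transactions
WHERE status = 'pending' AND created_at > now() - interval '1 hour';

-- Does not imply the predicate: the partial index is ignored
EXPLAIN SELECT * FROM transactions
WHERE created_at > now() - interval '1 hour';
```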

Covering Indexes (INCLUDE)

Include non-key columns in the index to enable index-only scans:

-- Without INCLUDE: index lookup + heap fetch for each row
CREATE INDEX idx_tx_from ON transactions (from_addr);
-- Query: SELECT from_addr, value_wei FROM transactions WHERE from_addr = $1
-- Plan: Index Scan -> Heap Fetch (slow for many rows)

-- With INCLUDE: index-only scan (no heap fetch)
CREATE INDEX idx_tx_from_covering
    ON transactions (from_addr)
    INCLUDE (value_wei, block_number, tx_hash);
-- Plan: Index Only Scan (fast, no heap access)

The trade-off: covering indexes are larger because they store extra columns. Use them for frequently queried column combinations.
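One caveat: index-only scans consult the visibility map, and heap pages modified since the last VACUUM still force heap fetches. EXPLAIN ANALYZE reports this directly (a sketch; `$1` stands in for a bound address parameter):

```sql
EXPLAIN (ANALYZE)
SELECT from_addr, value_wei FROM transactions WHERE from_addr = $1;
-- Look for "Heap Fetches: N" under the Index Only Scan node.
-- A large N means the visibility map is stale: VACUUM the table
-- so index-only scans can skip the heap again.
```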

BRIN Indexes (Block Range INdex)

For time-series or monotonically increasing data, BRIN indexes are tiny and effective:

-- BRIN index: ~100KB for 34M rows (vs ~800MB B-Tree)
CREATE INDEX idx_transactions_block_brin
    ON transactions USING BRIN (block_number)
    WITH (pages_per_range = 128);

BRIN works by storing the min/max values for each range of physical pages. If your data is inserted in order (block_number increases monotonically), BRIN can skip entire page ranges efficiently. If rows are stored in random order, every page range may contain matching values and the index provides essentially no benefit.

-- Perfect for BRIN: data inserted in block_number order
SELECT * FROM transactions WHERE block_number BETWEEN 18000000 AND 18100000;
-- BRIN eliminates 95% of pages in ~0.1ms

-- Bad for BRIN: random access pattern
SELECT * FROM transactions WHERE from_addr = $1;
-- BRIN cannot help here
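Whether BRIN will pay off can be checked before creating the index: pg_stats exposes each column's physical-to-logical correlation, and values near 1.0 mean rows are stored in column order. A sketch:

```sql
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'transactions'
  AND attname IN ('block_number', 'from_addr');
-- correlation near 1.0 (block_number): good BRIN candidate
-- correlation near 0.0 (from_addr):    BRIN would be ineffective
```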

GIN Indexes (Generalized Inverted Index)

For JSONB columns and full-text search:

-- Index JSONB fields for containment queries
CREATE INDEX idx_events_data_gin
    ON events USING GIN (data jsonb_path_ops);

-- Supports:
-- WHERE data @> '{"event": "Transfer"}'
-- WHERE data @> '{"args": {"from": "0x..."}}'

-- Full-text search
CREATE INDEX idx_articles_fts
    ON articles USING GIN (to_tsvector('english', title || ' ' || content));
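The full-text index above is only usable when the query applies the same to_tsvector expression; a matching query looks like this (search terms are illustrative):

```sql
SELECT title
FROM articles
WHERE to_tsvector('english', title || ' ' || content)
      @@ to_tsquery('english', 'vacuum & performance');
```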

VACUUM: The Silent Performance Killer

PostgreSQL uses MVCC (Multi-Version Concurrency Control). When you UPDATE or DELETE a row, the old version is not removed — it is marked as dead. VACUUM reclaims space from dead tuples.

Why VACUUM Matters

Without VACUUM:

  • Dead tuples accumulate, bloating the table
  • Index scans must skip dead tuples (slower)
  • Transaction ID wraparound can halt the entire database
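Wraparound risk can be monitored directly: autovacuum forces an anti-wraparound vacuum as age(datfrozenxid) approaches autovacuum_freeze_max_age (200 million by default), so track how close each database is getting:

```sql
-- Databases closest to transaction ID wraparound first
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```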

Autovacuum Configuration

Default autovacuum settings are conservative. For high-write tables, tune aggressively:

-- Per-table autovacuum tuning
ALTER TABLE transactions SET (
    autovacuum_vacuum_threshold = 10000,
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_threshold = 5000,
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_vacuum_cost_delay = 2
);

Explanation:

  • autovacuum_vacuum_threshold = 10000: Start VACUUM after 10,000 dead tuples (default: 50)
  • autovacuum_vacuum_scale_factor = 0.01: Plus 1% of table size (default: 20%!)
  • autovacuum_vacuum_cost_delay = 2: Sleep 2ms between cost-limited vacuum batches (the default since PostgreSQL 12; older versions default to 20ms, which throttles vacuum heavily)

For a 34M-row table, the default scale factor of 20% means autovacuum waits until 6.8 million dead tuples accumulate. That is catastrophic. Set it to 1% (340,000 dead tuples) or lower.

Monitoring Vacuum

-- Check dead tuple counts
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS dead_pct,
    last_vacuum,
    last_autovacuum,
    last_analyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
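For vacuums that are currently running, pg_stat_progress_vacuum (PostgreSQL 9.6+) shows per-phase progress:

```sql
SELECT p.pid, c.relname, p.phase,
       p.heap_blks_scanned, p.heap_blks_total
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;
```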

Table Bloat

Even after VACUUM, space is not returned to the OS — it is only marked as reusable. To actually shrink the table:

-- VACUUM FULL: rewrites entire table (locks it exclusively)
VACUUM FULL transactions;  -- DO NOT run in production

-- Better: pg_repack (no long-held exclusive lock)
-- Install the extension first: CREATE EXTENSION pg_repack;
pg_repack -d <dbname> --table transactions --no-kill-backend --no-superuser-check

pg_repack creates a new copy of the table, copies the data across, then swaps the two — it needs an exclusive lock only briefly during the final swap, not for the whole rewrite. Essential for production systems.
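Before scheduling a rewrite, it helps to measure bloat precisely. The pgstattuple extension reports dead-tuple and free-space percentages (note: it scans the whole table, so run it off-peak):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT table_len, dead_tuple_percent, free_percent
FROM pgstattuple('transactions');
```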

EXPLAIN ANALYZE: Reading Query Plans

The most important PostgreSQL debugging tool:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT t.*, a.balance_wei
FROM transactions t
JOIN addresses a ON t.from_addr = a.address
WHERE t.block_number BETWEEN 18000000 AND 18100000
ORDER BY t.value_wei DESC
LIMIT 100;

Reading the output:

Limit (actual time=12.3..12.5 rows=100 loops=1)
  -> Sort (actual time=12.3..12.4 rows=100 loops=1)
        Sort Key: t.value_wei DESC
        Sort Method: top-N heapsort  Memory: 42kB
        -> Nested Loop (actual time=0.2..10.1 rows=50234 loops=1)
              -> Index Scan using idx_transactions_block on transactions t
                    (actual time=0.1..3.2 rows=50234 loops=1)
                    Index Cond: block_number >= 18000000 AND block_number <= 18100000
                    Buffers: shared hit=1234
              -> Index Scan using addresses_pkey on addresses a
                    (actual time=0.001..0.001 rows=1 loops=50234)
                    Index Cond: address = t.from_addr
                    Buffers: shared hit=150702
Planning Time: 0.500 ms
Execution Time: 12.800 ms

Key metrics to watch:

  • actual time: Real execution time (start..end) in milliseconds
  • rows: Actual rows processed (compare with estimate for accuracy)
  • Buffers: shared hit: Pages read from cache (good)
  • Buffers: shared read: Pages read from disk (slow)
  • loops: How many times this node executed
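You rarely get to run EXPLAIN ANALYZE by hand against production traffic. The auto_explain module can log the plan of any query that exceeds a duration threshold (thresholds below are illustrative):

```sql
-- Preload for all sessions in postgresql.conf:
--   shared_preload_libraries = 'auto_explain'
-- Or load into the current session:
LOAD 'auto_explain';

SET auto_explain.log_min_duration = '100ms';  -- log plans slower than this
SET auto_explain.log_analyze = on;            -- include actual row counts/times
```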

Common Anti-Patterns

1. Sequential Scan on Large Table

-- Missing index: Seq Scan on 34M rows
SELECT * FROM transactions WHERE tx_hash = $1;
-- Fix: CREATE INDEX idx_transactions_hash ON transactions (tx_hash);

2. Nested Loop with No Index

-- N+1 query pattern: 50K index lookups
-- Fix: use covering index or batch the lookup

3. Excessive Sort Memory

-- Sort Method: external merge  Disk: 1234MB
-- Fix: increase work_mem for this query
SET work_mem = '256MB';
-- Or add an index that provides pre-sorted output
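A plain SET changes work_mem for the rest of the session; to scope the change to a single query, wrap it in a transaction with SET LOCAL:

```sql
BEGIN;
SET LOCAL work_mem = '256MB';  -- reverts automatically at COMMIT/ROLLBACK
SELECT * FROM transactions ORDER BY value_wei DESC LIMIT 100000;
COMMIT;
```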

4. Inaccurate Row Estimates

-- rows=50234 (estimated) vs rows=1 (actual)
-- Stale statistics. Fix:
ANALYZE transactions;
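If estimates stay wrong even after ANALYZE, the sample may be too coarse for a skewed column. Raising the per-column statistics target makes the histogram finer (column name follows the earlier examples):

```sql
ALTER TABLE transactions
    ALTER COLUMN from_addr SET STATISTICS 1000;  -- default is 100
ANALYZE transactions;
```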

Query Optimization Patterns

Use EXISTS Instead of IN for Subqueries

-- Often slower: the subquery may be materialized (though modern planners
-- frequently rewrite IN as a semi-join; verify with EXPLAIN)
SELECT * FROM transactions
WHERE from_addr IN (
    SELECT address FROM flagged_addresses
);

-- Usually at least as fast: can stop at the first match per outer row
SELECT * FROM transactions t
WHERE EXISTS (
    SELECT 1 FROM flagged_addresses f
    WHERE f.address = t.from_addr
);

Use CTEs Wisely

In PostgreSQL 12+, CTEs are inlined by default (optimization fence removed). But if you need to materialize:

-- Force materialization (useful when CTE is used multiple times)
WITH expensive_cte AS MATERIALIZED (
    SELECT from_addr, SUM(value_wei) AS total
    FROM transactions
    WHERE block_number > 18000000
    GROUP BY from_addr
)
SELECT * FROM expensive_cte WHERE total > 1000000000000000000;
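The opposite hint also exists: NOT MATERIALIZED forces inlining when the planner would otherwise materialize (for example, a CTE referenced more than once). A sketch:

```sql
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM transactions WHERE block_number > 18000000
)
SELECT * FROM recent WHERE from_addr = $1;
-- Inlined: the outer predicate can be pushed down to use an index
```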

Batch Operations

-- Slow: 10,000 individual round trips (N+1 pattern, PL/pgSQL sketch)
FOREACH addr IN ARRAY addresses LOOP
    SELECT balance INTO bal FROM balances WHERE address = addr;
END LOOP;

-- Fast: single query with ANY
SELECT address, balance FROM balances
WHERE address = ANY($1::bytea[]);
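When you also need to know which input addresses had no match, join against unnest instead of filtering with ANY — a sketch using the same bytea[] parameter:

```sql
SELECT u.address, b.balance
FROM unnest($1::bytea[]) AS u(address)
LEFT JOIN balances b USING (address);
-- NULL balance marks addresses with no row in balances
```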

Configuration for Performance

Key postgresql.conf settings for high-performance workloads:

# Memory
shared_buffers = 8GB              # 25% of RAM
effective_cache_size = 24GB       # 75% of RAM
work_mem = 64MB                   # Per-sort/hash operation
maintenance_work_mem = 2GB        # For VACUUM, CREATE INDEX

# WAL
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB

# Planner
random_page_cost = 1.1            # SSD (default 4.0 is for HDD)
effective_io_concurrency = 200    # SSD

# Parallel Queries
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
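Applied values can be verified at runtime; pg_settings also reports the unit each setting is stored in:

```sql
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'random_page_cost');
```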

Conclusion

PostgreSQL performance at scale rests on three pillars: the right indexes for your query patterns (partial, covering, BRIN for time-series), aggressive autovacuum tuning to prevent dead tuple accumulation, and systematic query optimization using EXPLAIN ANALYZE. Do not guess — measure. Every slow query has a story in its execution plan, and every bloated table has a VACUUM configuration that needs attention.