# Deep EVM #28: High-Throughput Data Pipeline — Batch Inserts, COPY, and Conflict Resolution

## The Insert Throughput Problem

You are building a blockchain indexer that processes 200 transactions per block, with a new block every 12 seconds. That is roughly 17 transactions per second at steady state. Easy, right? Until you factor in historical backfill: syncing 18 million blocks at maximum speed requires inserting 3.6 billion rows as fast as possible.

Individual INSERT statements max out at roughly 5,000 rows per second on a typical PostgreSQL server. The COPY protocol can push 200,000+ rows per second. The difference is 40x — the difference between an 8-day backfill and a 5-hour backfill.

## INSERT vs COPY Performance

### Single INSERT (Slowest)

```rust
// ~5,000 rows/sec — each statement is a round trip
for tx in transactions {
    sqlx::query(
        "INSERT INTO transactions (hash, from_addr, to_addr, value_wei, block_number)
         VALUES ($1, $2, $3, $4, $5)"
    )
    .bind(&tx.hash)
    .bind(&tx.from_addr)
    .bind(&tx.to_addr)
    .bind(&tx.value_wei)
    .bind(tx.block_number)
    .execute(&pool)
    .await?;
}
```

### Batch INSERT (Better)

```rust
// ~50,000 rows/sec — single statement, multiple rows
let mut query_builder = sqlx::QueryBuilder::new(
    "INSERT INTO transactions (hash, from_addr, to_addr, value_wei, block_number) "
);

query_builder.push_values(&transactions[..], |mut b, tx| {
    b.push_bind(&tx.hash)
     .push_bind(&tx.from_addr)
     .push_bind(&tx.to_addr)
     .push_bind(&tx.value_wei)
     .push_bind(tx.block_number);
});

query_builder.build().execute(&pool).await?;
```

Batch size matters: too small and you waste round trips; too large and you hit PostgreSQL's parameter limit (65,535 bind parameters per prepared statement).

```rust
// Practical batch size: 1,000-5,000 rows
for chunk in transactions.chunks(2000) {
    let mut builder = sqlx::QueryBuilder::new(
        "INSERT INTO transactions (hash, from_addr, to_addr, value_wei, block_number) "
    );
    builder.push_values(chunk, |mut b, tx| {
        b.push_bind(&tx.hash)
         .push_bind(&tx.from_addr)
         .push_bind(&tx.to_addr)
         .push_bind(&tx.value_wei)
         .push_bind(tx.block_number);
    });
    builder.build().execute(&pool).await?;
}
```
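If you want to derive the chunk size rather than hard-code it, the ceiling follows directly from the parameter limit divided by the number of bound columns. A minimal sketch (the helper and its name are illustrative, not part of the pipeline above):

```rust
/// Illustrative helper: derive a safe chunk size from the number of bound
/// columns, staying under PostgreSQL's 65,535-parameter limit per statement.
fn max_chunk_size(bound_columns: usize) -> usize {
    const PG_MAX_BIND_PARAMS: usize = 65_535;
    // Cap at 5,000 — the upper end of the range that works well in practice.
    (PG_MAX_BIND_PARAMS / bound_columns).min(5_000)
}

fn main() {
    // 5 bound columns per row: 65_535 / 5 = 13_107, capped at 5_000.
    assert_eq!(max_chunk_size(5), 5_000);
    // A wide 20-column table would be limited to 3_276 rows per statement.
    assert_eq!(max_chunk_size(20), 3_276);
}
```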
### COPY Protocol (Fastest)

The COPY protocol streams binary data directly into the table, bypassing the query parser and planner:

```rust
use futures::pin_mut;
use tokio_postgres::binary_copy::BinaryCopyInWriter;
use tokio_postgres::types::{ToSql, Type};

// Using tokio-postgres directly for COPY support
let copy_query = "COPY transactions (hash, from_addr, to_addr, value_wei, block_number)
                  FROM STDIN WITH (FORMAT binary)";

let sink = client.copy_in(copy_query).await?;
let writer = BinaryCopyInWriter::new(sink, &[
    Type::BYTEA,    // hash
    Type::BYTEA,    // from_addr
    Type::BYTEA,    // to_addr
    Type::NUMERIC,  // value_wei
    Type::INT8,     // block_number
]);

pin_mut!(writer);

for tx in &transactions {
    writer.as_mut().write(&[
        &tx.hash as &(dyn ToSql + Sync),
        &tx.from_addr,
        &tx.to_addr,
        &tx.value_wei,
        &tx.block_number,
    ]).await?;
}

writer.finish().await?;
```

Performance comparison on a typical server (NVMe SSD, 32GB RAM):

| Method | Throughput | Latency/Row | Network Roundtrips |
|--------|-----------|-------------|--------------------|
| Single INSERT | 5K rows/sec | 200us | 1 per row |
| Batch INSERT (1000) | 50K rows/sec | 20us | 1 per 1000 rows |
| COPY text | 150K rows/sec | 6.7us | Streaming |
| COPY binary | 250K rows/sec | 4us | Streaming |

## ON CONFLICT Strategies

Real-world data pipelines encounter duplicates. PostgreSQL's `ON CONFLICT` clause handles them efficiently:

### Upsert (Update on Conflict)

```sql
INSERT INTO transactions (hash, from_addr, to_addr, value_wei, block_number, status)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (hash) DO UPDATE SET
    status = EXCLUDED.status,
    updated_at = NOW()
WHERE transactions.status != EXCLUDED.status;
```

The `WHERE` clause in `DO UPDATE` prevents unnecessary writes (and WAL generation) when the row has not actually changed.

### Skip Duplicates

```sql
INSERT INTO transactions (hash, from_addr, to_addr, value_wei, block_number)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (hash) DO NOTHING;
```

### Bulk Upsert Pattern

```rust
async fn bulk_upsert(
    pool: &PgPool,
    transactions: &[Transaction],
) -> anyhow::Result<u64> {
    let mut total_affected = 0u64;

    for chunk in transactions.chunks(2000) {
        let mut builder = sqlx::QueryBuilder::new(
            "INSERT INTO transactions (hash, from_addr, to_addr, value_wei, block_number) "
        );

        builder.push_values(chunk, |mut b, tx| {
            b.push_bind(&tx.hash)
             .push_bind(&tx.from_addr)
             .push_bind(&tx.to_addr)
             .push_bind(&tx.value_wei)
             .push_bind(tx.block_number);
        });

        builder.push(" ON CONFLICT (hash) DO UPDATE SET value_wei = EXCLUDED.value_wei, block_number = EXCLUDED.block_number WHERE transactions.block_number < EXCLUDED.block_number");

        let result = builder.build().execute(pool).await?;
        total_affected += result.rows_affected();
    }

    Ok(total_affected)
}
```
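One useful consequence of the `WHERE` guard: replaying a batch that is already in the table is close to a no-op, which makes indexer restarts cheap. A hedged sketch, assuming the `bulk_upsert` function above and a hypothetical decoded `block` value:

```rust
// Sketch: re-running an already-indexed block should touch (almost) no rows,
// because equal block_number values fail the WHERE guard in DO UPDATE.
let first_pass = bulk_upsert(&pool, &block.transactions).await?;
let replay = bulk_upsert(&pool, &block.transactions).await?;
assert!(replay <= first_pass, "replays must not rewrite unchanged rows");
```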
## WAL Tuning for High Write Throughput

The Write-Ahead Log (WAL) is PostgreSQL's durability mechanism: every write is recorded in the WAL before it reaches the table. High write throughput requires WAL tuning:

```ini
# postgresql.conf — high-write workload

# WAL size — larger = fewer checkpoints
max_wal_size = 8GB
min_wal_size = 2GB

# Checkpoint tuning
checkpoint_completion_target = 0.9  # Spread checkpoint I/O
checkpoint_timeout = 15min          # Less frequent checkpoints

# WAL compression (zstd requires PostgreSQL 15+)
wal_compression = zstd

# Synchronous commit — trade durability for speed
# Only disable if you can tolerate losing the last few transactions on a crash
synchronous_commit = off  # ~3x write throughput increase

# WAL writer
wal_writer_delay = 200ms
wal_writer_flush_after = 1MB
```

**Warning:** Setting `synchronous_commit = off` means a crash can lose the most recently committed transactions (up to roughly three times `wal_writer_delay`, about 600ms with the setting above). For blockchain data that can be re-fetched, this is acceptable. For financial records, never disable it.

### Unlogged Tables for Temporary Data

For staging tables during backfill:

```sql
-- No WAL = 5-10x faster writes, but data is lost on crash
CREATE UNLOGGED TABLE transactions_staging (
    LIKE transactions INCLUDING ALL
);

-- Bulk load into staging
COPY transactions_staging FROM STDIN WITH (FORMAT binary);

-- Move to the permanent table
INSERT INTO transactions
SELECT * FROM transactions_staging
ON CONFLICT (hash) DO NOTHING;

DROP TABLE transactions_staging;
```
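For reference, the same staging flow driven from Rust might look roughly like this. It is a sketch, not a reference implementation: `copy_into_staging` is a hypothetical helper wrapping the binary COPY writer shown earlier, and error handling is reduced to `?`:

```rust
use sqlx::PgPool;

// Hypothetical staging-flow sketch: bulk-load into an UNLOGGED table, then
// promote rows into the permanent table with ON CONFLICT DO NOTHING.
async fn backfill_chunk(pool: &PgPool, txs: &[Transaction]) -> anyhow::Result<()> {
    sqlx::query("CREATE UNLOGGED TABLE transactions_staging (LIKE transactions INCLUDING ALL)")
        .execute(pool)
        .await?;

    // Fast path: COPY into the unlogged table (assumed helper, see the COPY section).
    copy_into_staging(pool, txs).await?;

    // Rows already present in `transactions` are skipped.
    sqlx::query(
        "INSERT INTO transactions SELECT * FROM transactions_staging \
         ON CONFLICT (hash) DO NOTHING",
    )
    .execute(pool)
    .await?;

    sqlx::query("DROP TABLE transactions_staging").execute(pool).await?;
    Ok(())
}
```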
## PgBouncer Connection Pooling

PostgreSQL creates a new backend process for each connection. At 100+ connections, the per-process overhead degrades performance. PgBouncer sits between your application and PostgreSQL, multiplexing thousands of application connections onto a few PostgreSQL connections:

```ini
# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

# Pool settings
pool_mode = transaction    # Release connection after each transaction
default_pool_size = 25     # Connections per user/database pair
max_client_conn = 1000     # Max incoming connections
max_db_connections = 50    # Max connections to PostgreSQL

# Timeouts
server_idle_timeout = 600
client_idle_timeout = 0
query_timeout = 30
```

Pool modes:
- **session**: Connection held for the entire client session (effectively no pooling)
- **transaction**: Connection returned after each transaction (recommended)
- **statement**: Connection returned after each statement (strictest; some features break)

In Rust with sqlx:

```rust
// Connect through PgBouncer
let pool = PgPoolOptions::new()
    .max_connections(50)  // Stay within PgBouncer's max_db_connections
    .min_connections(5)
    .acquire_timeout(Duration::from_secs(3))
    .connect("postgres://user:pass@localhost:6432/mydb")
    .await?;
```

## Monitoring: pg_stat_statements and Slow Query Log

### pg_stat_statements

pg_stat_statements is the most important PostgreSQL monitoring extension; it tracks execution statistics for every query:

```sql
-- Enable in postgresql.conf:
-- shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by total time
SELECT
    LEFT(query, 100) AS query,
    calls,
    ROUND(total_exec_time::numeric, 2) AS total_ms,
    ROUND(mean_exec_time::numeric, 2) AS mean_ms,
    ROUND(stddev_exec_time::numeric, 2) AS stddev_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Queries with high variance (inconsistent performance)
SELECT
    LEFT(query, 100),
    calls,
    ROUND(mean_exec_time::numeric, 2) AS mean_ms,
    ROUND(stddev_exec_time::numeric, 2) AS stddev_ms,
    ROUND((stddev_exec_time / NULLIF(mean_exec_time, 0) * 100)::numeric, 1) AS cv_pct
FROM pg_stat_statements
WHERE calls > 100
ORDER BY stddev_exec_time / NULLIF(mean_exec_time, 0) DESC
LIMIT 10;
```

### Slow Query Log

```ini
# postgresql.conf
log_min_duration_statement = 100  # Log queries slower than 100ms
log_line_prefix = '%t [%p]: db=%d,user=%u '
log_statement = 'none'  # Don't log every statement
# auto_explain must be loaded via shared_preload_libraries (or session_preload_libraries)
auto_explain.log_min_duration = 500  # Auto-explain queries > 500ms
auto_explain.log_analyze = on
auto_explain.log_buffers = on
```

### Monitoring Dashboard Queries

```sql
-- Connection states
SELECT state, COUNT(*)
FROM pg_stat_activity
GROUP BY state;

-- Table sizes
SELECT
    relname,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    pg_size_pretty(pg_indexes_size(relid)) AS index_size,
    n_live_tup AS live_rows,
    n_dead_tup AS dead_rows
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Index usage (rarely used indexes first)
SELECT
    indexrelname,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
```
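These queries also work as the basis for lightweight in-process monitoring. A hedged sketch that samples connection states on an interval, assuming the sqlx pool from earlier (a production pipeline would export gauges to Prometheus instead of printing):

```rust
use std::time::Duration;
use sqlx::PgPool;

// Sketch: periodically sample connection states from pg_stat_activity.
async fn watch_connections(pool: PgPool) -> anyhow::Result<()> {
    loop {
        let rows: Vec<(Option<String>, i64)> =
            sqlx::query_as("SELECT state, COUNT(*) FROM pg_stat_activity GROUP BY state")
                .fetch_all(&pool)
                .await?;
        for (state, count) in rows {
            // `state` is NULL for background workers.
            println!("pg connections: {:>5} {}", count, state.as_deref().unwrap_or("(background)"));
        }
        tokio::time::sleep(Duration::from_secs(30)).await;
    }
}
```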
## Complete Pipeline Architecture

```
[Block Source] -> [Decoder] -> [Batch Buffer (2000 rows)]
                                       |
                          +------------+
                          |
              [COPY to staging table]
                          |
              [INSERT ... ON CONFLICT from staging]
                          |
              [NOTIFY indexer_channel]
                          |
              [Async Index/Materialized View Refresh]
```

Rust implementation of the batch buffer:

```rust
use std::time::{Duration, Instant};

struct BatchBuffer<T> {
    items: Vec<T>,
    capacity: usize,
    flush_interval: Duration,
    last_flush: Instant,
}

impl<T> BatchBuffer<T> {
    fn new(capacity: usize, flush_interval: Duration) -> Self {
        Self {
            items: Vec::with_capacity(capacity),
            capacity,
            flush_interval,
            last_flush: Instant::now(),
        }
    }

    /// Push an item; returns the drained batch when it is time to flush.
    fn push(&mut self, item: T) -> Option<Vec<T>> {
        self.items.push(item);
        if self.should_flush() {
            Some(self.flush())
        } else {
            None
        }
    }

    fn should_flush(&self) -> bool {
        self.items.len() >= self.capacity
            || self.last_flush.elapsed() >= self.flush_interval
    }

    fn flush(&mut self) -> Vec<T> {
        self.last_flush = Instant::now();
        std::mem::take(&mut self.items)
    }
}
```
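A sketch of the driver loop that connects the buffer to the upsert path. `block_stream` is an assumed async channel of decoded transactions, and `bulk_upsert` is the function from the ON CONFLICT section:

```rust
// Sketch: drain the buffer into the bulk upsert path whenever it signals a flush.
let mut buffer = BatchBuffer::new(2000, Duration::from_secs(2));

while let Some(tx) = block_stream.recv().await {
    if let Some(batch) = buffer.push(tx) {
        bulk_upsert(&pool, &batch).await?;
    }
}

// Flush whatever is left when the stream ends.
let tail = buffer.flush();
if !tail.is_empty() {
    bulk_upsert(&pool, &tail).await?;
}
```

Note that the interval-based flush only fires when a new item arrives; a production loop would typically pair the buffer with a `tokio::select!` timer so quiet periods still flush.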
## Conclusion

High-throughput PostgreSQL data pipelines require thinking in batches, not individual rows. Use the COPY protocol for maximum ingestion speed (250K rows/sec), batch INSERTs for moderate throughput with ON CONFLICT support, and tune WAL settings to reduce checkpoint overhead. Front your database with PgBouncer to handle connection storms, and monitor everything with pg_stat_statements. The difference between a naive pipeline and an optimized one is 40x — the difference between days and hours for large data migrations.