DevOps · Mar 28, 2026

Deep EVM #26: Sharding vs Partitioning — Architecture for Massive Tables

Open Soft Team

Engineering Team

Partitioning vs Sharding: The Fundamental Difference

Both partitioning and sharding split data into smaller pieces, but they operate at different levels:

  • Partitioning splits a table into multiple physical tables on the SAME database server. The database manages routing transparently.
  • Sharding splits data across MULTIPLE database servers. The application manages routing.
Partitioning (single server):
+-----------------------------------+
| PostgreSQL Server                 |
| +----------+ +----------+         |
| | Part. 1  | | Part. 2  | ...     |
| +----------+ +----------+         |
+-----------------------------------+

Sharding (multiple servers):
+-------------+  +-------------+  +-------------+
| PG Server 1 |  | PG Server 2 |  | PG Server 3 |
| (Shard A)   |  | (Shard B)   |  | (Shard C)   |
+-------------+  +-------------+  +-------------+

Partitioning scales storage and query performance on a single machine. Sharding scales beyond the limits of a single machine — CPU, memory, disk IOPS, and network bandwidth.

When to Partition vs When to Shard

| Criterion                        | Partition  | Shard     |
|----------------------------------|------------|-----------|
| Data size < 1 TB                 | Yes        | Overkill  |
| Data size > 1 TB                 | Maybe      | Yes       |
| Write throughput < 10K/sec       | Yes        | Overkill  |
| Write throughput > 50K/sec       | Not enough | Yes       |
| Read pattern is locality-based   | Yes        | Yes       |
| Need joins across the dataset    | Easy       | Difficult |
| Operational complexity budget    | Low        | High      |
| Single-machine CPU is bottleneck | No help    | Yes       |

The rule: start with partitioning, graduate to sharding only when a single server is insufficient.
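Getting started with partitioning is a few lines of DDL. A sketch of PostgreSQL declarative range partitioning, assuming a transactions table partitioned by block_number (the column names mirror the schema used later in this article; the partition boundaries are illustrative):

```sql
-- Parent table: rows are routed to partitions by block_number range.
CREATE TABLE transactions (
    id           bigint  NOT NULL,
    from_addr    bytea   NOT NULL,
    to_addr      bytea,
    value_wei    numeric NOT NULL,
    block_number bigint  NOT NULL
) PARTITION BY RANGE (block_number);

-- One partition per million blocks; PostgreSQL routes INSERTs transparently.
CREATE TABLE transactions_p0 PARTITION OF transactions
    FOR VALUES FROM (0) TO (1000000);
CREATE TABLE transactions_p1 PARTITION OF transactions
    FOR VALUES FROM (1000000) TO (2000000);
```

The application keeps writing to `transactions` as if it were one table; only maintenance jobs (creating future partitions, detaching old ones) need to know the partitions exist.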

Application-Level Sharding

The most common sharding approach puts routing logic in the application layer:

use sqlx::PgPool;
use std::collections::HashMap;

struct ShardRouter {
    shards: HashMap<u32, PgPool>,
    shard_count: u32,
}

impl ShardRouter {
    async fn new(shard_urls: Vec<String>) -> anyhow::Result<Self> {
        let mut shards = HashMap::new();
        for (i, url) in shard_urls.iter().enumerate() {
            let pool = PgPool::connect(url).await?;
            shards.insert(i as u32, pool);
        }
        Ok(Self {
            shard_count: shards.len() as u32,
            shards,
        })
    }

    fn get_shard(&self, shard_key: &[u8]) -> &PgPool {
        let hash = xxhash_rust::xxh3::xxh3_64(shard_key);
        let shard_id = (hash % self.shard_count as u64) as u32;
        &self.shards[&shard_id]
    }

    async fn query_single_shard<T>(
        &self,
        shard_key: &[u8],
        query: &str,
    ) -> anyhow::Result<Option<T>>
    where
        T: for<'r> sqlx::FromRow<'r, sqlx::postgres::PgRow> + Send + Unpin,
    {
        // Route to the owning shard, then bind the shard key as $1.
        let pool = self.get_shard(shard_key);
        let row = sqlx::query_as::<_, T>(query)
            .bind(shard_key)
            .fetch_optional(pool)
            .await?;
        Ok(row)
    }
}

Usage

let router = ShardRouter::new(vec![
    "postgres://shard1:5432/db".into(),
    "postgres://shard2:5432/db".into(),
    "postgres://shard3:5432/db".into(),
]).await?;

// Route by user address
let user_address = b"0x742d35Cc6634C0532925a3b844Bc9e7595f2bD38";
let pool = router.get_shard(user_address);
let txs = sqlx::query_as::<_, Transaction>(
    "SELECT * FROM transactions WHERE from_addr = $1 ORDER BY block_number DESC LIMIT 100"
)
.bind(user_address)
.fetch_all(pool)
.await?;

Consistent Hashing

Simple modulo hashing (hash % shard_count) breaks when you add or remove shards — almost all keys remap. Consistent hashing minimizes key redistribution:

use sqlx::PgPool;
use std::collections::{BTreeMap, HashMap};

struct ConsistentHashRing {
    ring: BTreeMap<u64, u32>,  // hash -> shard_id
    virtual_nodes: u32,
    shards: HashMap<u32, PgPool>,
}

impl ConsistentHashRing {
    fn new(virtual_nodes: u32) -> Self {
        Self {
            ring: BTreeMap::new(),
            virtual_nodes,
            shards: HashMap::new(),
        }
    }

    fn add_shard(&mut self, shard_id: u32, pool: PgPool) {
        for i in 0..self.virtual_nodes {
            let key = format!("shard-{}-vnode-{}", shard_id, i);
            let hash = xxhash_rust::xxh3::xxh3_64(key.as_bytes());
            self.ring.insert(hash, shard_id);
        }
        self.shards.insert(shard_id, pool);
    }

    fn remove_shard(&mut self, shard_id: u32) {
        for i in 0..self.virtual_nodes {
            let key = format!("shard-{}-vnode-{}", shard_id, i);
            let hash = xxhash_rust::xxh3::xxh3_64(key.as_bytes());
            self.ring.remove(&hash);
        }
        self.shards.remove(&shard_id);
    }

    fn get_shard(&self, key: &[u8]) -> &PgPool {
        let hash = xxhash_rust::xxh3::xxh3_64(key);

        // Find the first node clockwise from the hash
        let shard_id = self.ring
            .range(hash..)
            .next()
            .or_else(|| self.ring.iter().next())  // Wrap around
            .map(|(_, &id)| id)
            .expect("Ring is empty");

        &self.shards[&shard_id]
    }
}

With consistent hashing and 256 virtual nodes per shard, adding a new shard remaps only about 1/N of the keys (where N is the shard count after the addition), instead of nearly all keys as with modulo hashing.
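This claim is easy to verify offline. The standalone sketch below grows a cluster from 3 to 4 shards and counts how many keys move under each scheme; it uses the standard library's SipHash-based `DefaultHasher` instead of xxh3 so it needs no external crates:

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

// Build a ring with the given shards, same vnode naming as the article.
fn build_ring(shard_ids: &[u32], vnodes: u32) -> BTreeMap<u64, u32> {
    let mut ring = BTreeMap::new();
    for &id in shard_ids {
        for v in 0..vnodes {
            let key = format!("shard-{}-vnode-{}", id, v);
            ring.insert(hash_bytes(key.as_bytes()), id);
        }
    }
    ring
}

// First node clockwise from the hash, wrapping around.
fn lookup(ring: &BTreeMap<u64, u32>, hash: u64) -> u32 {
    ring.range(hash..)
        .next()
        .or_else(|| ring.iter().next())
        .map(|(_, &id)| id)
        .expect("ring is empty")
}

/// Fraction of keys that move when growing from 3 to 4 shards,
/// for modulo hashing vs. a 256-vnode consistent-hash ring.
fn remap_fractions() -> (f64, f64) {
    let keys: Vec<u64> = (0..50_000u64)
        .map(|k| hash_bytes(&k.to_le_bytes()))
        .collect();

    let modulo_moved = keys.iter().filter(|&&h| h % 3 != h % 4).count();

    let ring3 = build_ring(&[0, 1, 2], 256);
    let ring4 = build_ring(&[0, 1, 2, 3], 256);
    let ring_moved = keys
        .iter()
        .filter(|&&h| lookup(&ring3, h) != lookup(&ring4, h))
        .count();

    let n = keys.len() as f64;
    (modulo_moved as f64 / n, ring_moved as f64 / n)
}

fn main() {
    let (modulo, ring) = remap_fractions();
    println!("modulo hashing: {:.1}% of keys move", modulo * 100.0);
    println!("consistent hashing: {:.1}% of keys move", ring * 100.0);
}
```

Modulo hashing moves roughly three quarters of the keys (a key stays put only when `h % 3 == h % 4`), while the ring moves only about the quarter of keys that the new shard takes over.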

Cross-Shard Queries

The hardest problem in sharding is queries that span multiple shards:

impl ShardRouter {
    /// Query all shards in parallel and merge results
    async fn query_all_shards<T>(
        &self,
        query: &str,
    ) -> anyhow::Result<Vec<T>>
    where
        T: for<'r> sqlx::FromRow<'r, sqlx::postgres::PgRow> + Send + Unpin + 'static,
    {
        let mut handles = Vec::new();

        for pool in self.shards.values() {
            let pool = pool.clone();
            let query = query.to_string();

            let handle = tokio::spawn(async move {
                sqlx::query_as::<_, T>(&query)
                    .fetch_all(&pool)
                    .await
            });
            handles.push(handle);
        }

        let mut results = Vec::new();
        for handle in handles {
            let shard_results = handle.await??;
            results.extend(shard_results);
        }

        Ok(results)
    }

    /// Cross-shard aggregation
    async fn count_all_shards(
        &self,
        query: &str,
    ) -> anyhow::Result<i64> {
        let mut handles = Vec::new();

        for pool in self.shards.values() {
            let pool = pool.clone();
            let query = query.to_string();

            let handle = tokio::spawn(async move {
                let row: (i64,) = sqlx::query_as(&query)
                    .fetch_one(&pool)
                    .await?;
                Ok::<i64, anyhow::Error>(row.0)
            });
            handles.push(handle);
        }

        let mut total: i64 = 0;
        for handle in handles {
            total += handle.await??;
        }

        Ok(total)
    }
}

Cross-shard queries are expensive: they hit every shard, transfer data over the network, and merge results in memory. Design your shard key to minimize cross-shard queries.
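One subtlety of the merge step: an `ORDER BY … LIMIT k` query must be pushed down to every shard and then re-sorted and re-truncated in the application, because each shard only knows its own ordering. A minimal standalone sketch (plain block numbers stand in for full result rows):

```rust
/// Merge per-shard result sets that were each produced by
/// `ORDER BY block_number DESC LIMIT k` into a global top-k.
fn merge_top_k(per_shard: Vec<Vec<u64>>, k: usize) -> Vec<u64> {
    // Each shard's top-k is a candidate set; the global top-k is
    // guaranteed to be contained in their union.
    let mut all: Vec<u64> = per_shard.into_iter().flatten().collect();
    all.sort_unstable_by(|a, b| b.cmp(a)); // descending
    all.truncate(k);
    all
}

fn main() {
    // Each shard returns its own top-3 block numbers, already sorted descending.
    let shard_a = vec![900, 850, 800];
    let shard_b = vec![950, 700, 650];
    let shard_c = vec![920, 880, 600];
    let global = merge_top_k(vec![shard_a, shard_b, shard_c], 3);
    println!("{:?}", global); // [950, 920, 900]
}
```

Note the amplification: to return 100 rows to the client, a 3-shard cluster fetches up to 300 rows over the network and discards two thirds of them.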

Choosing a Shard Key

| Shard Key    | Pros                                 | Cons                                |
|--------------|--------------------------------------|-------------------------------------|
| User address | User queries are single-shard        | Cross-user analytics hit all shards |
| Block number | Block range queries are single-shard | User history spans all shards       |
| Chain ID     | Chain-specific queries are fast      | Multi-chain aggregations are slow   |
| Hash of TX   | Even distribution                    | Every query hits a random shard     |

The best shard key matches your most common access pattern. For a blockchain explorer, sharding by address makes sense: most queries are “show me transactions for this address.”

Resharding Strategies

When you need to add shards to an existing cluster:

Double-Write Migration

/// Illustrative four-phase migration driver. Assumes a `Transaction` row type
/// with `id`, `from_addr`, `to_addr`, `value_wei`, and `block_number` fields.
async fn resharding_migration(
    old_router: &ShardRouter,
    new_router: &ShardRouter,
) -> anyhow::Result<()> {
    // Phase 1: Start double-writing to both old and new shards
    tracing::info!("Phase 1: Double-write enabled");

    // Phase 2: Backfill old data to new shards
    for (shard_id, pool) in &old_router.shards {
        let mut cursor: i64 = 0;
        loop {
            let batch: Vec<Transaction> = sqlx::query_as(
                "SELECT * FROM transactions WHERE id > $1 ORDER BY id LIMIT 10000"
            )
            .bind(cursor)
            .fetch_all(pool)
            .await?;

            if batch.is_empty() { break; }
            cursor = batch.last().unwrap().id;

            for tx in &batch {
                let new_pool = new_router.get_shard(&tx.from_addr);
                sqlx::query(
                    "INSERT INTO transactions (id, from_addr, to_addr, value_wei, block_number)
                     VALUES ($1, $2, $3, $4, $5)
                     ON CONFLICT (id) DO NOTHING"
                )
                .bind(tx.id)
                .bind(&tx.from_addr)
                .bind(&tx.to_addr)
                .bind(&tx.value_wei)
                .bind(tx.block_number)
                .execute(new_pool)
                .await?;
            }

            tracing::info!(shard = shard_id, cursor, "Backfill progress");
        }
    }

    // Phase 3: Switch reads to new shards
    tracing::info!("Phase 3: Reads switched to new cluster");

    // Phase 4: Stop writes to old shards
    tracing::info!("Phase 4: Old cluster decommissioned");

    Ok(())
}
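The phase ordering above can be exercised without a database. A toy sketch with in-memory sets standing in for the two clusters (the `Cluster` type is illustrative, not part of the article's code) shows why double-writes plus an idempotent backfill converge:

```rust
use std::collections::HashSet;

/// Minimal in-memory stand-in for a shard cluster; the real routers
/// in this article write through sqlx pools instead.
#[derive(Default)]
struct Cluster {
    rows: HashSet<u64>,
}

impl Cluster {
    // Insert is idempotent, mirroring `ON CONFLICT (id) DO NOTHING`.
    fn insert(&mut self, id: u64) {
        self.rows.insert(id);
    }
}

/// Phase 1 wrapper: every new write goes to both clusters, so the
/// backfill only has to catch up on rows written before the cutover.
fn double_write(old: &mut Cluster, new: &mut Cluster, id: u64) {
    old.insert(id);
    new.insert(id);
}

fn main() {
    let mut old = Cluster::default();
    let mut new = Cluster::default();

    // Rows written before the migration exist only on the old cluster.
    for id in 0..5 {
        old.insert(id);
    }

    // Phase 1: double-write new traffic.
    for id in 5..10 {
        double_write(&mut old, &mut new, id);
    }

    // Phase 2: backfill everything from the old cluster. Re-copying the
    // double-written rows is harmless because inserts are idempotent.
    for id in old.rows.clone() {
        new.insert(id);
    }

    assert_eq!(old.rows, new.rows);
    println!("clusters converged: {} rows each", new.rows.len());
}
```

Once the sets match (phase 3's read cutover), the old cluster receives no reads, and stopping writes to it (phase 4) is safe.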

PostgreSQL Native Partitioning vs Application Sharding

| Feature                | PG Partitioning             | Application Sharding   |
|------------------------|-----------------------------|------------------------|
| Routing                | Transparent (PG handles it) | Application code       |
| Cross-partition joins  | Native SQL                  | Manual merge           |
| Transactions           | Full ACID                   | Distributed (2PC or saga) |
| Max data size          | Single-server limits        | Unlimited              |
| Operational complexity | Low                         | High                   |
| Foreign keys           | Limited                     | None                   |
| Scaling dimension      | Storage, query performance  | Everything             |

Hybrid Approach: Partition Within Shards

The most scalable architecture combines both:

Application Sharding (by address prefix)
+------------------+  +------------------+  +------------------+
| Shard 0 (0x0-5)  |  | Shard 1 (0x6-a)  |  | Shard 2 (0xb-f)  |
| +---------+      |  | +---------+      |  | +---------+      |
| | Part Q1 |      |  | | Part Q1 |      |  | | Part Q1 |      |
| | Part Q2 |      |  | | Part Q2 |      |  | | Part Q2 |      |
| | Part Q3 |      |  | | Part Q3 |      |  | | Part Q3 |      |
| | Part Q4 |      |  | | Part Q4 |      |  | | Part Q4 |      |
| +---------+      |  | +---------+      |  | +---------+      |
+------------------+  +------------------+  +------------------+

Sharding handles horizontal scaling across machines. Within each shard, partitioning handles query performance and maintenance efficiency.
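The shard-routing half of the diagram can be sketched as a pure function. `shard_for_address` is a hypothetical helper, not code from this article, and it routes by the address's first hex nibble exactly as the diagram shows (0x0-0x5 to shard 0, 0x6-0xa to shard 1, 0xb-0xf to shard 2); real systems usually hash the full address instead, since prefix routing is only balanced if addresses themselves are uniformly distributed:

```rust
/// Map an address to a shard by its first hex nibble, matching the
/// diagram above. Returns None for malformed addresses.
fn shard_for_address(address: &str) -> Option<u32> {
    let nibble = address.strip_prefix("0x")?.chars().next()?;
    let v = nibble.to_digit(16)?;
    Some(match v {
        0..=5 => 0,  // 0x0-0x5 -> shard 0
        6..=10 => 1, // 0x6-0xa -> shard 1
        _ => 2,      // 0xb-0xf -> shard 2
    })
}

fn main() {
    println!("{:?}", shard_for_address("0x42ab")); // Some(0)
    println!("{:?}", shard_for_address("0x9a31")); // Some(1)
    println!("{:?}", shard_for_address("0xf0e1")); // Some(2)
}
```

Within the chosen shard, PostgreSQL's own partition routing (e.g. by block timestamp quarter, as in the Part Q1-Q4 boxes) takes over, so the application never addresses individual partitions.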

Conclusion

Start with PostgreSQL native partitioning — it is transparent, supports joins, and has minimal operational overhead. Graduate to application-level sharding only when a single server cannot handle your write throughput, data volume, or query load. Use consistent hashing to minimize data redistribution when adding shards, design your shard key to match your primary access pattern, and expect cross-shard queries to be expensive. For maximum scale, combine sharding across servers with partitioning within each shard.