Deep EVM #26: Sharding vs Partitioning — Architecture for Massive Tables
Engineering Team
Partitioning vs Sharding: The Fundamental Difference
Both partitioning and sharding split data into smaller pieces, but they operate at different levels:
- Partitioning splits a table into multiple physical tables on the SAME database server. The database manages routing transparently.
- Sharding splits data across MULTIPLE database servers. The application manages routing.
Partitioning (single server):
+-----------------------------------+
|         PostgreSQL Server         |
|  +----------+  +----------+       |
|  | Part. 1  |  | Part. 2  |  ...  |
|  +----------+  +----------+       |
+-----------------------------------+
Sharding (multiple servers):
+-------------+ +-------------+ +-------------+
| PG Server 1 | | PG Server 2 | | PG Server 3 |
|  (Shard A)  | |  (Shard B)  | |  (Shard C)  |
+-------------+ +-------------+ +-------------+
Partitioning scales storage and query performance on a single machine. Sharding scales beyond the limits of a single machine — CPU, memory, disk IOPS, and network bandwidth.
When to Partition vs When to Shard
| Criterion | Partition | Shard |
|---|---|---|
| Data size < 1TB | Yes | Overkill |
| Data size > 1TB | Maybe | Yes |
| Write throughput < 10K/sec | Yes | Overkill |
| Write throughput > 50K/sec | Not enough | Yes |
| Read pattern is locality-based | Yes | Yes |
| Need joins across the full dataset | Easy | Difficult |
| Operational complexity budget | Low | High |
| Single-machine CPU is bottleneck | No help | Yes |
The rule: start with partitioning, graduate to sharding only when a single server is insufficient.
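For reference, native partitioning is a few lines of DDL. The sketch below uses the transactions schema from later in this article with illustrative range bounds; PostgreSQL routes rows to partitions transparently:

```sql
-- Parent table: PostgreSQL routes each row to a partition by block_number.
CREATE TABLE transactions (
    id           bigint NOT NULL,
    from_addr    bytea  NOT NULL,
    to_addr      bytea,
    value_wei    numeric,
    block_number bigint NOT NULL
) PARTITION BY RANGE (block_number);

-- One partition per block range; create new ones as the chain grows.
CREATE TABLE transactions_p0 PARTITION OF transactions
    FOR VALUES FROM (0) TO (10000000);
CREATE TABLE transactions_p1 PARTITION OF transactions
    FOR VALUES FROM (10000000) TO (20000000);
```

Queries against `transactions` need no changes; the planner prunes partitions that cannot match the WHERE clause.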
Application-Level Sharding
The most common sharding approach puts routing logic in the application layer:
use sqlx::PgPool;
use std::collections::HashMap;

struct ShardRouter {
    shards: HashMap<u32, PgPool>,
    shard_count: u32,
}

impl ShardRouter {
    async fn new(shard_urls: Vec<String>) -> anyhow::Result<Self> {
        let mut shards = HashMap::new();
        for (i, url) in shard_urls.iter().enumerate() {
            let pool = PgPool::connect(url).await?;
            shards.insert(i as u32, pool);
        }
        Ok(Self {
            shard_count: shards.len() as u32,
            shards,
        })
    }

    /// Map a shard key to its owning pool: hash, then modulo shard count.
    fn get_shard(&self, shard_key: &[u8]) -> &PgPool {
        let hash = xxhash_rust::xxh3::xxh3_64(shard_key);
        let shard_id = (hash % self.shard_count as u64) as u32;
        &self.shards[&shard_id]
    }

    /// Fetch at most one row from the shard that owns `shard_key`.
    /// Parameters are bound by the caller on the SQL before execution,
    /// as in the usage example below.
    async fn query_single_shard<T>(
        &self,
        shard_key: &[u8],
        query: &str,
    ) -> anyhow::Result<Option<T>>
    where
        T: for<'r> sqlx::FromRow<'r, sqlx::postgres::PgRow> + Send + Unpin,
    {
        let pool = self.get_shard(shard_key);
        let row = sqlx::query_as::<_, T>(query)
            .fetch_optional(pool)
            .await?;
        Ok(row)
    }
}
Usage
let router = ShardRouter::new(vec![
    "postgres://shard1:5432/db".into(),
    "postgres://shard2:5432/db".into(),
    "postgres://shard3:5432/db".into(),
]).await?;

// Route by user address
let user_address: &[u8] = b"0x742d35Cc6634C0532925a3b844Bc9e7595f2bD38";
let pool = router.get_shard(user_address);
let txs = sqlx::query_as::<_, Transaction>(
    "SELECT * FROM transactions WHERE from_addr = $1 ORDER BY block_number DESC LIMIT 100"
)
.bind(user_address)
.fetch_all(pool)
.await?;
Consistent Hashing
Simple modulo hashing (hash % shard_count) breaks when you add or remove shards — almost all keys remap. Consistent hashing minimizes key redistribution:
use sqlx::PgPool;
use std::collections::{BTreeMap, HashMap};

struct ConsistentHashRing {
    ring: BTreeMap<u64, u32>, // hash -> shard_id
    virtual_nodes: u32,
    shards: HashMap<u32, PgPool>,
}

impl ConsistentHashRing {
    fn new(virtual_nodes: u32) -> Self {
        Self {
            ring: BTreeMap::new(),
            virtual_nodes,
            shards: HashMap::new(),
        }
    }

    /// Place `virtual_nodes` points on the ring for this shard.
    fn add_shard(&mut self, shard_id: u32, pool: PgPool) {
        for i in 0..self.virtual_nodes {
            let key = format!("shard-{}-vnode-{}", shard_id, i);
            let hash = xxhash_rust::xxh3::xxh3_64(key.as_bytes());
            self.ring.insert(hash, shard_id);
        }
        self.shards.insert(shard_id, pool);
    }

    fn remove_shard(&mut self, shard_id: u32) {
        for i in 0..self.virtual_nodes {
            let key = format!("shard-{}-vnode-{}", shard_id, i);
            let hash = xxhash_rust::xxh3::xxh3_64(key.as_bytes());
            self.ring.remove(&hash);
        }
        self.shards.remove(&shard_id);
    }

    fn get_shard(&self, key: &[u8]) -> &PgPool {
        let hash = xxhash_rust::xxh3::xxh3_64(key);
        // Find the first virtual node clockwise from the hash
        let shard_id = self.ring
            .range(hash..)
            .next()
            .or_else(|| self.ring.iter().next()) // Wrap around
            .map(|(_, &id)| id)
            .expect("ring is empty");
        &self.shards[&shard_id]
    }
}
With consistent hashing and 256 virtual nodes per shard, adding one shard to an N-shard cluster remaps only about 1/(N+1) of the keys — the share the new shard takes over — instead of nearly all keys with modulo hashing.
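The remap behavior is easy to verify offline. The sketch below is stdlib-only, so it substitutes a small FNV-1a hash for xxh3 and plain shard IDs for connection pools; the shape of the result matters, not the exact fractions.

```rust
use std::collections::BTreeMap;

// Stand-in hash (FNV-1a); the real code uses xxh3, but any well-mixed
// 64-bit hash shows the same remapping behavior.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

// Build a ring with `vnodes` virtual nodes per shard.
fn build_ring(shard_count: u32, vnodes: u32) -> BTreeMap<u64, u32> {
    let mut ring = BTreeMap::new();
    for shard_id in 0..shard_count {
        for i in 0..vnodes {
            let key = format!("shard-{}-vnode-{}", shard_id, i);
            ring.insert(fnv1a(key.as_bytes()), shard_id);
        }
    }
    ring
}

// First virtual node clockwise from the key's hash, wrapping around.
fn lookup(ring: &BTreeMap<u64, u32>, key: &[u8]) -> u32 {
    let hash = fnv1a(key);
    ring.range(hash..)
        .next()
        .or_else(|| ring.iter().next())
        .map(|(_, &id)| id)
        .expect("ring is empty")
}

// Fraction of keys that change shards when growing from `old` to `old + 1`
// shards under plain modulo routing.
fn modulo_moved(keys: u64, old: u64) -> f64 {
    let moved = (0..keys)
        .filter(|i| {
            let h = fnv1a(format!("key-{}", i).as_bytes());
            h % old != h % (old + 1)
        })
        .count();
    moved as f64 / keys as f64
}

// Same measurement for the consistent-hash ring.
fn ring_moved(keys: u64, old: u32, vnodes: u32) -> f64 {
    let before = build_ring(old, vnodes);
    let after = build_ring(old + 1, vnodes);
    let moved = (0..keys)
        .filter(|i| {
            let k = format!("key-{}", i);
            lookup(&before, k.as_bytes()) != lookup(&after, k.as_bytes())
        })
        .count();
    moved as f64 / keys as f64
}

fn main() {
    // Going from 4 to 5 shards: modulo remaps roughly 80% of keys;
    // the ring remaps roughly the ~20% the new shard takes over.
    println!("modulo: {:.2}", modulo_moved(10_000, 4));
    println!("ring:   {:.2}", ring_moved(10_000, 4, 256));
}
```

The ring only moves keys that the new shard captures, which is why its remap fraction tracks the new shard's ownership share.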
Cross-Shard Queries
The hardest problem in sharding is queries that span multiple shards:
impl ShardRouter {
    /// Query all shards in parallel and merge results
    async fn query_all_shards<T>(
        &self,
        query: &str,
    ) -> anyhow::Result<Vec<T>>
    where
        T: for<'r> sqlx::FromRow<'r, sqlx::postgres::PgRow> + Send + Unpin + 'static,
    {
        let mut handles = Vec::new();
        for pool in self.shards.values() {
            let pool = pool.clone();
            let query = query.to_string();
            let handle = tokio::spawn(async move {
                sqlx::query_as::<_, T>(&query)
                    .fetch_all(&pool)
                    .await
            });
            handles.push(handle);
        }
        let mut results = Vec::new();
        for handle in handles {
            let shard_results = handle.await??;
            results.extend(shard_results);
        }
        Ok(results)
    }

    /// Cross-shard aggregation: sum per-shard COUNT(*) results
    async fn count_all_shards(
        &self,
        query: &str,
    ) -> anyhow::Result<i64> {
        let mut handles = Vec::new();
        for pool in self.shards.values() {
            let pool = pool.clone();
            let query = query.to_string();
            let handle = tokio::spawn(async move {
                let row: (i64,) = sqlx::query_as(&query)
                    .fetch_one(&pool)
                    .await?;
                Ok::<i64, anyhow::Error>(row.0)
            });
            handles.push(handle);
        }
        let mut total: i64 = 0;
        for handle in handles {
            total += handle.await??;
        }
        Ok(total)
    }
}
Cross-shard queries are expensive: they hit every shard, transfer data over the network, and merge results in memory. Design your shard key to minimize cross-shard queries.
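For ordered, limited cross-shard reads (e.g. `ORDER BY block_number DESC LIMIT k`), one common mitigation is to push the same `LIMIT k` down to every shard and re-sort only the merged candidates. A minimal sketch of the merge step, on plain integers rather than rows:

```rust
// Each shard returns its own top-k, e.g. via
// "... ORDER BY block_number DESC LIMIT k" executed per shard.
// The router merges the candidates and truncates to the global top-k.
// This is correct because a globally top-k row is necessarily in the
// top-k of its own shard.
fn merge_top_k(per_shard: Vec<Vec<u64>>, k: usize) -> Vec<u64> {
    let mut all: Vec<u64> = per_shard.into_iter().flatten().collect();
    all.sort_unstable_by(|a, b| b.cmp(a)); // descending, like the SQL ORDER BY
    all.truncate(k);
    all
}

fn main() {
    let shard_a = vec![90, 75, 10]; // each already sorted descending
    let shard_b = vec![88, 80, 5];
    let top = merge_top_k(vec![shard_a, shard_b], 3);
    println!("{:?}", top); // [90, 88, 80]
}
```

This bounds the data transferred to `k` rows per shard instead of full result sets, though the router still pays one round trip per shard.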
Choosing a Shard Key
| Shard Key | Pros | Cons |
|---|---|---|
| User address | User queries are single-shard | Cross-user analytics hit all shards |
| Block number | Block range queries are single-shard | User history spans all shards |
| Chain ID | Chain-specific queries are fast | Multi-chain aggregations are slow |
| Transaction hash | Even distribution | Only lookups by hash are single-shard; everything else hits all shards |
The best shard key matches your most common access pattern. For a blockchain explorer, sharding by address makes sense: most queries are “show me transactions for this address.”
Resharding Strategies
When you need to add shards to an existing cluster:
Double-Write Migration
async fn resharding_migration(
    old_router: &ShardRouter,
    new_router: &ShardRouter,
) -> anyhow::Result<()> {
    // Phase 1: Start double-writing to both old and new shards
    tracing::info!("Phase 1: Double-write enabled");

    // Phase 2: Backfill old data to new shards
    for (shard_id, pool) in &old_router.shards {
        let mut cursor: i64 = 0;
        loop {
            let batch: Vec<Transaction> = sqlx::query_as(
                "SELECT * FROM transactions WHERE id > $1 ORDER BY id LIMIT 10000"
            )
            .bind(cursor)
            .fetch_all(pool)
            .await?;
            if batch.is_empty() { break; }
            cursor = batch.last().unwrap().id;
            for tx in &batch {
                let new_pool = new_router.get_shard(&tx.from_addr);
                sqlx::query(
                    "INSERT INTO transactions (id, from_addr, to_addr, value_wei, block_number)
                     VALUES ($1, $2, $3, $4, $5)
                     ON CONFLICT (id) DO NOTHING"
                )
                .bind(tx.id)
                .bind(&tx.from_addr)
                .bind(&tx.to_addr)
                .bind(&tx.value_wei)
                .bind(tx.block_number)
                .execute(new_pool)
                .await?;
            }
            tracing::info!(shard = shard_id, cursor, "Backfill progress");
        }
    }

    // Phase 3: Switch reads to new shards
    tracing::info!("Phase 3: Reads switched to new cluster");

    // Phase 4: Stop writes to old shards
    tracing::info!("Phase 4: Old cluster decommissioned");
    Ok(())
}
PostgreSQL Native Partitioning vs Application Sharding
| Feature | PG Partitioning | Application Sharding |
|---|---|---|
| Routing | Transparent (PG handles it) | Application code |
| Cross-partition joins | Native SQL | Manual merge |
| Transactions | Full ACID | Distributed (2PC or saga) |
| Max data size | Single server limits | Unlimited |
| Operational complexity | Low | High |
| Foreign keys | Limited | None |
| Scaling dimension | Storage, query performance | Everything |
Hybrid Approach: Partition Within Shards
The most scalable architecture combines both:
Application Sharding (by address prefix)
+------------------+ +------------------+ +------------------+
| Shard 0 (0x0-5)  | | Shard 1 (0x6-a)  | | Shard 2 (0xb-f)  |
|  +---------+     | |  +---------+     | |  +---------+     |
|  | Part Q1 |     | |  | Part Q1 |     | |  | Part Q1 |     |
|  | Part Q2 |     | |  | Part Q2 |     | |  | Part Q2 |     |
|  | Part Q3 |     | |  | Part Q3 |     | |  | Part Q3 |     |
|  | Part Q4 |     | |  | Part Q4 |     | |  | Part Q4 |     |
|  +---------+     | |  +---------+     | |  +---------+     |
+------------------+ +------------------+ +------------------+
Sharding handles horizontal scaling across machines. Within each shard, partitioning handles query performance and maintenance efficiency.
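The combined routing decision can be sketched in a few lines, using the nibble ranges from the diagram and quarterly partitions; `shard_for_address` and `partition_for` are illustrative helpers, not part of any library:

```rust
// Route by the first hex nibble of the address (after "0x"):
// 0-5 -> shard 0, 6-a -> shard 1, b-f -> shard 2, as in the diagram.
fn shard_for_address(addr: &str) -> Option<u32> {
    let nibble = addr.strip_prefix("0x")?.chars().next()?.to_digit(16)?;
    Some(match nibble {
        0..=5 => 0,
        6..=10 => 1,
        _ => 2,
    })
}

// Inside the chosen shard, rows land in a quarterly partition,
// e.g. transactions_2024_q3 for July-September 2024.
fn partition_for(table: &str, year: u32, month: u32) -> String {
    format!("{}_{}_q{}", table, year, (month - 1) / 3 + 1)
}

fn main() {
    let addr = "0x742d35Cc6634C0532925a3b844Bc9e7595f2bD38";
    println!("shard: {:?}", shard_for_address(addr));       // first nibble 7 -> Some(1)
    println!("{}", partition_for("transactions", 2024, 8)); // transactions_2024_q3
}
```

The application only ever picks the shard; within the shard, PostgreSQL's partition routing picks the partition, so each layer stays simple.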
Conclusion
Start with PostgreSQL native partitioning — it is transparent, supports joins, and has minimal operational overhead. Graduate to application-level sharding only when a single server cannot handle your write throughput, data volume, or query load. Use consistent hashing to minimize data redistribution when adding shards, design your shard key to match your primary access pattern, and expect cross-shard queries to be expensive. For maximum scale, combine sharding across servers with partitioning within each shard.