Deep EVM #14: Building an Arbitrage Cycle Finder — DFS on a Pool Graph
Engineering Team
The Token-Pool Graph
An arbitrage cycle is a sequence of swaps that starts and ends with the same token, producing more output than input. To find these cycles systematically, we model the DeFi ecosystem as a directed graph:
- Nodes = tokens (WETH, USDC, DAI, WBTC, …)
- Edges = liquidity pools that enable swaps between two tokens
A Uniswap V2 pair (WETH/USDC) creates two directed edges: WETH→USDC and USDC→WETH. A Uniswap V3 pool does the same, but with different pricing (concentrated liquidity). A Balancer weighted pool with 3 tokens creates 6 directed edges (every pair in both directions).
```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
pub struct Pool {
    pub address: Address,
    pub token0: Address,
    pub token1: Address,
    pub protocol: Protocol,
    pub fee: u32, // basis points
    pub reserve0: U256,
    pub reserve1: U256,
}

#[derive(Clone, Debug)]
pub enum Protocol {
    UniswapV2,
    UniswapV3,
    SushiSwap,
    Curve,
    Balancer,
}

pub struct TokenGraph {
    // token_address → vec of (other_token, pool)
    pub adjacency: HashMap<Address, Vec<(Address, Pool)>>,
}

impl TokenGraph {
    pub fn new() -> Self {
        Self {
            adjacency: HashMap::new(),
        }
    }

    pub fn add_pool(&mut self, pool: Pool) {
        // Every two-token pool yields two directed edges.
        self.adjacency
            .entry(pool.token0)
            .or_default()
            .push((pool.token1, pool.clone()));
        self.adjacency
            .entry(pool.token1)
            .or_default()
            .push((pool.token0, pool));
    }
}
```
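To make the edge counting concrete, here is a compilable sketch of the same structure with integer stand-ins for `Address` and a trimmed `Pool` (the real code uses 20-byte addresses and the full struct): three two-token pools forming a triangle give every token an out-degree of 2.

```rust
use std::collections::HashMap;

// Stand-in for illustration; the real code uses a 20-byte address type.
type Address = u64;

// Trimmed-down Pool: just the pool id and its two tokens.
#[derive(Clone, Debug)]
struct Pool {
    address: Address,
    token0: Address,
    token1: Address,
}

struct TokenGraph {
    adjacency: HashMap<Address, Vec<(Address, Pool)>>,
}

impl TokenGraph {
    fn new() -> Self {
        Self { adjacency: HashMap::new() }
    }

    fn add_pool(&mut self, pool: Pool) {
        // One directed edge from each of the pool's two tokens.
        self.adjacency.entry(pool.token0).or_default().push((pool.token1, pool.clone()));
        self.adjacency.entry(pool.token1).or_default().push((pool.token0, pool));
    }
}

fn main() {
    const WETH: Address = 1;
    const USDC: Address = 2;
    const DAI: Address = 3;

    let mut graph = TokenGraph::new();
    graph.add_pool(Pool { address: 100, token0: WETH, token1: USDC });
    graph.add_pool(Pool { address: 101, token0: USDC, token1: DAI });
    graph.add_pool(Pool { address: 102, token0: DAI, token1: WETH });

    // Each token sits in two pools of the triangle, so out-degree is 2.
    assert_eq!(graph.adjacency[&WETH].len(), 2);
    assert_eq!(graph.adjacency[&USDC].len(), 2);
    println!("tokens: {}", graph.adjacency.len()); // prints "tokens: 3"
}
```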
On Ethereum mainnet, there are ~80,000 active liquidity pools. The graph has ~15,000 unique tokens and ~160,000 directed edges.
DFS Cycle Detection
We find arbitrage cycles using depth-first search starting from a “base token” (typically WETH, since we want profit denominated in ETH).
```rust
use std::collections::HashSet;

#[derive(Clone, Debug)]
pub struct Cycle {
    pub pools: Vec<Pool>,
    pub tokens: Vec<Address>, // token path: [WETH, USDC, DAI, WETH]
    pub id: [u8; 32],         // keccak256 hash for deduplication
}

pub fn find_cycles(
    graph: &TokenGraph,
    start_token: Address,
    max_depth: usize, // typically 2-4 hops
) -> Vec<Cycle> {
    let mut results = Vec::new();
    let mut path_tokens = vec![start_token];
    let mut path_pools: Vec<Pool> = Vec::new();
    let mut visited = HashSet::new();
    visited.insert(start_token);
    dfs(
        graph,
        start_token,
        start_token,
        max_depth,
        &mut path_tokens,
        &mut path_pools,
        &mut visited,
        &mut results,
    );
    results
}

fn dfs(
    graph: &TokenGraph,
    current: Address,
    start: Address,
    max_depth: usize,
    path_tokens: &mut Vec<Address>,
    path_pools: &mut Vec<Pool>,
    visited: &mut HashSet<Address>,
    results: &mut Vec<Cycle>,
) {
    if path_pools.len() >= max_depth {
        return;
    }
    let neighbors = match graph.adjacency.get(&current) {
        Some(n) => n,
        None => return,
    };
    for (next_token, pool) in neighbors {
        // If we've returned to start, we found a cycle. The closing pool must
        // differ from the pool we just used: swapping back out through the
        // same pool is an unwind, not an arbitrage. This also admits 2-hop
        // cycles across two distinct pools on the same pair.
        if *next_token == start
            && !path_pools.is_empty()
            && path_pools.last().map(|p| p.address) != Some(pool.address)
        {
            let mut cycle_tokens = path_tokens.clone();
            cycle_tokens.push(start);
            let mut cycle_pools = path_pools.clone();
            cycle_pools.push(pool.clone());
            let id = compute_cycle_id(&cycle_pools);
            results.push(Cycle {
                pools: cycle_pools,
                tokens: cycle_tokens,
                id,
            });
            continue;
        }
        // Don't revisit tokens (except start)
        if visited.contains(next_token) {
            continue;
        }
        visited.insert(*next_token);
        path_tokens.push(*next_token);
        path_pools.push(pool.clone());
        dfs(
            graph,
            *next_token,
            start,
            max_depth,
            path_tokens,
            path_pools,
            visited,
            results,
        );
        path_tokens.pop();
        path_pools.pop();
        visited.remove(next_token);
    }
}
```
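The traversal can be exercised on a toy graph. This is a trimmed, runnable sketch with integer ids standing in for addresses; the closing-pool guard rejects a swap-and-unwind through a single pool while still allowing genuine 2-pool cycles:

```rust
use std::collections::{HashMap, HashSet};

// Trimmed sketch: tokens are u8 ids, pools are u32 ids, and the adjacency
// map stores (neighbor_token, pool_id) pairs.
fn find_cycles(adj: &HashMap<u8, Vec<(u8, u32)>>, start: u8, max_depth: usize) -> Vec<Vec<u32>> {
    let mut results = Vec::new();
    let mut path = Vec::new();
    let mut visited = HashSet::new();
    visited.insert(start);
    dfs(adj, start, start, max_depth, &mut path, &mut visited, &mut results);
    results
}

fn dfs(
    adj: &HashMap<u8, Vec<(u8, u32)>>,
    current: u8,
    start: u8,
    max_depth: usize,
    path: &mut Vec<u32>,
    visited: &mut HashSet<u8>,
    results: &mut Vec<Vec<u32>>,
) {
    if path.len() >= max_depth {
        return;
    }
    let neighbors = match adj.get(&current) {
        Some(n) => n,
        None => return,
    };
    for &(next, pool_id) in neighbors {
        // Close a cycle only through a pool other than the one just used.
        if next == start && !path.is_empty() && path.last() != Some(&pool_id) {
            let mut cycle = path.clone();
            cycle.push(pool_id);
            results.push(cycle);
            continue;
        }
        if visited.contains(&next) {
            continue;
        }
        visited.insert(next);
        path.push(pool_id);
        dfs(adj, next, start, max_depth, path, visited, results);
        path.pop();
        visited.remove(&next);
    }
}

fn main() {
    // Tokens: 0 = WETH, 1 = USDC, 2 = DAI.
    // Pools: 10 = WETH/USDC, 11 = USDC/DAI, 12 = DAI/WETH.
    let mut adj: HashMap<u8, Vec<(u8, u32)>> = HashMap::new();
    for &(a, b, p) in &[(0u8, 1u8, 10u32), (1, 2, 11), (2, 0, 12)] {
        adj.entry(a).or_default().push((b, p));
        adj.entry(b).or_default().push((a, p));
    }
    let cycles = find_cycles(&adj, 0, 3);
    // The triangle is found once per direction, which is exactly the
    // duplication that deduplication removes later.
    assert_eq!(cycles.len(), 2);
    println!("{:?}", cycles);
}
```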
The max_depth Parameter
The depth parameter controls cycle length (number of hops):
- depth=2: Direct arbitrage. WETH→USDC→WETH through two distinct pools on the same pair. Simple to find and simulate, but heavily competed.
- depth=3: Triangular arbitrage. WETH→USDC→DAI→WETH. The sweet spot — complex enough that fewer searchers compete, simple enough to simulate quickly.
- depth=4: Quadrilateral arbitrage. Higher potential profit but exponentially more cycles to evaluate and higher gas costs.
- depth=5+: Rarely profitable after gas costs. The search space explodes combinatorially.
- Pools: 80,000
- Depth-2 cycles: ~50,000
- Depth-3 cycles: ~15,000,000
- Depth-4 cycles: ~2,000,000,000+
In practice, depth 3 is the production sweet spot. We find ~15 million triangular cycles and filter them down to ~100,000 that are worth simulating.
Deduplication with Keccak256
The DFS produces duplicate cycles: WETH→USDC→DAI→WETH and WETH→DAI→USDC→WETH are the same cycle traversed in opposite directions. We deduplicate using a canonical hash:
```rust
use tiny_keccak::{Hasher, Keccak};

fn compute_cycle_id(pools: &[Pool]) -> [u8; 32] {
    // Sort pool addresses to create a canonical representation:
    // both traversal directions of a cycle hash to the same id.
    let mut addresses: Vec<Address> = pools.iter().map(|p| p.address).collect();
    addresses.sort();
    let mut hasher = Keccak::v256();
    for addr in &addresses {
        hasher.update(addr.as_bytes());
    }
    let mut output = [0u8; 32];
    hasher.finalize(&mut output);
    output
}
```
By sorting pool addresses before hashing, both directions of the same cycle produce the same ID. Insert into a HashSet<[u8; 32]> and skip duplicates.
Why keccak256 instead of a simpler hash? Mostly safety margin: in a set of 15 million cycles, a 32-bit hash is guaranteed to collide and a 64-bit hash still carries a small but real birthday-collision risk. With 256 bits, collisions are astronomically unlikely, and keccak256 is already available in every EVM toolchain.
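As a back-of-envelope check on hash width, the birthday approximation p ≈ n²/2^(b+1) (valid for p ≪ 1) can be evaluated directly:

```rust
// Birthday approximation for the probability of at least one collision
// among n items hashed to b bits (valid when the result is << 1).
fn collision_probability(n: f64, bits: f64) -> f64 {
    n * n / 2f64.powf(bits + 1.0)
}

fn main() {
    let n = 15_000_000f64; // the depth-3 cycle set
    println!("64-bit:  {:e}", collision_probability(n, 64.0));  // ~6e-6
    println!("128-bit: {:e}", collision_probability(n, 128.0)); // ~3e-25
    println!("256-bit: {:e}", collision_probability(n, 256.0)); // ~1e-63
}
```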
Filtering: Reducing Millions to Thousands
Not all cycles are worth simulating. Apply filters:
Liquidity Filter
Discard cycles containing pools with reserves below a threshold. A pool with $100 of liquidity cannot produce meaningful arbitrage.
```rust
fn has_sufficient_liquidity(cycle: &Cycle, min_reserve_usd: f64) -> bool {
    cycle.pools.iter().all(|pool| {
        // estimate_reserve_usd: price-oracle lookup, not shown here.
        let reserve_usd = estimate_reserve_usd(pool);
        reserve_usd >= min_reserve_usd
    })
}
```
Staleness Filter
Discard cycles where all pools have been idle (no swaps) for > N blocks. Stale pools rarely have arbitrage.
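A sketch of this filter, assuming we record the block of each pool's last Swap/Sync event (the `ActivityTracker` and its field names are illustrative, not part of the code above):

```rust
use std::collections::HashMap;

type Address = u64; // stand-in for a real 20-byte address type

// Illustrative activity tracker: pool address → block of the last swap.
struct ActivityTracker {
    last_swap_block: HashMap<Address, u64>,
}

impl ActivityTracker {
    /// A cycle survives the staleness filter if at least one of its pools
    /// swapped within `max_idle_blocks` of the current head. Pools we have
    /// never seen swap count as stale.
    fn cycle_is_live(&self, pool_addrs: &[Address], head: u64, max_idle_blocks: u64) -> bool {
        pool_addrs.iter().any(|addr| {
            self.last_swap_block
                .get(addr)
                .map_or(false, |&b| head.saturating_sub(b) <= max_idle_blocks)
        })
    }
}

fn main() {
    let tracker = ActivityTracker {
        last_swap_block: HashMap::from([(1, 19_000_000), (2, 18_990_000)]),
    };
    // Head block 19_000_050, tolerating 100 idle blocks: pool 1 is fresh.
    assert!(tracker.cycle_is_live(&[1, 2], 19_000_050, 100));
    // A cycle containing only the long-idle pool is filtered out.
    assert!(!tracker.cycle_is_live(&[2], 19_000_050, 100));
    println!("ok");
}
```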
Token Blacklist
Exclude scam tokens, fee-on-transfer tokens, and tokens with transfer restrictions. These look profitable in simulation but fail on-chain.
```rust
const BLACKLISTED_TOKENS: &[&str] = &[
    "0x...", // known fee-on-transfer
    "0x...", // known honeypot
];

fn contains_blacklisted_token(cycle: &Cycle) -> bool {
    // Note: string comparison is fragile (it depends on the exact Debug
    // formatting of Address). Parsing the blacklist into Address values
    // once at startup is more robust.
    cycle.tokens.iter().any(|t| {
        BLACKLISTED_TOKENS.contains(&format!("{:?}", t).as_str())
    })
}
```
Static Profitability Estimate
Before full simulation, compute a rough profitability estimate using constant-product formula:
```rust
fn estimate_output_v2(
    amount_in: U256,
    reserve_in: U256,
    reserve_out: U256,
    fee_bps: u32,
) -> U256 {
    let fee_factor = 10000 - fee_bps; // e.g., 9970 for a 0.3% (30 bps) fee
    let numerator = amount_in * reserve_out * U256::from(fee_factor);
    let denominator = reserve_in * U256::from(10000) + amount_in * U256::from(fee_factor);
    numerator / denominator
}

fn estimate_cycle_profit(
    cycle: &Cycle,
    input_amount: U256,
) -> Option<U256> {
    let mut current_amount = input_amount;
    for (i, pool) in cycle.pools.iter().enumerate() {
        let token_in = cycle.tokens[i];
        let (reserve_in, reserve_out) = if token_in == pool.token0 {
            (pool.reserve0, pool.reserve1)
        } else {
            (pool.reserve1, pool.reserve0)
        };
        current_amount = estimate_output_v2(
            current_amount,
            reserve_in,
            reserve_out,
            pool.fee,
        );
    }
    if current_amount > input_amount {
        Some(current_amount - input_amount)
    } else {
        None
    }
}
```
This estimate is imperfect (ignores Uniswap V3 tick math, Curve stableswap curves, etc.) but eliminates 95% of unprofitable cycles before expensive simulation.
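As a sanity check of the constant-product formula, here is the same arithmetic with `u128` standing in for `U256`, and amounts in whole tokens for readability (pool reserves and prices are made up for illustration):

```rust
// Same formula as estimate_output_v2, with u128 standing in for U256.
fn estimate_output_v2(amount_in: u128, reserve_in: u128, reserve_out: u128, fee_bps: u128) -> u128 {
    let fee_factor = 10_000 - fee_bps;
    (amount_in * reserve_out * fee_factor) / (reserve_in * 10_000 + amount_in * fee_factor)
}

fn main() {
    // Pool: 100 WETH / 200,000 USDC (spot price 2000 USDC/WETH), 30 bps fee.
    // Swap in 1 WETH:
    //   1 * 200_000 * 9_970 / (100 * 10_000 + 1 * 9_970)
    //   = 1_994_000_000 / 1_009_970 = 1974 (integer division)
    let out = estimate_output_v2(1, 100, 200_000, 30);
    assert_eq!(out, 1974); // below the 2000 spot: fee plus ~1% price impact
    println!("output: {} USDC", out);
}
```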
Scaling: Parallelism and Incremental Updates
Building the full graph and running DFS on 80,000 pools takes 5-10 seconds on a single core. For a production bot, this is fine as a startup cost. But pool state changes every block (12 seconds), so we need incremental updates:
```rust
/// The 32-byte keccak id produced during deduplication.
pub type CycleId = [u8; 32];

pub struct CycleIndex {
    pub cycles_by_pool: HashMap<Address, Vec<CycleId>>,
    pub all_cycles: HashMap<CycleId, Cycle>,
}

impl CycleIndex {
    /// When a pool's reserves change, invalidate and re-evaluate
    /// only the cycles that include this pool.
    pub fn on_pool_update(&self, pool_address: Address) -> Vec<&Cycle> {
        self.cycles_by_pool
            .get(&pool_address)
            .map(|ids| {
                ids.iter()
                    .filter_map(|id| self.all_cycles.get(id))
                    .collect()
            })
            .unwrap_or_default()
    }
}
```
When a Sync event fires on a Uniswap V2 pool, we update that pool’s reserves and re-evaluate only the cycles containing that pool. This reduces per-block work from millions of cycles to a few hundred.
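Building the pool-to-cycles index is a single pass over the deduplicated cycle set. A sketch with stand-in types (the trimmed `Cycle` here keeps only the id and the pool addresses):

```rust
use std::collections::HashMap;

type Address = u64;      // stand-in for the real address type
type CycleId = [u8; 32]; // the keccak256 id from the dedup step

struct Cycle {
    id: CycleId,
    pool_addresses: Vec<Address>,
}

/// Invert the cycle set into a pool → cycle-ids map, so that a reserve
/// update on one pool touches only the cycles that actually contain it.
fn build_index(cycles: &[Cycle]) -> HashMap<Address, Vec<CycleId>> {
    let mut by_pool: HashMap<Address, Vec<CycleId>> = HashMap::new();
    for cycle in cycles {
        for &addr in &cycle.pool_addresses {
            by_pool.entry(addr).or_default().push(cycle.id);
        }
    }
    by_pool
}

fn main() {
    let cycles = vec![
        Cycle { id: [1; 32], pool_addresses: vec![10, 11, 12] },
        Cycle { id: [2; 32], pool_addresses: vec![10, 13, 14] },
    ];
    let index = build_index(&cycles);
    // Pool 10 appears in both cycles; pool 11 in only one.
    assert_eq!(index[&10].len(), 2);
    assert_eq!(index[&11].len(), 1);
    println!("pools indexed: {}", index.len()); // prints "pools indexed: 5"
}
```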
For the DFS itself, we parallelize across base tokens using Rayon:
```rust
use rayon::prelude::*;

// weth, usdc, usdt, dai, wbtc: Address values for the base tokens.
let base_tokens = vec![weth, usdc, usdt, dai, wbtc];
let all_cycles: Vec<Cycle> = base_tokens
    .par_iter()
    .flat_map(|base| find_cycles(&graph, *base, 3))
    .collect();
```
Summary
The cycle finder is the first stage of the MEV pipeline. It transforms a flat list of liquidity pools into a structured set of arbitrage opportunities. The key decisions — graph representation, DFS depth, deduplication strategy, and filtering heuristics — determine the quality and coverage of your opportunity set. In the next article, we will simulate these cycles: binary search over borrow amounts, state forks, and racing against the 12-second block deadline.