Deep EVM #14: Building an Arbitrage Cycle Finder — DFS on a Pool Graph
Engineering Team
The Token-Pool Graph
An arbitrage cycle is a sequence of swaps that starts and ends with the same token, producing more output than input. To find these cycles systematically, we model the DeFi ecosystem as a directed graph:
- Nodes = tokens (WETH, USDC, DAI, WBTC, …)
- Edges = liquidity pools that enable swaps between two tokens
A Uniswap V2 pair (WETH/USDC) creates two directed edges: WETH→USDC and USDC→WETH. A Uniswap V3 pool does the same, but with different pricing (concentrated liquidity). A Balancer weighted pool with 3 tokens creates 6 directed edges (every pair in both directions).
```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
pub struct Pool {
    pub address: Address,
    pub token0: Address,
    pub token1: Address,
    pub protocol: Protocol,
    pub fee: u32, // basis points
    pub reserve0: U256,
    pub reserve1: U256,
}

#[derive(Clone, Debug)]
pub enum Protocol {
    UniswapV2,
    UniswapV3,
    SushiSwap,
    Curve,
    Balancer,
}

pub struct TokenGraph {
    // token_address → vec of (other_token, pool)
    pub adjacency: HashMap<Address, Vec<(Address, Pool)>>,
}

impl TokenGraph {
    pub fn new() -> Self {
        Self {
            adjacency: HashMap::new(),
        }
    }

    pub fn add_pool(&mut self, pool: Pool) {
        // Every two-token pool yields two directed edges.
        self.adjacency
            .entry(pool.token0)
            .or_default()
            .push((pool.token1, pool.clone()));
        self.adjacency
            .entry(pool.token1)
            .or_default()
            .push((pool.token0, pool));
    }
}
```
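To make the edge counting concrete, here is a compilable sketch of the same structure with integer stand-ins for `Address` and a trimmed `Pool` (the real code uses 20-byte addresses and the full struct): three two-token pools forming a triangle give every token an out-degree of 2.

```rust
use std::collections::HashMap;

// Stand-in for illustration; the real code uses a 20-byte address type.
type Address = u64;

// Trimmed-down Pool: just the pool id and its two tokens.
#[derive(Clone, Debug)]
struct Pool {
    address: Address,
    token0: Address,
    token1: Address,
}

struct TokenGraph {
    adjacency: HashMap<Address, Vec<(Address, Pool)>>,
}

impl TokenGraph {
    fn new() -> Self {
        Self { adjacency: HashMap::new() }
    }

    fn add_pool(&mut self, pool: Pool) {
        // One directed edge from each of the pool's two tokens.
        self.adjacency.entry(pool.token0).or_default().push((pool.token1, pool.clone()));
        self.adjacency.entry(pool.token1).or_default().push((pool.token0, pool));
    }
}

fn main() {
    const WETH: Address = 1;
    const USDC: Address = 2;
    const DAI: Address = 3;

    let mut graph = TokenGraph::new();
    graph.add_pool(Pool { address: 100, token0: WETH, token1: USDC });
    graph.add_pool(Pool { address: 101, token0: USDC, token1: DAI });
    graph.add_pool(Pool { address: 102, token0: DAI, token1: WETH });

    // Each token sits in two pools of the triangle, so out-degree is 2.
    assert_eq!(graph.adjacency[&WETH].len(), 2);
    assert_eq!(graph.adjacency[&USDC].len(), 2);
    println!("tokens: {}", graph.adjacency.len()); // prints "tokens: 3"
}
```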
On Ethereum mainnet, there are ~80,000 active liquidity pools. The graph has ~15,000 unique tokens and ~160,000 directed edges.
DFS Cycle Detection
We find arbitrage cycles using depth-first search starting from a “base token” (typically WETH, since we want profit denominated in ETH).
```rust
use std::collections::HashSet;

#[derive(Clone, Debug)]
pub struct Cycle {
    pub pools: Vec<Pool>,
    pub tokens: Vec<Address>, // token path: [WETH, USDC, DAI, WETH]
    pub id: [u8; 32],         // keccak256 hash for deduplication
}

pub fn find_cycles(
    graph: &TokenGraph,
    start_token: Address,
    max_depth: usize, // typically 2-4 hops
) -> Vec<Cycle> {
    let mut results = Vec::new();
    let mut path_tokens = vec![start_token];
    let mut path_pools: Vec<Pool> = Vec::new();
    let mut visited = HashSet::new();
    visited.insert(start_token);
    dfs(
        graph,
        start_token,
        start_token,
        max_depth,
        &mut path_tokens,
        &mut path_pools,
        &mut visited,
        &mut results,
    );
    results
}

fn dfs(
    graph: &TokenGraph,
    current: Address,
    start: Address,
    max_depth: usize,
    path_tokens: &mut Vec<Address>,
    path_pools: &mut Vec<Pool>,
    visited: &mut HashSet<Address>,
    results: &mut Vec<Cycle>,
) {
    if path_pools.len() >= max_depth {
        return;
    }
    let neighbors = match graph.adjacency.get(&current) {
        Some(n) => n,
        None => return,
    };
    for (next_token, pool) in neighbors {
        // If we've returned to start, we found a cycle. The closing pool must
        // differ from the pool we just used: swapping back out through the
        // same pool is an unwind, not an arbitrage. This also admits 2-hop
        // cycles across two distinct pools on the same pair.
        if *next_token == start
            && !path_pools.is_empty()
            && path_pools.last().map(|p| p.address) != Some(pool.address)
        {
            let mut cycle_tokens = path_tokens.clone();
            cycle_tokens.push(start);
            let mut cycle_pools = path_pools.clone();
            cycle_pools.push(pool.clone());
            let id = compute_cycle_id(&cycle_pools);
            results.push(Cycle {
                pools: cycle_pools,
                tokens: cycle_tokens,
                id,
            });
            continue;
        }
        // Don't revisit tokens (except start)
        if visited.contains(next_token) {
            continue;
        }
        visited.insert(*next_token);
        path_tokens.push(*next_token);
        path_pools.push(pool.clone());
        dfs(
            graph,
            *next_token,
            start,
            max_depth,
            path_tokens,
            path_pools,
            visited,
            results,
        );
        path_tokens.pop();
        path_pools.pop();
        visited.remove(next_token);
    }
}
```
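The traversal can be exercised on a toy graph. This is a trimmed, runnable sketch with integer ids standing in for addresses; the closing-pool guard rejects a swap-and-unwind through a single pool while still allowing genuine 2-pool cycles:

```rust
use std::collections::{HashMap, HashSet};

// Trimmed sketch: tokens are u8 ids, pools are u32 ids, and the adjacency
// map stores (neighbor_token, pool_id) pairs.
fn find_cycles(adj: &HashMap<u8, Vec<(u8, u32)>>, start: u8, max_depth: usize) -> Vec<Vec<u32>> {
    let mut results = Vec::new();
    let mut path = Vec::new();
    let mut visited = HashSet::new();
    visited.insert(start);
    dfs(adj, start, start, max_depth, &mut path, &mut visited, &mut results);
    results
}

fn dfs(
    adj: &HashMap<u8, Vec<(u8, u32)>>,
    current: u8,
    start: u8,
    max_depth: usize,
    path: &mut Vec<u32>,
    visited: &mut HashSet<u8>,
    results: &mut Vec<Vec<u32>>,
) {
    if path.len() >= max_depth {
        return;
    }
    let neighbors = match adj.get(&current) {
        Some(n) => n,
        None => return,
    };
    for &(next, pool_id) in neighbors {
        // Close a cycle only through a pool other than the one just used.
        if next == start && !path.is_empty() && path.last() != Some(&pool_id) {
            let mut cycle = path.clone();
            cycle.push(pool_id);
            results.push(cycle);
            continue;
        }
        if visited.contains(&next) {
            continue;
        }
        visited.insert(next);
        path.push(pool_id);
        dfs(adj, next, start, max_depth, path, visited, results);
        path.pop();
        visited.remove(&next);
    }
}

fn main() {
    // Tokens: 0 = WETH, 1 = USDC, 2 = DAI.
    // Pools: 10 = WETH/USDC, 11 = USDC/DAI, 12 = DAI/WETH.
    let mut adj: HashMap<u8, Vec<(u8, u32)>> = HashMap::new();
    for &(a, b, p) in &[(0u8, 1u8, 10u32), (1, 2, 11), (2, 0, 12)] {
        adj.entry(a).or_default().push((b, p));
        adj.entry(b).or_default().push((a, p));
    }
    let cycles = find_cycles(&adj, 0, 3);
    // The triangle is found once per direction, which is exactly the
    // duplication that deduplication removes later.
    assert_eq!(cycles.len(), 2);
    println!("{:?}", cycles);
}
```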
The max_depth Parameter
The depth parameter controls cycle length (number of hops):
- depth=2: Direct arbitrage. WETH→USDC→WETH through two distinct pools on the same pair. Simple to find and simulate, but heavily competed.
- depth=3: Triangular arbitrage. WETH→USDC→DAI→WETH. The sweet spot — complex enough that fewer searchers compete, simple enough to simulate quickly.
- depth=4: Quadrilateral arbitrage. Higher potential profit but exponentially more cycles to evaluate and higher gas costs.
- depth=5+: Rarely profitable after gas costs. The search space explodes combinatorially.
- Pools: 80,000
- Depth-2 cycles: ~50,000
- Depth-3 cycles: ~15,000,000
- Depth-4 cycles: ~2,000,000,000+
In practice, depth 3 is the production sweet spot. We find ~15 million triangular cycles and filter them down to ~100,000 that are worth simulating.
Deduplication with Keccak256
The DFS produces duplicate cycles: WETH→USDC→DAI→WETH and WETH→DAI→USDC→WETH are the same cycle traversed in opposite directions. We deduplicate using a canonical hash:
```rust
use tiny_keccak::{Hasher, Keccak};

fn compute_cycle_id(pools: &[Pool]) -> [u8; 32] {
    // Sort pool addresses to create a canonical representation:
    // both traversal directions of a cycle hash to the same id.
    let mut addresses: Vec<Address> = pools.iter().map(|p| p.address).collect();
    addresses.sort();
    let mut hasher = Keccak::v256();
    for addr in &addresses {
        hasher.update(addr.as_bytes());
    }
    let mut output = [0u8; 32];
    hasher.finalize(&mut output);
    output
}
```
By sorting pool addresses before hashing, both directions of the same cycle produce the same ID. Insert into a HashSet<[u8; 32]> and skip duplicates.
Why keccak256 instead of a simpler hash? Mostly safety margin: in a set of 15 million cycles, a 32-bit hash is guaranteed to collide and a 64-bit hash still carries a small but real birthday-collision risk. With 256 bits, collisions are astronomically unlikely, and keccak256 is already available in every EVM toolchain.
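As a back-of-envelope check on hash width, the birthday approximation p ≈ n²/2^(b+1) (valid for p ≪ 1) can be evaluated directly:

```rust
// Birthday approximation for the probability of at least one collision
// among n items hashed to b bits (valid when the result is << 1).
fn collision_probability(n: f64, bits: f64) -> f64 {
    n * n / 2f64.powf(bits + 1.0)
}

fn main() {
    let n = 15_000_000f64; // the depth-3 cycle set
    println!("64-bit:  {:e}", collision_probability(n, 64.0));  // ~6e-6
    println!("128-bit: {:e}", collision_probability(n, 128.0)); // ~3e-25
    println!("256-bit: {:e}", collision_probability(n, 256.0)); // ~1e-63
}
```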
Filtering: Reducing Millions to Thousands
Not all cycles are worth simulating. Apply filters:
Liquidity Filter
Discard cycles containing pools with reserves below a threshold. A pool with $100 of liquidity cannot produce meaningful arbitrage.
```rust
fn has_sufficient_liquidity(cycle: &Cycle, min_reserve_usd: f64) -> bool {
    cycle.pools.iter().all(|pool| {
        // estimate_reserve_usd: price-oracle lookup, not shown here.
        let reserve_usd = estimate_reserve_usd(pool);
        reserve_usd >= min_reserve_usd
    })
}
```
Staleness Filter
Discard cycles where all pools have been idle (no swaps) for > N blocks. Stale pools rarely have arbitrage.
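A sketch of this filter, assuming we record the block of each pool's last Swap/Sync event (the `ActivityTracker` and its field names are illustrative, not part of the code above):

```rust
use std::collections::HashMap;

type Address = u64; // stand-in for a real 20-byte address type

// Illustrative activity tracker: pool address → block of the last swap.
struct ActivityTracker {
    last_swap_block: HashMap<Address, u64>,
}

impl ActivityTracker {
    /// A cycle survives the staleness filter if at least one of its pools
    /// swapped within `max_idle_blocks` of the current head. Pools we have
    /// never seen swap count as stale.
    fn cycle_is_live(&self, pool_addrs: &[Address], head: u64, max_idle_blocks: u64) -> bool {
        pool_addrs.iter().any(|addr| {
            self.last_swap_block
                .get(addr)
                .map_or(false, |&b| head.saturating_sub(b) <= max_idle_blocks)
        })
    }
}

fn main() {
    let tracker = ActivityTracker {
        last_swap_block: HashMap::from([(1, 19_000_000), (2, 18_990_000)]),
    };
    // Head block 19_000_050, tolerating 100 idle blocks: pool 1 is fresh.
    assert!(tracker.cycle_is_live(&[1, 2], 19_000_050, 100));
    // A cycle containing only the long-idle pool is filtered out.
    assert!(!tracker.cycle_is_live(&[2], 19_000_050, 100));
    println!("ok");
}
```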
Token Blacklist
Exclude scam tokens, fee-on-transfer tokens, and tokens with transfer restrictions. These look profitable in simulation but fail on-chain.
```rust
const BLACKLISTED_TOKENS: &[&str] = &[
    "0x...", // known fee-on-transfer
    "0x...", // known honeypot
];

fn contains_blacklisted_token(cycle: &Cycle) -> bool {
    // Note: string comparison is fragile (it depends on the exact Debug
    // formatting of Address). Parsing the blacklist into Address values
    // once at startup is more robust.
    cycle.tokens.iter().any(|t| {
        BLACKLISTED_TOKENS.contains(&format!("{:?}", t).as_str())
    })
}
```
Static Profitability Estimate
Before full simulation, compute a rough profitability estimate using constant-product formula:
```rust
fn estimate_output_v2(
    amount_in: U256,
    reserve_in: U256,
    reserve_out: U256,
    fee_bps: u32,
) -> U256 {
    let fee_factor = 10000 - fee_bps; // e.g., 9970 for a 0.3% (30 bps) fee
    let numerator = amount_in * reserve_out * U256::from(fee_factor);
    let denominator = reserve_in * U256::from(10000) + amount_in * U256::from(fee_factor);
    numerator / denominator
}

fn estimate_cycle_profit(
    cycle: &Cycle,
    input_amount: U256,
) -> Option<U256> {
    let mut current_amount = input_amount;
    for (i, pool) in cycle.pools.iter().enumerate() {
        let token_in = cycle.tokens[i];
        let (reserve_in, reserve_out) = if token_in == pool.token0 {
            (pool.reserve0, pool.reserve1)
        } else {
            (pool.reserve1, pool.reserve0)
        };
        current_amount = estimate_output_v2(
            current_amount,
            reserve_in,
            reserve_out,
            pool.fee,
        );
    }
    if current_amount > input_amount {
        Some(current_amount - input_amount)
    } else {
        None
    }
}
```
This estimate is imperfect (ignores Uniswap V3 tick math, Curve stableswap curves, etc.) but eliminates 95% of unprofitable cycles before expensive simulation.
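As a sanity check of the constant-product formula, here is the same arithmetic with `u128` standing in for `U256`, and amounts in whole tokens for readability (pool reserves and prices are made up for illustration):

```rust
// Same formula as estimate_output_v2, with u128 standing in for U256.
fn estimate_output_v2(amount_in: u128, reserve_in: u128, reserve_out: u128, fee_bps: u128) -> u128 {
    let fee_factor = 10_000 - fee_bps;
    (amount_in * reserve_out * fee_factor) / (reserve_in * 10_000 + amount_in * fee_factor)
}

fn main() {
    // Pool: 100 WETH / 200,000 USDC (spot price 2000 USDC/WETH), 30 bps fee.
    // Swap in 1 WETH:
    //   1 * 200_000 * 9_970 / (100 * 10_000 + 1 * 9_970)
    //   = 1_994_000_000 / 1_009_970 = 1974 (integer division)
    let out = estimate_output_v2(1, 100, 200_000, 30);
    assert_eq!(out, 1974); // below the 2000 spot: fee plus ~1% price impact
    println!("output: {} USDC", out);
}
```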
Scaling: Parallelism and Incremental Updates
Building the full graph and running DFS on 80,000 pools takes 5-10 seconds on a single core. For a production bot, this is fine as a startup cost. But pool state changes every block (12 seconds), so we need incremental updates:
```rust
/// The 32-byte keccak id produced during deduplication.
pub type CycleId = [u8; 32];

pub struct CycleIndex {
    pub cycles_by_pool: HashMap<Address, Vec<CycleId>>,
    pub all_cycles: HashMap<CycleId, Cycle>,
}

impl CycleIndex {
    /// When a pool's reserves change, invalidate and re-evaluate
    /// only the cycles that include this pool.
    pub fn on_pool_update(&self, pool_address: Address) -> Vec<&Cycle> {
        self.cycles_by_pool
            .get(&pool_address)
            .map(|ids| {
                ids.iter()
                    .filter_map(|id| self.all_cycles.get(id))
                    .collect()
            })
            .unwrap_or_default()
    }
}
```
When a Sync event fires on a Uniswap V2 pool, we update that pool’s reserves and re-evaluate only the cycles containing that pool. This reduces per-block work from millions of cycles to a few hundred.
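Building the pool-to-cycles index is a single pass over the deduplicated cycle set. A sketch with stand-in types (the trimmed `Cycle` here keeps only the id and the pool addresses):

```rust
use std::collections::HashMap;

type Address = u64;      // stand-in for the real address type
type CycleId = [u8; 32]; // the keccak256 id from the dedup step

struct Cycle {
    id: CycleId,
    pool_addresses: Vec<Address>,
}

/// Invert the cycle set into a pool → cycle-ids map, so that a reserve
/// update on one pool touches only the cycles that actually contain it.
fn build_index(cycles: &[Cycle]) -> HashMap<Address, Vec<CycleId>> {
    let mut by_pool: HashMap<Address, Vec<CycleId>> = HashMap::new();
    for cycle in cycles {
        for &addr in &cycle.pool_addresses {
            by_pool.entry(addr).or_default().push(cycle.id);
        }
    }
    by_pool
}

fn main() {
    let cycles = vec![
        Cycle { id: [1; 32], pool_addresses: vec![10, 11, 12] },
        Cycle { id: [2; 32], pool_addresses: vec![10, 13, 14] },
    ];
    let index = build_index(&cycles);
    // Pool 10 appears in both cycles; pool 11 in only one.
    assert_eq!(index[&10].len(), 2);
    assert_eq!(index[&11].len(), 1);
    println!("pools indexed: {}", index.len()); // prints "pools indexed: 5"
}
```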
For the DFS itself, we parallelize across base tokens using Rayon:
```rust
use rayon::prelude::*;

// weth, usdc, usdt, dai, wbtc: Address values for the base tokens.
let base_tokens = vec![weth, usdc, usdt, dai, wbtc];
let all_cycles: Vec<Cycle> = base_tokens
    .par_iter()
    .flat_map(|base| find_cycles(&graph, *base, 3))
    .collect();
```
Summary
The cycle finder is the first stage of the MEV pipeline. It transforms a flat list of liquidity pools into a structured set of arbitrage opportunities. The key decisions — graph representation, DFS depth, deduplication strategy, and filtering heuristics — determine the quality and coverage of your opportunity set. In the next article, we will simulate these cycles: binary search over borrow amounts, state forks, and racing against the 12-second block deadline.