Deep EVM #7: Gas-Efficient Loops and Conditionals in Yul
Engineering Team
Why Loops Are Where Gas Goes to Die
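In most smart contracts, the single largest gas consumer is iteration. A function that loops over an array of 100 elements executes its body 100 times, and every opcode inside that body is paid for 100 times. A single unnecessary SLOAD inside that loop costs 100 gas per iteration on a warm slot and 2,100 gas on a cold one (EIP-2929 pricing), which adds up to 10,000-210,000 gas over 100 iterations. The bare loop mechanics (counter, condition check, jumps) add roughly another 4,000 gas over the same 100 iterations.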
This article breaks down the exact gas cost of loop constructs in Yul and Solidity, and shows patterns that cut loop gas by 30-60%.
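To make the SLOAD point concrete, here is a minimal sketch of hoisting a storage read out of a loop. The storage layout (a multiplier in slot 0), the function names, and the calldata layout are assumptions made for illustration, not part of any real contract.

```yul
{
    // Assumption for illustration: storage slot 0 holds a multiplier.

    // Naive version: sload(0) executes on every iteration.
    function scaledSumNaive(offset, length) -> total {
        let end := add(offset, mul(length, 0x20))
        for { } lt(offset, end) { offset := add(offset, 0x20) } {
            total := add(total, mul(calldataload(offset), sload(0)))
        }
    }

    // Cached version: one sload before the loop, then a stack read per iteration.
    function scaledSumCached(offset, length) -> total {
        let multiplier := sload(0)
        let end := add(offset, mul(length, 0x20))
        for { } lt(offset, end) { offset := add(offset, 0x20) } {
            total := add(total, mul(calldataload(offset), multiplier))
        }
    }

    // Assumes calldata is the canonical ABI encoding of f(uint256[]):
    // length word at 0x24, first element at 0x44. No validation is done here.
    mstore(0, scaledSumCached(0x44, calldataload(0x24)))
    return(0, 0x20)
}
```

Both functions return the same value (with silent wraparound on overflow). The cached one replaces a 100-gas warm SLOAD per iteration with a 3-gas DUP.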
Anatomy of a Yul For-Loop
for { let i := 0 } lt(i, 10) { i := add(i, 1) } {
// body
}
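This compiles to roughly the following opcode sequence. The listing is schematic: real compiler output pushes the operands in the order LT expects and inverts the condition before the exit JUMPI (typically one extra ISZERO at 3 gas), so treat the totals below as estimates: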
// Initialization: let i := 0
PUSH1 0x00 // 3 gas (executed once)
// Condition check: lt(i, 10)
JUMPDEST // 1 gas (loop entry point)
DUP1 // 3 gas
PUSH1 0x0a // 3 gas
LT // 3 gas
PUSH2 <exit> // 3 gas
JUMPI // 10 gas (conditional jump)
// Body: (varies)
...
// Post-iteration: i := add(i, 1)
PUSH1 0x01 // 3 gas
ADD // 3 gas
// Jump back to condition
PUSH2 <loop_start> // 3 gas
JUMP // 8 gas
Per iteration, the loop overhead is:
- Condition check: 1 + 3 + 3 + 3 + 3 + 10 = 23 gas
- Post-iteration: 3 + 3 = 6 gas
- Jump back: 3 + 8 = 11 gas
- Total overhead per iteration: 40 gas
For 100 iterations, that is 4,000 gas just in loop mechanics — before any work in the body.
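One loop shape that some gas-focused codebases use to trim this overhead is a do-while style loop: the condition slot is the constant 1 and the exit test sits at the bottom of the body. The sketch below is illustrative only. The function name and calldata layout are assumptions, and whether it produces cheaper bytecode depends on the compiler version and optimizer settings, so measure before adopting it.

```yul
{
    // Hypothetical helper: sums `length` words of calldata starting at `offset`.
    function sumDoWhile(offset, length) -> total {
        // The body runs before the first exit test, so guard the empty case.
        if iszero(length) { leave }
        let end := add(offset, mul(length, 0x20))
        for { } 1 { } {
            total := add(total, calldataload(offset))
            offset := add(offset, 0x20)
            // Exit test at the bottom of the body.
            if iszero(lt(offset, end)) { break }
        }
    }

    // Assumes the canonical ABI encoding of f(uint256[]):
    // length word at 0x24, first element at 0x44.
    mstore(0, sumDoWhile(0x44, calldataload(0x24)))
    return(0, 0x20)
}
```

The trade-off is the extra empty-array guard, which the standard for-loop handles for free.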
Solidity Loop vs Yul Loop: A Benchmark
Let us compare summing a uint256[] calldata array:
Solidity (Checked Arithmetic)
function sumSolidity(uint256[] calldata arr) external pure returns (uint256 total) {
for (uint256 i = 0; i < arr.length; i++) {
total += arr[i];
}
}
Gas per iteration:
- Loop overhead: ~60 gas (Solidity adds overflow checks on i++ and total +=)
- CALLDATALOAD: 3 gas
- Offset computation: ~10 gas
- Total: ~73 gas/iteration
Solidity (Unchecked)
function sumUnchecked(uint256[] calldata arr) external pure returns (uint256 total) {
uint256 len = arr.length;
for (uint256 i = 0; i < len;) {
total += arr[i];
unchecked { ++i; }
}
}
Gas per iteration:
- Loop overhead: ~40 gas (no overflow checks)
- CALLDATALOAD: 3 gas
- Offset computation: ~10 gas
- Total: ~53 gas/iteration
Yul
function sumYul(offset, length) -> total {
let end := add(offset, mul(length, 0x20))
for { } lt(offset, end) { offset := add(offset, 0x20) } {
total := add(total, calldataload(offset))
}
}
Gas per iteration:
- Condition (lt): 1 + 3 + 3 + 3 + 10 = 20 gas
- Body (add + calldataload): 3 + 3 = 6 gas
- Post (add offset): 3 + 3 = 6 gas
- Jump back: 11 gas
- Total: ~43 gas/iteration
Benchmark Results (100 elements)
| Method | Gas per iteration | Total (100 elements) | Savings vs Solidity |
|---|---|---|---|
| Solidity (checked) | ~73 | 7,300 | baseline |
| Solidity (unchecked) | ~53 | 5,300 | 27% |
| Yul | ~43 | 4,300 | 41% |
The savings come from three sources: no overflow checks, direct calldata offset arithmetic (instead of Solidity’s bounds checking), and tighter loop structure.
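Dropping Solidity's per-element bounds check means the caller has to validate the array once, up front. The following sketch shows one way to wire sumYul to raw calldata under that assumption. The 2^32 caps on offset and length are arbitrary limits chosen here to keep the arithmetic from wrapping. They are not part of the ABI specification.

```yul
{
    function sumYul(offset, length) -> total {
        let end := add(offset, mul(length, 0x20))
        for { } lt(offset, end) { offset := add(offset, 0x20) } {
            total := add(total, calldataload(offset))
        }
    }

    // Assumes the call is ABI-encoded as f(uint256[]).
    let rel := calldataload(0x04)               // offset of the array, relative to 0x04
    if gt(rel, 0xffffffff) { revert(0, 0) }     // arbitrary cap so the adds below cannot wrap
    let lenPos := add(0x04, rel)
    let length := calldataload(lenPos)
    if gt(length, 0xffffffff) { revert(0, 0) }  // arbitrary cap so mul(length, 0x20) cannot wrap
    let data := add(lenPos, 0x20)

    // One range check replaces the per-element bounds checks Solidity would emit.
    if gt(add(data, mul(length, 0x20)), calldatasize()) { revert(0, 0) }

    mstore(0, sumYul(data, length))
    return(0, 0x20)
}
```

Without the range check, calldataload past the end of calldata returns zero padding instead of reverting, so a malformed call would silently produce a smaller sum. Note that total still wraps on overflow, unlike the checked Solidity version.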
Switch vs If: Choosing the Right Conditional
Yul provides two conditional constructs. Use the right one.
If (Single Condition)
if iszero(value) {
revert(0, 0)
}
Compiles to:
DUP1 // 3 gas
ISZERO // 3 gas
PUSH2 <skip> // 3 gas
JUMPI // 10 gas
// ... revert ...
JUMPDEST // 1 gas
Total: 20 gas for the condition check, counting the JUMPDEST at the skip target.
Switch (Multiple Conditions)
switch selector
case 0xa9059cbb { transferImpl() } // transfer
case 0x70a08231 { balanceOfImpl() } // balanceOf
case 0x18160ddd { totalSupplyImpl() } // totalSupply
default { revert(0, 0) }
The Yul compiler generates sequential comparisons:
DUP1 PUSH4 0xa9059cbb EQ PUSH2 <case1> JUMPI // 22 gas
DUP1 PUSH4 0x70a08231 EQ PUSH2 <case2> JUMPI // 22 gas
DUP1 PUSH4 0x18160ddd EQ PUSH2 <case3> JUMPI // 22 gas
// default: revert
Each case costs 22 gas to check. For N cases, the worst case is N * 22 gas. Solidity optimizes this with a binary search tree for large function counts, but in Yul you must implement that yourself if needed.
Optimization: Order cases by frequency. Put the most-called function selector first:
// If 80% of calls are transfers, check it first
switch selector
case 0xa9059cbb { transferImpl() } // Most common: checked first
case 0x70a08231 { balanceOfImpl() } // Second most common
default { revert(0, 0) }
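When call frequencies are roughly even, the hand-rolled binary search mentioned above is the alternative to frequency ordering. A minimal sketch follows, assuming four sorted selectors. The allowance selector and the stub functions are added here for illustration and are not part of the original example.

```yul
{
    // Selectors in ascending order:
    //   0x18160ddd totalSupply()               0x70a08231 balanceOf(address)
    //   0xa9059cbb transfer(address,uint256)   0xdd62ed3e allowance(address,address)
    let selector := shr(224, calldataload(0))

    // Split on the median first, then compare only within the chosen half.
    switch lt(selector, 0xa9059cbb)
    case 1 {
        switch selector
        case 0x18160ddd { totalSupplyImpl() }
        case 0x70a08231 { balanceOfImpl() }
        default { revert(0, 0) }
    }
    default {
        switch selector
        case 0xa9059cbb { transferImpl() }
        case 0xdd62ed3e { allowanceImpl() }
        default { revert(0, 0) }
    }

    // Hypothetical stubs so the block compiles; real implementations go here.
    function totalSupplyImpl() { stub() }
    function balanceOfImpl() { stub() }
    function transferImpl() { stub() }
    function allowanceImpl() { stub() }
    function stub() {
        mstore(0, 0)
        return(0, 0x20)
    }
}
```

With four selectors the worst case drops from four comparisons to three (one LT plus two EQ), so the gain is marginal. The split pays off as the number of functions grows, at the cost of making the most common call slightly slower than it would be in first position of a flat switch.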
Unchecked Arithmetic in Yul
All arithmetic in Yul is unchecked by default. There are no overflow or underflow reverts. This is both a feature (performance) and a danger (bugs).
// These all silently wrap:
let a := add(0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff, 1) // a = 0
let b := sub(0, 1) // b = 2^256 - 1
let c := mul(0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff, 2) // c = 2^256 - 2
When is unchecked arithmetic safe?
- Loop counters: safe when the bound is known to be < 2^256 (always true in practice)
- Array index calculation: offset + i * 32 is safe when i < array.length and the length has been validated against calldatasize (an unvalidated 256-bit length read from calldata can make the multiplication wrap)
- Token amounts with bounded totals: if totalSupply <= 2^128, no addition of two balances can overflow
- Subtraction after comparison: if gt(a, b) { let diff := sub(a, b) } is always safe
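The last two cases often appear together in a balance transfer. The sketch below is a simplified illustration: the function name, the use of raw slot numbers as arguments, and the invariant that all balances sum to a totalSupply below 2^256 are assumptions, not a production token design.

```yul
{
    // Hypothetical helper: moves `amount` from the balance stored at `fromSlot`
    // to the balance stored at `toSlot`.
    function moveBalance(fromSlot, toSlot, amount) {
        let fromBal := sload(fromSlot)
        // The comparison is the only check needed on the debit side.
        if lt(fromBal, amount) { revert(0, 0) }
        // Cannot underflow: amount <= fromBal was just established.
        sstore(fromSlot, sub(fromBal, amount))
        // Cannot overflow under the assumed invariant that the sum of all
        // balances equals a totalSupply that fits in 256 bits.
        sstore(toSlot, add(sload(toSlot), amount))
    }

    // Illustration only: fixed slots 1 and 2, amount taken from calldata.
    moveBalance(1, 2, calldataload(0x04))
}
```

The credit-side add is only safe because of the invariant. If tokens can be minted without a checked update to the total, that line needs the overflow check as well.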
When you need overflow checking in Yul:
function safeAdd(a, b) -> c {
c := add(a, b)
if lt(c, a) { revert(0, 0) } // overflow check
}
function safeSub(a, b) -> c {
if lt(a, b) { revert(0, 0) } // underflow check
c := sub(a, b)
}
function safeMul(a, b) -> c {
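if and(iszero(iszero(b)), gt(a, div(not(0), b))) { revert(0, 0) } // overflow check; b is normalized to 0/1 because and() is bitwise and would skip the check for even b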
c := mul(a, b)
}
Loop Unrolling
Loop unrolling reduces the overhead per element by processing multiple elements per iteration:
// Standard loop: 40 gas overhead per iteration
function sum(offset, length) -> total {
let end := add(offset, mul(length, 0x20))
for { } lt(offset, end) { offset := add(offset, 0x20) } {
total := add(total, calldataload(offset))
}
}
// Unrolled 4x: ~13 gas overhead per element
function sumUnrolled(offset, length) -> total {
let end := add(offset, mul(length, 0x20))
// Process 4 elements at a time
let endAligned := sub(end, mod(mul(length, 0x20), 0x80))
for { } lt(offset, endAligned) { offset := add(offset, 0x80) } {
total := add(total, calldataload(offset))
total := add(total, calldataload(add(offset, 0x20)))
total := add(total, calldataload(add(offset, 0x40)))
total := add(total, calldataload(add(offset, 0x60)))
}
// Handle remaining elements
for { } lt(offset, end) { offset := add(offset, 0x20) } {
total := add(total, calldataload(offset))
}
}
The unrolled version amortizes the loop overhead (condition check + jump) across 4 elements. In practice, this saves 10-20% gas for large arrays.
Minimizing Stack Depth in Loops
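The EVM stack holds up to 1024 words, but DUP and SWAP can only reach the top 16, and every extra live variable means more DUP/SWAP shuffling (or a "stack too deep" compiler error). Keep loop bodies shallow: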
// BAD: many live variables in loop body
for { let i := 0 } lt(i, len) { i := add(i, 1) } {
let a := calldataload(add(offset, mul(i, 0x20)))
let b := sload(add(baseSlot, i))
let c := add(a, b)
let d := mul(c, price)
let e := div(d, 1000)
total := add(total, e)
}
// BETTER: use helper function to reduce stack depth
function processElement(offset, slot, price) -> result {
let a := calldataload(offset)
let b := sload(slot)
result := div(mul(add(a, b), price), 1000)
}
for { let i := 0 } lt(i, len) { i := add(i, 1) } {
total := add(total, processElement(
add(offset, mul(i, 0x20)),
add(baseSlot, i),
price
))
}
Early Exit Patterns
When searching for an element, exit the loop as soon as you find it:
// Find the index of 'target' in a calldata array
function indexOf(offset, length, target) -> index, found {
let end := add(offset, mul(length, 0x20))
for { index := 0 } lt(offset, end) { offset := add(offset, 0x20) } {
if eq(calldataload(offset), target) {
found := 1
// 'break' would only exit the loop; 'leave' skips everything after it too
leave // exits the function immediately
}
index := add(index, 1)
}
}
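The leave statement exits the current Yul function immediately, much like a bare return in Solidity (it is unrelated to the EVM RETURN opcode, which halts execution). Yul also has break and continue for use inside for-loops. leave is the better fit for search loops where nothing needs to run after the match.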
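For comparison, here is a sketch of the same search written with break, which is useful when code after the loop still has to run. The function name and the choice of not(0) as a "not found" sentinel are assumptions made for this example.

```yul
{
    // Hypothetical variant of indexOf that normalizes the not-found result.
    function indexOfOrMax(offset, length, target) -> index, found {
        let end := add(offset, mul(length, 0x20))
        for { } lt(offset, end) { offset := add(offset, 0x20) } {
            if eq(calldataload(offset), target) {
                found := 1
                break // exits the loop only; execution continues below
            }
            index := add(index, 1)
        }
        // Runs on both paths, which would be skipped by 'leave'.
        if iszero(found) { index := not(0) }
    }

    // Assumes f(uint256[] arr, uint256 target) is NOT the layout here; for
    // illustration the array is taken as f(uint256[]) and the target is fixed.
    let index, found := indexOfOrMax(0x44, calldataload(0x24), 42)
    mstore(0, index)
    mstore(0x20, found)
    return(0, 0x40)
}
```

Returning a sentinel index alongside the found flag is a design choice; callers that only read found can ignore it.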
Bit Manipulation as Loop Alternative
Sometimes you can replace a loop with bitwise operations:
// Count the number of set bits (population count)
// Loop approach: up to 256 iterations
function popcountLoop(x) -> count {
for { } x { x := shr(1, x) } {
count := add(count, and(x, 1))
}
}
// Brian Kernighan's algorithm: one iteration per set bit (at most 256, far fewer for sparse words)
function popcountFast(x) -> count {
for { } x { } {
x := and(x, sub(x, 1)) // clear lowest set bit
count := add(count, 1)
}
}
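// Even faster: parallel bit counting (no loop)
function popcountParallel(x) -> count {
x := sub(x, and(shr(1, x), 0x5555555555555555555555555555555555555555555555555555555555555555)) // counts per 2-bit lane
x := add(and(x, 0x3333333333333333333333333333333333333333333333333333333333333333),
and(shr(2, x), 0x3333333333333333333333333333333333333333333333333333333333333333)) // counts per 4-bit lane
x := and(add(x, shr(4, x)), 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f) // counts per byte (0..8)
// Fold bytes into 16-bit lanes first: a 256-bit word can have 256 set bits,
// which would overflow a single byte in the final sum.
x := and(add(x, shr(8, x)), 0x00ff00ff00ff00ff00ff00ff00ff00ff00ff00ff00ff00ff00ff00ff00ff00ff)
// The multiplication accumulates all 16 lanes into the top lane; shr extracts it.
count := shr(240, mul(x, 0x0001000100010001000100010001000100010001000100010001000100010001))
}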
Practical Example: Gas-Efficient Array Comparison
Compare two calldata arrays for equality:
function arraysEqual(offset1, offset2, length) -> equal {
equal := 1
let end := add(offset1, mul(length, 0x20))
for { } lt(offset1, end) { } {
if iszero(eq(calldataload(offset1), calldataload(offset2))) {
equal := 0
leave
}
offset1 := add(offset1, 0x20)
offset2 := add(offset2, 0x20)
}
}
This exits on the first mismatch. For MEV applications, this pattern is used to verify expected pool states before executing a trade.
Gas Cost Summary: Loop Patterns
| Pattern | Gas per element (approximate) |
|---|---|
| Solidity for (checked) | 73 |
| Solidity for (unchecked) | 53 |
| Yul for | 43 |
| Yul for (unrolled 4x) | 35 |
| Yul while (sentinel) | 40 |
The absolute numbers depend on the loop body, but the relative savings are consistent: Yul loops save 30-50% over standard Solidity.
Conclusion
Loop optimization in Yul is not about clever tricks. It is about understanding the exact gas cost of every opcode in your loop body and systematically eliminating waste. Cache storage reads, unroll tight loops, exit early, and prefer bitwise operations over iteration where possible. These patterns compound: a 30% loop saving in a function called 1000 times per block is gas that stays in your MEV bot's margin on every one of those calls.
In the final article, we put everything together and build a complete token swap contract in pure Yul.