Deep EVM #18: Debugging EVM Bytecode — Traces, Stack Dumps, and cast run
Engineering Team
The Debugging Challenge with Low-Level EVM Code
When a Solidity transaction reverts, you typically get a descriptive error message like ERC20: transfer amount exceeds balance. When a Huff or Yul transaction reverts, you often get 0x: an empty revert payload with no context at all, unless the author explicitly encoded an error. The contract simply hit a REVERT opcode, and it is up to you to figure out why.
Debugging at the bytecode level requires different tools and mental models. You need to think in terms of the stack machine, track memory and storage changes opcode by opcode, and understand how the EVM executes control flow through JUMP and JUMPI instructions.
This article covers the essential debugging toolkit: cast run for replaying historical transactions, forge debug for interactive step-through debugging, and manual trace analysis for understanding exactly what happened inside the EVM.
cast run: Replaying Transactions
cast run is the fastest way to debug a failed transaction. It replays the transaction against the historical state and shows you exactly what happened:
cast run 0xYOUR_TX_HASH --rpc-url https://eth-mainnet.g.alchemy.com/v2/KEY
The output shows a structured trace with call depth, gas usage, and return data:
Traces:
[328439] 0xContractAddr::transfer(0xRecipient, 1000000000000000000)
+- [2604] 0xContractAddr::balanceOf(0xSender) [staticcall]
| +- <- 500000000000000000
+- <- revert: EvmError: Revert
Reading the trace, the transfer reverted right after balanceOf returned 500000000000000000. Assuming an 18-decimal token, the sender held 0.5 tokens but tried to transfer 1.0. For Huff contracts, function names are usually not decoded unless cast can resolve the selectors from a signature database, so they appear as raw selectors. The call structure and revert points are still visible.
Decoding Raw Selectors
When working with Huff contracts, cast run shows raw function selectors. Decode them manually:
# Compute selector for balanceOf(address)
cast sig "balanceOf(address)"
# Output: 0x70a08231
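# Or decode calldata (a selector plus one 32-byte address argument)
cast 4byte-decode 0x70a082310000000000000000000000000000000000000000000000000000042069abcdef
# Output: the matching signature balanceOf(address) and the decoded address (exact formatting varies by cast version)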
Keep a reference table of your contract’s selectors when debugging Huff:
0x70a08231 -> balanceOf(address)
0xa9059cbb -> transfer(address,uint256)
0x23b872dd -> transferFrom(address,address,uint256)
0x095ea7b3 -> approve(address,uint256)
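A reference table like this can also drive a small offline decoder for calldata copied out of a trace. The sketch below is a hypothetical helper, not part of cast. It hard-codes the selectors above because Python's standard library has no Keccak-256: `hashlib.sha3_256` is the standardized SHA-3 and produces different hashes than the EVM's keccak256. Generate new entries with `cast sig`.

```python
# Hypothetical offline decoder built on the selector table above.
# Selectors are hard-coded (from `cast sig`) because hashlib.sha3_256 is NOT
# the EVM's keccak256, and the standard library has no keccak256.
SELECTORS = {
    "70a08231": "balanceOf(address)",
    "a9059cbb": "transfer(address,uint256)",
    "23b872dd": "transferFrom(address,address,uint256)",
    "095ea7b3": "approve(address,uint256)",
}


def decode_calldata(calldata: str) -> tuple[str, list[int]]:
    """Split calldata into a signature and its 32-byte argument words."""
    data = calldata.removeprefix("0x").lower()
    if len(data) < 8 or (len(data) - 8) % 64 != 0:
        raise ValueError("expected a 4-byte selector followed by whole 32-byte words")
    selector, body = data[:8], data[8:]
    signature = SELECTORS.get(selector, f"unknown(0x{selector})")
    words = [int(body[i:i + 64], 16) for i in range(0, len(body), 64)]
    return signature, words


if __name__ == "__main__":
    recipient = 0xBEEF  # hypothetical address, left-padded to 32 bytes
    amount = 10**18
    calldata = "0xa9059cbb" + f"{recipient:064x}" + f"{amount:064x}"
    sig, args = decode_calldata(calldata)
    print(sig, hex(args[0]), args[1])
```

The decoder only splits words. Interpreting each word (address, uint256, offset of a dynamic type) still depends on the signature, which is the same mental step you do when reading Huff traces by hand.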
forge debug: Interactive Step-Through
forge debug provides a TUI (terminal user interface) for stepping through EVM execution opcode by opcode:
forge debug --debug test/SimpleToken.t.sol \
--sig "test_transfer()" -vvvv
The interface shows four panels:
- Opcodes — The current instruction with a cursor, showing the bytecode being executed
- Stack — The current stack state with all 32-byte words
- Memory — Raw memory contents in hex
- Source — The source code mapped to the current instruction, when a source map is available (it may be empty for Huff contracts)
Navigation keys:
- j/k — Step forward/backward one opcode
- g/G — Jump to the start/end of execution
- c/C — Move to the previous/next call-type instruction (CALL, STATICCALL, DELEGATECALL, CALLCODE)
- q — Quit
Reading the Stack During Debugging
The EVM stack is last-in-first-out with a maximum depth of 1024. When debugging Huff, you must track the stack mentally to understand what each opcode consumes and produces.
Consider this Huff snippet:
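0x04 calldataload    // Stack: [address]
[BALANCES_SLOT]      // Stack: [slot, address]
After calldataload, the stack holds the address parameter. Pushing the constant [BALANCES_SLOT] gives [slot, address]. Huff references constants in square brackets. Huff comments conventionally list the stack top-first, so the slot is at position 0. If the value at position 0 is not the slot you expect, the bug is in the slot constant itself, before any mapping-key hashing happens.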
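Tracking the stack by hand gets error-prone once a macro grows past a few lines. One option is to track it symbolically and have a script write the comments. The sketch below makes several assumptions:

- It handles only a hypothetical handful of opcodes.
- It treats any other token (a literal or a `[CONSTANT]`) as a single push.
- It keeps the top-first order used in Huff comments.

```python
from typing import Callable

Stack = list[str]  # index 0 is the top of the stack, as in Huff comments
Op = Callable[[Stack], Stack]


def unary(template: str) -> Op:
    """Pop one item and push a symbolic result built from it."""
    return lambda s: [template.format(s[0])] + s[1:]


# Hypothetical symbolic opcode table covering only what these examples need.
OPS: dict[str, Op] = {
    "calldataload": unary("calldata[{0}]"),
    "sload": unary("storage[{0}]"),
    "dup1": lambda s: [s[0]] + s,
    "swap1": lambda s: [s[1], s[0]] + s[2:],
}


def track(tokens: list[str]) -> list[Stack]:
    """Return the top-first stack after each Huff token."""
    stack: Stack = []
    history = []
    for tok in tokens:
        if tok in OPS:
            stack = OPS[tok](stack)  # an IndexError here means a stack underflow
        else:
            stack = [tok.strip("[]")] + stack  # literal or [CONSTANT]: one push
        history.append(stack)
    return history


if __name__ == "__main__":
    tokens = ["0x04", "calldataload", "[BALANCES_SLOT]"]
    for tok, stack in zip(tokens, track(tokens)):
        print(f"{tok:<16} // [{', '.join(stack)}]")
```

Running it on the snippet prints one line per token, with the symbolic stack as a comment. You can then diff those comments against the ones written in the macro.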
Understanding Opcode Traces
For production debugging (when you cannot reproduce the issue locally), raw opcode traces from archive nodes are your primary tool. Services like Tenderly, Etherscan, and Alchemy provide trace APIs:
# Get trace via cast
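cast run TX_HASH --rpc-url $RPC --trace-printer 2>&1 | head -200
An opcode-level trace shows each instruction with its program counter, remaining gas, and stack state. The exact layout depends on the tool; a simplified version looks like this: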
[0] PUSH1 0x00 gas: 29234 stack: []
[2] CALLDATALOAD gas: 29231 stack: [0x00]
[3] PUSH1 0xe0 gas: 29228 stack: [0xa9059cbb...]
[5] SHR gas: 29225 stack: [0xa9059cbb..., 0xe0]
[6] DUP1 gas: 29222 stack: [0xa9059cbb]
[7] PUSH4 0x70a08231 gas: 29219 stack: [0xa9059cbb, 0xa9059cbb]
[12] EQ gas: 29216 stack: [0xa9059cbb, 0xa9059cbb, 0x70a08231]
[13] PUSH2 0x0040 gas: 29213 stack: [0xa9059cbb, 0x00]
[16] JUMPI gas: 29210 stack: [0xa9059cbb, 0x00, 0x0040]
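This trace lists the stack bottom-first (the rightmost entry is the top), as it stands before each opcode executes. That is the reverse of the Huff comment convention. It shows the function dispatcher checking whether the selector matches balanceOf(address). EQ pushes 0x00 (false) because the actual selector is 0xa9059cbb (transfer), so JUMPI does not jump to 0x0040. Execution continues at PC 17, which in a typical dispatcher is the next selector comparison.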
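To see where every number in that trace comes from, you can replay the same dispatcher prefix in a toy interpreter. The bytecode below is hand-assembled to match the trace (an assumption, not compiler output). Only the opcodes it uses are implemented, each costing 3 gas except JUMPI at 10. The sketch prints the stack bottom-first, before each instruction, in the same format as the trace.

```python
# Toy replay of the dispatcher prefix shown in the trace above.
# CODE is hand-assembled to match that trace; it is not compiler output.
CODE = bytes.fromhex("600035" "60e01c" "80" "6370a08231" "14" "610040" "57")
CALLDATA = bytes.fromhex("a9059cbb" + f"{0xBEEF:064x}" + f"{10**18:064x}")

# Non-PUSH opcodes used by CODE: opcode -> (name, static gas cost)
OPCODES = {0x35: ("CALLDATALOAD", 3), 0x1C: ("SHR", 3), 0x80: ("DUP1", 3),
           0x14: ("EQ", 3), 0x57: ("JUMPI", 10)}


def fmt(word: int) -> str:
    text = f"0x{word:02x}"
    return text if len(text) <= 10 else text[:10] + "..."


def replay(code: bytes, calldata: bytes, gas: int) -> list[str]:
    stack: list[int] = []  # bottom-first; stack[-1] is the top
    lines, pc = [], 0
    while pc < len(code):
        op = code[pc]
        is_push = 0x60 <= op <= 0x7F
        size = op - 0x5F if is_push else 0
        immediate = code[pc + 1:pc + 1 + size]
        name, cost = (f"PUSH{size} 0x{immediate.hex()}", 3) if is_push else OPCODES[op]
        shown = ", ".join(fmt(w) for w in stack)
        lines.append(f"[{pc}] {name} gas: {gas} stack: [{shown}]")  # state BEFORE op
        gas -= cost
        next_pc = pc + 1 + size
        if is_push:
            stack.append(int.from_bytes(immediate, "big"))
        elif op == 0x35:  # CALLDATALOAD: 32 bytes, zero-padded past the end
            offset = stack.pop()
            chunk = calldata[offset:offset + 32].ljust(32, b"\x00")
            stack.append(int.from_bytes(chunk, "big"))
        elif op == 0x1C:  # SHR: top is the shift amount, next is the value
            shift, value = stack.pop(), stack.pop()
            stack.append(value >> shift if shift < 256 else 0)
        elif op == 0x80:  # DUP1
            stack.append(stack[-1])
        elif op == 0x14:  # EQ
            a, b = stack.pop(), stack.pop()
            stack.append(int(a == b))
        elif op == 0x57:  # JUMPI: top is the destination, next is the condition
            dest, cond = stack.pop(), stack.pop()
            if cond:
                next_pc = dest  # a real EVM would also require a JUMPDEST here
        pc = next_pc
    return lines


if __name__ == "__main__":
    print("\n".join(replay(CODE, CALLDATA, 29234)))
```

Apart from printing 0x40 instead of 0x0040 for the last stack entry, the output reproduces the trace above line for line.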
Common Debugging Patterns for Huff and Yul
Pattern 1: Stack Underflow
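If execution fails and consumes all remaining gas at a seemingly cheap opcode, you may have a stack underflow. At the protocol level, a stack underflow is an exceptional halt, just like an invalid jump or running out of gas. It consumes all gas in the current call frame and returns no data, so from the outside it looks like any other failure. Tools such as Foundry usually report the specific halt reason in their traces, which makes it easier to spot when you reproduce the issue locally.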
// Bug: pop when stack is empty
#define macro BROKEN() = takes (0) returns (0) {
pop // Stack underflow! No items to pop
}
Detection: In forge debug, watch the stack panel. If it holds fewer items than the next opcode consumes (for example, zero items before a pop), that opcode is where the underflow happens.
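For straight-line code you can catch underflows before running anything by checking stack height statically. This sketch walks the tokens of a macro body using the standard EVM (pops, pushes) counts for a small subset of opcodes. It assumes:

- literals and `[CONSTANT]` references push exactly one word;
- the macro starts with `takes (n)` items.

Jumps and labels are not followed.

```python
from typing import Optional

# (pops, pushes) for a small subset of opcodes; extend as needed.
STACK_EFFECTS = {
    "pop": (1, 0),
    "add": (2, 1),
    "calldataload": (1, 1),
    "sload": (1, 1),
    "sstore": (2, 0),
    "mstore": (2, 0),
    "dup1": (1, 2),
    "swap1": (2, 2),
    "return": (2, 0),
    "revert": (2, 0),
}


def first_underflow(tokens: list[str], takes: int = 0) -> Optional[int]:
    """Index of the first token that pops more items than the stack holds.

    Only straight-line code is handled: jumps and labels are not followed.
    `takes` is the macro's declared input count from `takes (n)`.
    """
    height = takes
    for i, tok in enumerate(tokens):
        if tok.startswith(("0x", "[")):
            height += 1  # literals and [CONSTANT] references push one word
            continue
        pops, pushes = STACK_EFFECTS[tok]
        if height < pops:
            return i
        height += pushes - pops
    return None


if __name__ == "__main__":
    print(first_underflow(["pop"]))  # the BROKEN macro body
```

For the BROKEN macro, the checker flags the pop at index 0. Declaring `takes (1)` (passing `takes=1`) would make the same body valid.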
Pattern 2: Incorrect JUMP Destination
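Huff compiles each label definition to a JUMPDEST, so a label itself is always a valid target. Invalid jumps come from a wrong value on top of the stack when JUMP or JUMPI executes, such as a hard-coded offset or operands pushed in the wrong order. Jumping to anything other than a JUMPDEST is an exceptional halt that consumes all remaining gas:
#define macro MAIN() = takes (0) returns (0) {
0x01 success jumpi      // [success, 0x01]: JUMPI pops the destination, then the condition
0x00 0x00 revert
success:                // Huff emits a JUMPDEST here
0x00 0x00 return
}
// Bug variant: `success 0x01 jumpi` swaps the operands, so JUMPI tries to jump to PC 0x01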
Detection: In the trace, look for JUMP or JUMPI followed by immediate gas exhaustion. The target PC is on top of the stack before the jump.
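To check whether the target PC is legal, you need the set of valid JUMPDEST offsets. A 0x5b byte counts only when it is an opcode, not when it sits inside a PUSH immediate. The sketch below performs that analysis.

- It uses a hand-assembled version of the MAIN macro, with PUSH1 for the label offset, which is an assumption about encoding.
- It also checks the swapped-operand bug, where `success 0x01 jumpi` would make JUMPI target PC 0x01.

```python
def valid_jumpdests(code: bytes) -> set[int]:
    """Offsets of JUMPDEST (0x5b) opcodes, skipping PUSH1..PUSH32 immediates."""
    dests = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x5B:
            dests.add(pc)
        if 0x60 <= op <= 0x7F:
            pc += op - 0x5F  # skip the push's immediate data bytes
        pc += 1
    return dests


def check_jump(code: bytes, target: int) -> str:
    return "ok" if target in valid_jumpdests(code) else f"invalid jump to {target:#x}"


# Hand-assembled MAIN: PUSH1 1, PUSH1 0x0a, JUMPI, PUSH1 0, PUSH1 0, REVERT,
# JUMPDEST (pc 10), PUSH1 0, PUSH1 0, RETURN
MAIN = bytes.fromhex("6001600a5760006000fd5b60006000f3")

if __name__ == "__main__":
    print(check_jump(MAIN, 0x0A))  # the correct label target
    print(check_jump(MAIN, 0x01))  # what the swapped-operand bug jumps to
```

The same function also explains a subtler failure. In `bytes.fromhex("605b5b")`, the first 0x5b is PUSH1 data, so only offset 2 is a valid destination.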
Pattern 3: Incorrect ABI Encoding
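Huff does not encode return values for you. If the bytes you return do not match the ABI encoding of the declared return type, the calling contract's decoder reverts. A static type such as uint256 is returned as one raw 32-byte word. A dynamic type such as string or bytes needs an offset word and a length word before the data:
// Static uint256 return: the 32-byte word itself is the ABI encoding
0x00 mstore          // store the value already on the stack at memory 0x00
0x20 0x00 return     // return 32 bytes starting at 0x00
// Dynamic bytes/string return: offset, then length, then data
0x20 0x00 mstore     // offset to the data: 0x20
0x05 0x20 mstore     // length: 5 bytes
// ... data at 0x40, right-padded to a full word; return 0x60 bytes starting at 0x00
// Bug: returning only a raw word for a function declared to return string or bytes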
Detection: The calling contract’s abi.decode reverts. The trace shows a successful return from your contract but a revert in the parent context.
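When you are unsure what bytes the caller expects, it helps to build the reference encoding off-chain and compare it with the return data in the trace. This sketch encodes the two layouts discussed here:

- a static uint256;
- a single dynamic `bytes` return value, which is also how a `string` is laid out once converted to UTF-8 bytes.

It does not cover nested or multi-value returns.

```python
def encode_uint256(value: int) -> bytes:
    """ABI encoding of a static uint256: one big-endian 32-byte word."""
    if not 0 <= value < 2**256:
        raise ValueError("uint256 out of range")
    return value.to_bytes(32, "big")


def encode_bytes_return(data: bytes) -> bytes:
    """ABI encoding of a single dynamic `bytes` return value."""
    padded_len = (len(data) + 31) // 32 * 32
    return (
        encode_uint256(0x20)                  # offset to the dynamic data
        + encode_uint256(len(data))           # length word
        + data.ljust(padded_len, b"\x00")     # data, right-padded to whole words
    )


if __name__ == "__main__":
    print(encode_uint256(42).hex())
    print(encode_bytes_return(b"hello").hex())  # 3 words: offset, length, data
```

For a 5-byte payload, the result is exactly the 0x60 bytes the Huff snippet writes to memory: offset 0x20 at 0x00, length 5 at 0x20, and the padded data at 0x40.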
Pattern 4: Storage Collision
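Huff's FREE_STORAGE_POINTER() assigns sequential slots at compile time, so constants defined with it do not collide with each other. Collisions creep in when you mix it with hard-coded slot numbers, or when two macros reuse the same constant for different purposes. Either way, the two writers overwrite each other:
#define constant BALANCES_SLOT = FREE_STORAGE_POINTER() // slot 0
#define constant ALLOWANCES_SLOT = FREE_STORAGE_POINTER() // slot 1
#define constant TOTAL_SUPPLY_SLOT = FREE_STORAGE_POINTER() // slot 2
#define constant OWNER_SLOT = 0x02 // Bug: hard-coded slot 2 overwrites TOTAL_SUPPLY_SLOT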
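Detection: In forge debug, step to each SSTORE and read its slot, which is the top item in the stack panel. If writes meant for different variables land in the same slot, you have a collision. In a Foundry test, vm.load(address, slot) lets you assert on raw slot values directly.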
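If you note each SSTORE while stepping through a test, the bookkeeping can be automated. The hypothetical helper below assumes you record two things per write:

- which variable you *intended* to write, which you read off the macro;
- the slot that was actually on top of the stack.

It then reports every slot written by more than one logical variable. The variable names in the example are illustrative.

```python
from collections import defaultdict


def find_collisions(writes: list[tuple[str, int]]) -> dict[int, set[str]]:
    """Map each slot written by 2+ intended variables to those variable names."""
    owners: dict[int, set[str]] = defaultdict(set)
    for variable, slot in writes:
        owners[slot].add(variable)
    return {slot: names for slot, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    observed = [
        ("TOTAL_SUPPLY", 2),
        ("BALANCES[alice]", 0xABC),  # hashed mapping slot, shortened for display
        ("OWNER", 2),                # hard-coded slot that shadows TOTAL_SUPPLY
    ]
    print(find_collisions(observed))
```

Repeated writes to the same variable are not flagged; only distinct variables sharing a slot are.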
Building a Debugging Workflow
Here is a systematic approach to debugging Huff contracts:
- Reproduce — Write a failing test in Foundry that triggers the bug
- Trace — Run with -vvvvv to get full call traces, including setup, and use forge debug for opcode-level detail
- Narrow — Identify the exact opcode where behavior diverges from expectation
- Compare — Run the same scenario against your Solidity reference implementation
- Fix — Correct the Huff macro and verify the differential test passes
- Regress — Add the failing case to your permanent test suite
# Step 1: Run the failing test
forge test --match-test test_brokenTransfer -vvvvv
# Step 2: Interactive debugging
forge debug --debug test/Token.t.sol --sig "test_brokenTransfer()"
# Step 3: After fix, verify
forge test --match-contract DifferentialTest
forge snapshot --check
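The Compare step is differential testing: feed identical inputs to a trusted reference and to the implementation under test, and stop at the first disagreement. In Foundry, you would call the Solidity reference and the Huff contract. The sketch below models the same idea in plain Python. Both transfer functions are hypothetical stand-ins, with None modeling a revert, and the buggy one has a deliberate off-by-one.

```python
import random
from typing import Callable, Optional

Balances = dict[str, int]
# A transfer returns the new balances, or None to model a revert.
Transfer = Callable[[Balances, str, str, int], Optional[Balances]]


def reference_transfer(bal: Balances, src: str, dst: str, amount: int) -> Optional[Balances]:
    if bal.get(src, 0) < amount:
        return None  # insufficient balance: revert
    new = dict(bal)
    new[src] = new.get(src, 0) - amount
    new[dst] = new.get(dst, 0) + amount
    return new


def buggy_transfer(bal: Balances, src: str, dst: str, amount: int) -> Optional[Balances]:
    # Hypothetical off-by-one: also reverts when amount equals the balance.
    if bal.get(src, 0) <= amount:
        return None
    return reference_transfer(bal, src, dst, amount)


def differential_check(ref: Transfer, cand: Transfer, trials: int = 1000, seed: int = 0):
    """Return the first (balances, src, dst, amount) where ref and cand disagree."""
    rng = random.Random(seed)
    accounts = ["alice", "bob"]
    for _ in range(trials):
        # Small ranges on purpose, so edge cases like amount == balance come up often.
        bal = {a: rng.randint(0, 3) for a in accounts}
        src, dst = rng.choice(accounts), rng.choice(accounts)
        amount = rng.randint(0, 3)
        if ref(bal, src, dst, amount) != cand(bal, src, dst, amount):
            return bal, src, dst, amount
    return None


if __name__ == "__main__":
    print(differential_check(reference_transfer, buggy_transfer))
```

The failing input it returns is exactly what the Regress step should turn into a permanent test case.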
Production Debugging with Tenderly
For contracts already deployed, Tenderly provides a visual debugger that shows the execution trace with decoded function calls, state changes, and gas usage:
# Export transaction for analysis
cast run TX_HASH --rpc-url $RPC --json > trace.json
# Or use Tenderly's API directly
curl -X POST "https://api.tenderly.co/api/v1/account/YOU/project/PROJ/simulate" \
-H "X-Access-Key: $TENDERLY_KEY" \
-H "Content-Type: application/json" \
-d '{ "network_id": "1", "from": "0x...", "to": "0x...", "input": "0x..." }'
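If you script simulations instead of using curl, the same request can be built with Python's standard library. This is a minimal sketch under assumptions:

- the URL pattern and body fields mirror the curl example;
- the account, project, access key, and transaction values are placeholders;
- it builds the request without sending it.

```python
import json
import urllib.request


def build_simulation_request(account: str, project: str, access_key: str,
                             tx: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the Tenderly simulate endpoint."""
    url = f"https://api.tenderly.co/api/v1/account/{account}/project/{project}/simulate"
    return urllib.request.Request(
        url,
        data=json.dumps(tx).encode(),
        headers={"X-Access-Key": access_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    tx = {"network_id": "1", "from": "0x...", "to": "0x...", "input": "0x..."}
    req = build_simulation_request("YOU", "PROJ", "TENDERLY_KEY", tx)
    print(req.get_method(), req.full_url)
    # To send it: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```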
Tenderly’s visual debugger is especially useful for Huff because it annotates each opcode with its effect on the stack, letting you spot errors without manually tracking stack state.
Conclusion
Debugging EVM bytecode is a skill that separates hobbyist Huff developers from production-ready ones. Master cast run for quick transaction replay, forge debug for interactive analysis, and manual trace reading for production incidents. Build a systematic workflow: reproduce, trace, narrow, compare, fix, regress. The lower you go in the EVM stack, the more disciplined your debugging process must be.