Whether you’re starting out developing your first smart contract, or you’ve already written and deployed a few of them, there’s going to be a point where you’ll wonder how the interaction with your smart contracts is made possible. In this post we’re going to explore ABI encoding, the encoding format used to talk to an EVM (Ethereum Virtual Machine) to execute smart contract functions.
This will be a bit of a deep dive and fairly interactive, so buckle up.
I hope you’ll enjoy it!
First and foremost, we’ll be using Foundry’s cast utility to play around with encoding and decoding ABI data. If you haven’t used Foundry before, I recommend my previous guide on Smart Contract Development with Foundry for a quick overview. With that out of the way, let’s start off by revisiting the Counter smart contract from the mentioned guide and explore what calling its functions using cast looked like.
Here’s the contract’s source:
pragma solidity ^0.8.13;
contract Counter {
uint256 public number;
function setNumber(uint256 newNumber) public {
number = newNumber;
}
function increment() public {
number++;
}
}
The first time we read its number property using cast call, the output looked like this:
$ cast call 0x5fbdb2315678afecb367f032d93f642f64180aa3 "number()"
0x0000000000000000000000000000000000000000000000000000000000000000
We also learned that the output we’re looking at is a bunch of hex bytes, and converting the value to a decimal base will yield the value we’d expect (which in this case was 0):
$ cast --to-base \
0x0000000000000000000000000000000000000000000000000000000000000000 \
10
0
Okay, why is this interesting? As mentioned in the beginning of the post, ABI encoding is used to interact with smart contracts. This means that both the input and the output of a smart contract function call are ABI encoded. The return value we’re looking at – that “bunch of hex bytes” – is our ABI encoded output. So let’s explore this first.
ABI encoded bytes are split into 32 byte words, so whenever we want to interpret and understand input or output data that goes into or comes out of a smart contract, it’s helpful to slice that data into 32 byte pieces.
In our case, the return value of the function call is exactly 32 bytes, so there’s nothing to split anymore.
0x # 0x prefix to indicate hex data
0000000000000000000000000000000000000000000000000000000000000000 # bytes
The only thing we need in order to decode this data into a human readable value is the data type we’re dealing with. We know the smart contract’s number property is of type uint256, so number() returns a uint256 as well. That’s why we can pass it to cast --to-base and turn it into a decimal as done above.
Another, more general way to do this is to use cast --abi-decode, which takes a function signature including its return type, plus a return value to decode. Based on the function signature, it can then determine what the encoded data represents.
Let’s run the following command to decode a given uint256 value and just to make things a bit more interesting, let’s assume the value to decode is something other than 0:
$ cast --abi-decode "number()(uint256)" \
0x00000000000000000000000000000000000000000000008333401c60a03c1e38
2420216456551725801016
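As a cross-check outside of cast, the same decoding can be sketched in a few lines of Python. This is only an illustration of the idea, not how cast works internally, and the helper name decode_uint256 is made up for this example.

```python
def decode_uint256(word_hex: str) -> int:
    """Interpret a single 32 byte ABI word as an unsigned integer."""
    # Strip the optional 0x prefix before parsing.
    digits = word_hex[2:] if word_hex.startswith("0x") else word_hex
    if len(digits) != 64:
        raise ValueError("expected exactly 32 bytes (64 hex characters)")
    # uint256 values are encoded big-endian, so a base-16 parse is enough.
    return int(digits, 16)


# The two return values from the cast examples above.
zero = decode_uint256("0x" + "00" * 32)
example = decode_uint256("0x" + "0" * 46 + "8333401c60a03c1e38")

print(zero)     # 0
print(example)  # 2420216456551725801016
```

Other static types need a different interpretation of the same 32 bytes, which is why the data type has to be known up front.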
Nice, now we know how to decode a static value! If you want, you can try this with a few other values and data types. Okay, that’s it for the warm-up.
Let’s look into calldata next.
Understanding calldata
Whenever we call a smart contract’s function, what we’re actually doing is generating a bunch of ABI encoded data and sending it to the EVM. That generated data is called “calldata” and it includes all the information needed so that the EVM knows what function to call and with what parameter values.
To explore this further, let’s look at what happens when we call the smart contract’s setNumber(uint256) function. Below is the command that calls that function to set the contract’s number to 10:
$ cast send 0x5fbdb2315678afecb367f032d93f642f64180aa3 \
"setNumber(uint256)" 10 \
--from 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266
This gives us a transaction receipt that looks like this:
blockHash 0x0af7ff60c27cd0c474744a1c3318ab1522c89661c81ee313cb15eef798aebbbf
blockNumber 2
contractAddress
cumulativeGasUsed 43494
effectiveGasPrice 3875889325
gasUsed 43494
logs []
logsBloom 0x
root
status 1
transactionHash 0xd0f134143acbd84f70af9d64382e609364167ee9a5c2f6ad82bed62c70799de3
transactionIndex 0
type 2
There’s a bunch of information in this receipt, but the one thing we’re interested in now is the transactionHash. We can use it to inspect the transaction we’ve sent using cast tx:
$ cast tx 0xd0f134143acbd84f70af9d64382e609364167ee9a5c2f6ad82bed62c70799de3
blockHash 0x0af7ff60c27cd0c474744a1c3318ab1522c89661c81ee313cb15eef798aebbbf
blockNumber 2
from 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266
gas 44948
gasPrice 3875889325
hash 0xd0f134143acbd84f70af9d64382e609364167ee9a5c2f6ad82bed62c70799de3
input 0x3fb5c1cb000000000000000000000000000000000000000000000000000000000000000a
nonce 1
r 0x5cb30735aa0281b270d6c79ef136bc0cdc27a52b0d6f6adec11f506b317b15b5
s 0x4fb3793e81c1a82984a1c28fec36768a8e646fa3d6a980a4e739ecc917356b19
to 0x5FbDB2315678afecb367f032d93F642f64180aa3
transactionIndex 0
v 0
value 0
A transaction that calls a smart contract function will have an input value. That value is our calldata and here’s what it looks like:
0x3fb5c1cb000000000000000000000000000000000000000000000000000000000000000a
Okay, so based on this, how does the EVM know which function we wanted to call, and with what arguments? It turns out that the calldata of a function call is always constructed from a function selector, followed by the parameter values that the function should be called with.
The first 4 bytes, in our case 3fb5c1cb, is the function selector. The 32 bytes after that are the function arguments. So in this case, we can split up the bytes like this:
0x # hex prefix
3fb5c1cb # function selector
000000000000000000000000000000000000000000000000000000000000000a # param
How do we verify that this is true? Well, to get the function selector bytes, we create the keccak256 hash of the function signature and take the first 4 bytes.
In other words:
$ cast keccak "setNumber(uint256)"
yields a hash that looks like this:
0x3fb5c1cb9d57cc981b075ac270f9215e697bc33dacd5ce87319656ebf8fc7b92
And as we can see, the first 4 bytes of the hash match the first 4 bytes of our calldata! The rest of the calldata, i.e. everything after the selector, can be decoded the same way we did previously using cast --abi-decode.
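The splitting step itself is easy to reproduce by hand. Here is a minimal Python sketch, assuming calldata that consists of a selector followed only by whole 32 byte words; split_calldata is a hypothetical helper, not part of Foundry.

```python
def split_calldata(calldata: str) -> tuple[str, list[str]]:
    """Split hex calldata into its 4 byte selector and 32 byte argument words."""
    digits = calldata[2:] if calldata.startswith("0x") else calldata
    # The first 8 hex characters (4 bytes) are the function selector.
    selector, body = digits[:8], digits[8:]
    if len(body) % 64 != 0:
        raise ValueError("argument data is not a multiple of 32 bytes")
    # Everything after the selector is sliced into 64 character (32 byte) words.
    words = [body[i:i + 64] for i in range(0, len(body), 64)]
    return selector, words


# Calldata of the setNumber(10) transaction from above.
calldata = "0x3fb5c1cb" + "00" * 31 + "0a"
selector, words = split_calldata(calldata)

print(selector)           # 3fb5c1cb
print(int(words[0], 16))  # 10
```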
Creating calldata without transactions
We don’t need to create a transaction to figure out what calldata we’ll send to the EVM. cast calldata will do it for us, given a function signature and its parameter values. The following command will produce the exact same calldata as our previous transaction:
$ cast calldata "setNumber(uint256)" 10
0x3fb5c1cb000000000000000000000000000000000000000000000000000000000000000a
This is cool because now we can play around with this and inspect encodings of different function signatures. Let’s see what happens when we have a function that expects multiple numbers:
$ cast calldata "setNumbers(uint256, uint256, uint256)" 1 16 3203
This returns the following calldata:
0xb3b3d608000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000c83
Which we can split up into smaller pieces and read as:
0x # hex prefix
b3b3d608 # function selector
0000000000000000000000000000000000000000000000000000000000000001 # param
0000000000000000000000000000000000000000000000000000000000000010 # param
0000000000000000000000000000000000000000000000000000000000000c83 # param
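To see that there is no magic in how static arguments are laid out, here is a rough Python sketch that rebuilds both calldata strings by hand. It takes the selectors as given from the cast output above rather than computing them, since Python’s standard library does not ship keccak256 (hashlib.sha3_256 is a different function). The helper encode_uint256_call is invented for this example.

```python
def encode_uint256_call(selector: str, *values: int) -> str:
    """Build calldata from a known 4 byte selector and uint256 arguments."""
    if len(selector) != 8:
        raise ValueError("selector must be 4 bytes (8 hex characters)")
    words = []
    for value in values:
        if not 0 <= value < 2**256:
            raise ValueError("value does not fit into a uint256")
        # Each argument becomes one 32 byte word, left-padded with zeros.
        words.append(format(value, "064x"))
    return "0x" + selector + "".join(words)


# Selectors copied from the cast output above, not computed here.
set_number = encode_uint256_call("3fb5c1cb", 10)
set_numbers = encode_uint256_call("b3b3d608", 1, 16, 3203)

print(set_number)
print(set_numbers)
```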
Awesome. With that knowledge at hand, let’s look at some more interesting data types.
Dynamic data types
So far we’ve only looked at encoding uint256 values, which are considered static types. Static types are pretty much any values whose encoded size is fixed and can’t grow; the elementary ones such as uint256, address and bool each fit into a single 32 byte word. That’s why the three parameters in the last example are simply stacked onto each other.
Dynamic values, on the other hand, are values of non-fixed-size types. This includes bytes, strings and dynamically sized arrays. These types can represent values that might not fit into a single 32 byte word, so they have to be handled differently when encoded.
To explore this further, let’s assume we have a function that takes a string and generate some calldata using “hello”:
$ cast calldata "sayHello(string)" "hello"
This produces the following calldata:
0xc3a9b1c50000000000000000000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000000568656c6c6f000000000000000000000000000000000000000000000000000000
Which, as we now know, can be split up like this:
0x
c3a9b1c5
0000000000000000000000000000000000000000000000000000000000000020
0000000000000000000000000000000000000000000000000000000000000005
68656c6c6f000000000000000000000000000000000000000000000000000000
Okay, this certainly looks a bit different from what we’ve seen before. What’s happening here?
When dealing with a dynamic type, the first byte word of the value is always an offset. The offset tells the EVM where the actual value in the byte stream starts, counted in bytes from the beginning of the arguments (that is, right after the 4 byte function selector). In this case, the offset is 20, which is the hex value for 32 in decimal, meaning that the value we’re interested in starts after 32 bytes, at the second byte word. What follows is a byte word that tells us the length of the value in bytes, which in this case is 5. Finally, the next five bytes, 68656c6c6f, are the “hello” string! The remaining bytes of that word are zero padding.
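The offset, length, data walk described above can be sketched in Python as follows. This is a simplified illustration that assumes a function with exactly one string parameter; decode_string_arg is a made-up helper name.

```python
def decode_string_arg(calldata: str) -> str:
    """Decode the single string argument of a call such as sayHello(string)."""
    digits = calldata[2:] if calldata.startswith("0x") else calldata
    args = bytes.fromhex(digits[8:])  # drop the 4 byte selector
    # Word 0 holds the offset, counted in bytes from the start of the arguments.
    offset = int.from_bytes(args[0:32], "big")
    # The word at that offset holds the length of the string in bytes.
    length = int.from_bytes(args[offset:offset + 32], "big")
    # The string bytes follow, right-padded with zeros to a full word.
    start = offset + 32
    return args[start:start + length].decode("utf-8")


calldata = (
    "0xc3a9b1c5"
    + format(0x20, "064x")      # offset: 32 bytes
    + format(5, "064x")         # length: 5 bytes
    + "68656c6c6f" + "00" * 27  # "hello" plus zero padding
)
greeting = decode_string_arg(calldata)
print(greeting)  # hello
```

Following the offset instead of assuming the data sits right behind it matters once a function has more than one parameter, because then the dynamic data is placed after all the first-level words.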
Let’s do one more. This time we’ll inspect a dynamic array. Feel free to come up with a function signature yourself, otherwise, take the following command (sorry for the bad formatting, Substack doesn’t support scrollable code snippets):
$ cast calldata "setAddresses(address[])" \
"[
0x178412e79c25968a32e89b11f63B33F733770c2A,
0x95aB45875cFFdba1E5f451B950bC2E42c0053f39
]"
This creates the following calldata:
0x
b9571721
0000000000000000000000000000000000000000000000000000000000000020
0000000000000000000000000000000000000000000000000000000000000002
000000000000000000000000178412e79c25968a32e89b11f63b33f733770c2a
00000000000000000000000095ab45875cffdba1e5f451b950bc2e42c0053f39
At this point this is pretty straightforward. However, one thing to notice here is that since we’re dealing with an array, the value in the byte word that describes the length is the number of elements in the array (instead of the length in bytes). In this case the length is 2 because there are two elements, and as we can see, every element is a single byte word again, representing an address (which by the way is also a static type).
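The same approach works for the array. The Python sketch below assumes a function with a single address[] parameter and returns plain lowercase addresses, without the mixed-case checksum that cast prints (that would require keccak256). decode_address_array is again a hypothetical helper.

```python
def decode_address_array(calldata: str) -> list[str]:
    """Decode the single address[] argument of a call like setAddresses."""
    digits = calldata[2:] if calldata.startswith("0x") else calldata
    args = bytes.fromhex(digits[8:])  # drop the 4 byte selector
    # Word 0 is the offset to the array, counted from the start of the arguments.
    offset = int.from_bytes(args[0:32], "big")
    # For arrays, the length word holds the number of elements, not bytes.
    count = int.from_bytes(args[offset:offset + 32], "big")
    addresses = []
    for i in range(count):
        start = offset + 32 + i * 32
        word = args[start:start + 32]
        # An address is 20 bytes, left-padded with 12 zero bytes.
        addresses.append("0x" + word[12:].hex())
    return addresses


calldata = (
    "0xb9571721"
    + format(0x20, "064x")  # offset: 32 bytes
    + format(2, "064x")     # number of elements
    + "00" * 12 + "178412e79c25968a32e89b11f63b33f733770c2a"
    + "00" * 12 + "95ab45875cffdba1e5f451b950bc2e42c0053f39"
)
addresses = decode_address_array(calldata)
for address in addresses:
    print(address)
```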
Congrats! Now we can manually ABI decode static and dynamic values!
Where to go from here…
I tried to stick to simple examples in this post so readers have an easier time following, but I highly encourage everyone to play around with this in their own projects, especially where more complex transactions come into play.
I also recommend that everyone who read this post reads DeGatchi’s post on Reversing The EVM: Raw Calldata and Ijmanini’s ABI Encoding Deep Dive, as I’ve learned a lot from their content myself (thank you guys!).
In addition, make sure to check out the official docs for Foundry and cast specifically. There are a bunch of useful utilities in there I haven’t covered in this post.
If you made it all the way here, I’d appreciate some feedback (positive and negative) and please let me know what other topics I should explore and write about.
I’m pretty active on Twitter, so feel free to ping me there!
Last example visualized:
https://purrproof.github.io/calldata-visualizer/?signature=function%2520setAddresses%28address%255B%255D%29&calldata=0xb957172100000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000002000000000000000000000000178412e79c25968a32e89b11f63b33f733770c2a00000000000000000000000095ab45875cffdba1e5f451b950bc2e42c0053f39