Off-chain storage pattern in Solidity: Merkle Tree
In Solidity, sometimes you need to add a big amount of data to the smart contract. Adding a big amount of data costs a lot of gas and hence sometimes is not possible to do. For example, you need to whitelist thousands of users or add thousands of addresses into a vesting contract. In this article, I will tell you how to resolve such tasks with the use of Merkle Trees.
Merkle Tree Algorithm
Merkle trees allow us to compress a list of data of any length into one single hash. The compressing algorithm works by creating pairs of elements, summing the pairs, and hashing the sums of the pairs. This process repeats, until, we receive a single root hash.
This allows us to compress a list of any length into one single 256 bits hash.
But, of course, we can’t extract the value back from the hash. But, fortunately, we don’t need to do it.
Ok, we’ve compressed the data into one hash. Now we can pass this hash onto a smart contract. This would cost us a low amount of gas because passing a 256 bits hash into a smart contract costs less than passing thousands of them. Now we are passing one.
Now we have the list (in a compressed way) on our contract. The values in the list could be of any type, whether it’s an address, unit, or a more complicated type. We have two possible operations to do on that list and their corresponding algorithms:
Modify (Add/Remove element, change details of an element):
1) Change the data off-chain
2) Re-calculate the hash
3) Pass the new hash onto the contract.
Read (Check if the element exists, gather the element’s details):
1) Accept as parameters: (data of the element, cryptographic proof that this element is in the list)
2) Prove that this element exists in the root hash i.e. in the list
3) Read details about the passed element as a parameter.
4) Do whatever is required next. (e.g. if it’s a vesting contract - send tokens to the address)
How cryptographic proof for Read works
How do we prove that an element is in the Merkle tree?
To prove that an element (hK on the picture) is in the hash, we have to assemble the hash again, and it’s mandatory to use the hK hash during the assembling because we want to prove that hK is in the list. If we assemble the top hash with explicit usage of the element’s hash (hK), it means that the element’s hash is contained in the top hash. So, a smart contract function, that wants to check if an element is in the root, should re-create the root hash itself, using the element’s hash and other hashes nearby it, and check, if the result is the same as the root hash. If the result is the same, it means, that the root hash contains the hash of the element.
You might think, what is the point of this, if we assemble the hash again, we will need to do the same number of steps, as the number of elements, or even more? “We could’ve added those elements to the mapping”. But actually, we don’t need all the elements to assemble the root hash. As you can see in the picture above, we only need to have those nodes (hashes), that allow us to calculate the next level of the tree.
So, we start off with the element that we should verify — hK. What is the other element we need to get higher in the tree? It will be the element’s pair (hL). Now, we’ve gotten one level up — hKL. What is the next hash we need? It is hIJ. But, as you can see, we did not need hI and hJ, we only needed hIJ. So, we’ve saved 2 steps — one-half of the steps so far. Alright, we’ve got hKL and hIJ, and we’ve gotten to the next level — hIJKL. What’s the hash we need to get higher? It is hMNOP, and we can take its hash, we don’t need the extra processing of the 6 hashes below it. Again, we’ve saved another half of the steps. Good, let’s go next, we’ve processed hMNOP + hIJKL, and now we are on hIJKLMNOP. What does it cost us to get onto the next level? one step — combining hIJKLMNOP with hABCDEFGH. Congratulations, we’ve got to the hABCDEFGHIJKLMNOP now, and we can check, if the resulting hash is the same, as it is recorded in the smart contract. If it is, it means that the hK hash is contained in the top hABCDEFGHIJKLMNOP hash.
The computational complexity of the proof
If we only need one step to get one level up the tree - it’s not so hard to calculate the complexity. The complexity of proving that an element is in the tree is the height of the tree. In this case, the height is 4, and the number of elements is 16. This means, that the complexity of verifying that an element is in the tree is O(logN).
This means, that on the level of the smart contract, the complexities are as follows:
- Modify [whole list] - O(1) (we need to replace the root hash value in a variable);
- Read - O(logN) (we need to verify that an element is in the tree, and then we can continue reading its details and work with it).
Examples
- A vesting contract, where you need to populate 10,000 addresses that can claim tokens. You can’t submit 10,000 addresses nor to the array, or to a mapping. Instead, you could store the addresses off-chain, and only submit the top root hash to the smart contract. When a person wants to claim, they need to submit Merkle proof (cryptographic proof we’ve discussed above) and the value of their node in the tree, which is {address, amount of tokens} object. Once you’ve re-created the top-level hash with this object and other hashes that the Merkle proof contains, you can see if it matches the one recorded in the smart contract. If it does, it means you can be sure that such an object {address, amount of tokens} exists in the tree. Then, you can send them the tokens they want to claim and register in a mapping that this address has already claimed their tokens.
Note: in a traditional on-chain scenario, adding 10,000 addresses into a mapping would cost approximately 50,000 $. By using Merkle root hash instead of mapping, we would save 50,000$. - NFT / ERC20 token, where only whitelisted users can use the token.
In this scenario, we could store in the hash, that stores the addresses that can use the token, and every time somebody wants to make a transfer or call some function that marks them as whitelisted, they need to provide Merkle proof that proves that they are in the whitelist. - Any contract with a whitelist, not just NFT/ERC20, for instance, also a launchpad. There are also many other applications, I will be updating the article every time I find a new scenario where it could be useful.
Conclusions
In this article, I’ve explained to you one way to store big amounts of data in a single variable. Adding thousands of addresses even within a single transaction to a mapping in Solidity costs a lot of gas, which could even exceed block gas limits. On the Ethereum chain, such an operation, if possible at all, would cost tens of thousands of dollars. So, this pattern allows developers to work with huge data amounts in Solidity, and find easy solutions for such problems as whitelisting users for NFTs, or submitting addresses into a vesting contract.
This is just one pattern of fixing such an issue, and there are more. I will explain them in upcoming articles, and I will also create articles that show our experience creating contracts with such solutions, that passed an audit and went to production, with a high number of users and high TVL.
If you have any questions about this or want a project to be developed, you could contact me:
Telegram — @RedDuckUA
Email — mark.virchenko@redduck.io