Dev Notes

Software Development Resources by David Egan.

Compute Bitcoin Merkle Root


Bitcoin, C++, Cryptocurrency
David Egan

Merkle trees are an efficient way to verify that an element is in a set, without having to store the full set.

The leaf nodes (lowest level nodes) of the Merkle tree are made up of hashes of individual data members of the set. In the case of Bitcoin, individual data members are made up from transaction ids (TXIDs). TXIDs are the result of SHA256 hashing the bytes in a raw transaction twice.

Adjacent leaves are concatenated pairwise, and the hash of the concatenation constitutes the node’s parent.

Parent nodes are concatenated and hashed in a similar way to generate another level of parent nodes. This process is repeated until a single hash remains - the Merkle root.

Each non-leaf node of a merkle tree is a hash of the concatenation of it’s immediate children. The leaves of the tree are the elements of the set to which the Merkle tree proves membership.

Merkle Trees as Implemented By Bitcoin

In the case of Bitcoin, the leaves of the Merkle tree are the transaction identifiers (the hash of a raw transaction).

If there are an odd number of nodes at any level, the final node is concatenated with itself. Parent nodes are in turn concatenated in pairs and hashed, with the process repeated until a single Merkle hash remains - the Merkle root.

The Merkle root is stored in a block header, where it serves to make transactions tamper-proof - if a transaction is changed, the Merkle root would be thrown off. Because the hash of each block is included in subsequent blocks, the tamper would be immediately evident and the block with the tampered transaction would not be accepted as valid by the Bitcoin consensus rules.

To verify a transaction - check that a transaction is in a valid block - you just need the hashes of Merkle branch to compute the Merkle root, not the entire set of transactions in the block.

Flaw in Bitcoin Implementation of Merkle Trees

As can be seen in this comment in the Bitcoin Core codebase, and described comprehensively in BIP 98, there is a flaw in the Bicoin implementation of Bitcoin’s Merkle tree algorithm relating to duplicate txids.

The flaw is caused by the duplication of the last node in the case of levels that have an odd number of nodes - a practice which is non-standard for Merkle trees.

C++ Implementation

Compute a Merkle root from a set of transaction IDs (txid) expressed as hexadecimal strings. The code shown is from the file bitcoin.cpp in the accompanying repo:


/**
 * Compute the Merkle root 
 * */
void merkleRoot(std::vector<Bytes> txids, Bytes& result)
{
  if (txids.empty()) {
    throw;
  }
  // Loop until the txid container has been reduced to a single element - the Merkle root
  while (txids.size() > 1) {
    // If odd number, add the last element to end of vector.
    // Note that this is required at every level of the tree.
    if (txids.size() & 1) {
      txids.push_back(txids.back());
    }
    std::vector<Bytes> tmp;
    for (auto it = std::begin(txids); it != std::end(txids) && std::next(it) != txids.end(); it += 2) {
      Bytes concat = *it;
      Bytes result(hash_size);
      
      // Concatenate this element and the following adjacent element
      concat.insert(concat.end(), (*(it + 1)).begin(), (*(it + 1)).end());
      
      // Hash the concatenated elements, save into `result` container
      doubleSHA256(concat.data(), concat.size(), result);

      // `tmp` is a container holding the results of concatenating & hashing
      tmp.push_back(result);
      concat.clear();
    }
    // Set txids to reflect the reduction in elements from this round of concatenation/hashing
    txids = tmp;
  }

  // At this point, `txids` should be a container with a single element.
  result = txids[0];
}

In this example, the merkleRoot function receives two parameters:

  • std::vector<Bytes> txids - where Bytes is an alias for std::vector<uint8_t>, a collection of byte objects.
  • Bytes& result - A std::vector of <uint8_t> objects, passed by reference to receive the result.

Implementation

You try out the code above by cloning the accompanying GitHub repo.

Get TXIDs for a Block Using bitcoin-cli

You can use bitcoin-cli to get the transactions from a sample block and save these as a manifest file:

# Write transaction ids for block
# 00000000000000000002f5b1c49b9ddf5537d418b6c5b835172b3987a09a4b13
# to /tmp/manifest
bitcoind --daemon # bitcoind must be running

# Fetch block and process results with `jq` to obtain a file comprised of `txid` hashes for all
# transactions in the block, each on a separate line. The file `/tmp/txid.manifest` can then be
# used as input to the programme via input redirection:
block=00000000000000000002f5b1c49b9ddf5537d418b6c5b835172b3987a09a4b13
bitcoin-cli getblock ${block} | jq -r '.tx[]' > /tmp/txid.manifest
bitcoin-cli stop

This is useful for testing Merkle tree root computation - you can check the result of your implementation with the Merkle root computed by Bitcoin Core:

# set ${block} as above
bitcoin-cli getblock ${block} | jq -r '.merkleroot'

If you don’t have jq installed, you can achieve roughly the same result with standard UNIX tools like this:

bitcoin-cli getblock ${block} | grep merkleroot | awk '{print substr($2,2, length($2) - 3)}'

References


comments powered by Disqus