Blockchain: what is in a block?

Introduction

In my previous article, I tried to described the concept of a blockchain with code. This time, I'll try to describe the structure of a single block. I will use the Bitcoin blockchain to explain blocks, but keep in mind that the concepts will remain more or less the same. It could be useful to read my last article to understand a few things first.

Structure of a block

A block is a container data structure. In the Bitcoin world, a block contains more than 500 transactions on average. The average size of a block seems to be 1MB (source). In Bitcoin Cash ( a hard fork from the Bitcoin blockchain ), the size of a block can go up to 8MB. This enables more transactions to be processed per second.

Anyway, a block is composed of a header and a long list of transactions. Let's start with the header.

Block Header

The header contains metadata about a block. There are three different sets of metadata:

The previous block hash. Remember that in a blockchain, every block is inherits from the previous block because we use the previous block's hash to create the new block's hash. For every block N, we feed it the hash of the block N-1.

Mining competition. For a block to be part of the blockchain, it needs to be given a valid hash. This contains the timestamp, the nonce and the difficulty. Mining is another crucial part of the blockchain technology, but it is outside the scope of this article.

The third part is a merkle tree root. This is a data structure to summarize the transactions in the block. And we will leave it at that for now. More on this later.

Block identifiers

To identify a block, you have a cryptographic hash, a digital signature if you will. This is created by hashing the block header twice with the SHA256 algorithm. For example, this is a block. I will refer to this block as an example for this article.

The block header hash for this particular block is (right column): 000000000000000000301fcfeb141088a93b77dc0d52571a1185b425256ae2fb

We also can see the previous block's hash (right column): 0000000000000000004b1ef0105dc1275b3adfd067aed63a43324929bed64fd7

Remember that we used the second hash to create the first. Every block uses the previous block's hash to construct its own hash. The block hash is a unique identifier. You won't find two blocks with the same hash.

The other way to identify a specific block is the block height. The is the position of the block in the blockchain. Our example's block is in the 500312 position. This means that there are 500311 blocks before this one. Since the creation of the Bitcoin blockchain in 2009, 500312 blocks have been created ( at the time of writing obviously ).

A block height is not unique. Several blocks can compete for the same position in the case of a fork, like Bitcoin Cash for example.

Merkle Trees

The transactions in a block are contained in a structure called a merkle tree or binary hash tree.

I feel that topics like that are easier to understand with actual examples. So we'll go coding for this. A merkle tree is constructed by recursively hashing pairs of nodes ( in this case, transactions ), until there is only one hash, called the root or merkle root. If we stay in the Bitcoin world, the cryptographic hash algorithm used is SHA256. This is applied twice each time.

An example: We have a block with 4 transactions. For the sake of simplicity, each transaction is a string:

consttA='Hello'consttB='How are you?'consttC='This is Thursday'consttD='Happy new Year'

To construct our merkle tree, we start from the bottom. We take each transaction and double-hash them. I'll use the js-sha256 package here.

Ok, great. Now remember that I wrote a merkle tree is constructed hashing pairs of nodes. So, we will pair our transactions and concatenate their hashes. Then, we will double hash them too. We will create a hash using the hashes hA and hB, and another for hC and hD. Then, we repeat that process until we have only one hash left and no more pairs to work with. The last hash will be our merkle root.

With only four transactions, this will be rather quick:

//Pairing hA and hBconsthAB=sha256(sha256(hA+hB))//5dc23d1a2151665e2ac258340aa9a11ed227a4cc235e142a3e1738333575590b//Pairing hC and hDconsthCD=sha256(sha256(hC+hD))//ff220daefda29821435691a9aa07dd2c47ca1d2574b8b77344aa783142bae330// We do it again. We pair hAB and hCD// This is our root!consthABCD=sha256(sha256(hAB+hCD))//301faf21178234b04e1746ee43e126e7b2ecd2188e3fe6986356cc1dd7aa1377

The node at the top of the merkle tree is called the root. This is the information that is stored in the block header in each block on the blockchain. This is how transactions are summarized in each block. In our example block given earlier, the merkle root can be found in the right column: a89769d0487a29c73057e14d89afafa0c01e02782cba6c89b7018e5129d475cc

It doesn't matter how many transactions are contained in a block, they always will be summarized by a 32 bytes hash.

Note: The merkle tree is a binary tree. If there is an odd number of transactions, the last one will be duplicated so we can construct our tree.

Because all the leaves in our tree depends on other leaves, it is impossible to alter one leaf without altering others. If you change only one leaf ( one transaction ), the hash changes, therefore the hash you constructed by pairing it with another leaf changes, therefore the merkle root will be different.

You can prove that any transaction is included in a block by creating a authentification path or merkle path. You only need to know log base 2(N) 32-byte hashes. For example:

-For my 4 transactions merkle tree:

log base 2( 4 ) = 2 => If I have a path of 2 hashes for a tree of 4 transactions, I can manage to prove if a transaction belongs to this merkle tree.

For a 16 transactions merkle tree:

log base 2( 16 ) = 4 => If I have a path of 4 hashes for a tree of 16 transactions, I can manage to prove if a transaction belongs to this merkle tree.

log base 2( 1500 ) = 10.55 => If I have a path of 11 hashes for a tree of 1500 transactions, I can manage to prove if a transaction belongs to this merkle tree.

Perhaps a little diagram will help.

There are 16 leaves in this tree. We construct our tree from the bottom up by pairing each leaf. Now, anybody can prove that the leaf I ( in orange ) is part of this block by having the path given in green. We have only 4 hashes, but that is enough to know if the leaf I belongs here. That is because with those informations, we are able to construct every single leaf we need( in yellow ). We can create IJ, IJKL, IJKLMNOP and the root and check if those hashes correspond. This is why it is very complicated to cheat a blockchain. To change one thing means you must change everything.

Well, that's pretty much what a block contains in the Bitcoin blockchain. Hope it helped!

I have a rookie question: how are the transactions included in a block determined? I expect that at any given time, all the miners are working with a similar but non-identical set of transactions to create the next block. Do they have to check the transactions included in every new block while they are computing the hashes, and discard all the work if a newly added block includes any transaction they were working with?

Yes, whenever a new block is mined, miners have to create a candidate block. This candidate block includes all the transactions that have not been mined yet. If a miner was trying to mine a block and failed, she will check which transactions have been included in the winning block. Whatever transactions is leftover becomes part of this new candidate block.

If you want a deeper explaination on this, I wrote an article called Blockchain: what is mining? That explains this concept :)

but let's say i'm a full-node and i modify an existing block/transaction. specifically, i modify a transaction's unspent output. then i make another new transaction on top of this where i use the previously modified output as an input. let's assume i am the owner of this unspent output that i will be using as input for the new transaction. yes, the hash info for the transaction and thus the block will no longer be correct. when a peer node goes to verify the transaction they notice the transaction has been signed by a valid private key and thus proving i am the owner of the modified unspent output. how long before I actually get caught? am i missing some part of the verification process?

In this block, you modify one transaction to your benefit. In this transaction, you give yourself 10 BTC instead of 0.01 BTC for example.

For this new transaction to be part of the blockchain, you, or another miner, would have to find a new Proof-of-Work for this block. This is an entirely new block now.

You mine it and find a valid Proof-of-Work. You propagate your finding to the network and they have to validate this new block.

Here, the problem is much bigger for this block to be accepted.

1) The network sees you tried to cheat the system. The block is rejected and you wasted your resources mining this new block for nothing.

2) The network accepts this new block ( for whatever reasons ). Now, we have to mine every single block that was validated after the one you just proposed because they are all invalid now.

I believe in this case, you won't get caught, because your block will never be part of the blockchain. You can't modify a block that is already part of the blockchain. You will create a new block that will act as a replacement of an existing block. So, for this block to be accepted, the network would also have to provide valid proof-of-work for every single block after the one you want to change.

In the case where you are mining the block and decide to modify a transaction in the block you are mining in your favor:

When you find a valid Proof-of-Work to your block, it is propagated to other nodes. These nodes will verify that the Proof-of-Work is valid, but they will also verify every single transaction in the block. There is a very long list of parameters that nodes must control in order to call a transaction valid ( inputs, outputs...). In this scenario, this is where your fraud will be stopped. Your block won't be accepted by the network.

Very good. I like your article series - I wish I would have read it when I got started, you make it really easy to understand (or at least: as easy as possible).

One small thing I noticed: you wrote "Mining is another crucial part of the blockchain technology, but it is outside the scope of this article".

I have heard many people speak about "the blockchain", implying there is only one, when in fact there are many. The most popular one is the Bitcoin blockchain, and for that chain, your statement is correct. It uses "Proof-of-work" a.k.a. mining for creating new blocks/coins.

There are other blockchains that use "Proof-of-stake", so instead of mining, new blocks are created based on stake, i.e. already owned coins, and random distribution.