Today I want to present my first tests to model a miner. A miner should mine blocks, adding transactions to it, building a child block from a given parent. There are many issues to solve: transactions should be valid, a transactions should be mined only once, etc… But, following TDD (Test-Driven Development) workflow, I can add functionality guided by tests. My first test:

This second test is bit weak yet. Some issues: it doesn’t enforce the validation of the transaction to process, and it is not clear what happen to that transaction after added to the block. But these pending issues should be tackled in future tests, describing the expected behavior.

The interesting face of such workflow, is that implementation is created using the simplest path, after the test descriptions.

June 6, 2016

In a previous post I described the C# implementation of an immutable trie. I need the trie structure to keep the state of accounts in the blockchain: their balance could be retrieved by account public address. The structure should be immutable, so I can retrieve the account states at different times. Each trie has the states of all accounts at a given time. Having an immutable trie, each time I added a value to a key, a new trie is created, and the old one can be still available.

These days, I added a trie implementation to my personal blockchain project written in JavaScript/NodeJS:

Each value has a key. I created a tree of tries. To add a value using a key, I added the value to the tree, using each letter in the key to traverse the trie internal tree. In a typed language, I should declare the type of keys and values, but using JavaScript, only the algorithm is important: the declaration of types is not needed to write the code.

The offset argument is used to select the character in the key. So, if the key has four characters, the value is saved in a composite trie with four levels.

This is a simple implementation, and I could improve it, having new use cases, writing new tests describing the expected behavior. Some things to add: persistent to disk/file/database, and hash by trie. Each processed block and transaction would have the hash of their resulting states, so I could retrieve it at any moment, and check if the block, after its processing, reach the same state.

When a new block is built, the initial state from parent block is known, and each transaction in the block had the hash of its final state. In this way, the block execution consistency can be checked against the output states.

In my new implementation, I’m using a block store. This object was able to save a block, and retrieve a block by hash, or retrieve the blocks with same number. But now, I added the retrieve by parent hash: that is, given a block hash, blockstore can retrieve all its know children. Given such functionality, the blockchain implementation evolved to use that feature. In this way, I don’t need to have a tree of blocks: given a block, I can retrieve all the tree of its descendants, from the blockstore.

Notice that the state of the blockchain has a naive implementation: only a JavaScript array, where the index coincides with the block number, starting with genesis block having 0 as number. My plan is to refactor this implementation, to support thousands or millions of blocks, using a block store based on disk. But now, using this naive implementation, I could explore the behavior of the node application.

If the block is know, then it implies it was already processed, then return.

If not, the block is saved in the in-memory store, and the first unknown ancestor hash is calculated. Maybe, the store has not ALL the ancestor chain up to genesis block. In this case, I cannot process the block.

If the block has a chain of ancestor that connects it to the genesis block, then I try to add the block to the blockchain:

It it is a direct child of the best block, it is added to the block (remember, a naive implementation, a simple array). If not, but it is a better block (it has a higher number), a fork to the block is performed. In any case, the children of the block are processed: maybe, the new block is the missing link to new candidate chains based on previously known blocks that were disconnected from genesis until the arrival of the new block.

All the test passed. As I mentioned, my plan is to improve the implementation. But, as usual in my TDD workflow, I prefer to give baby steps, make it works, and only then, make it right and make it fast. I had good experiences using this way of writing code, obtaining simple and solid implementations, having all the test to help me to change any internal implementation, without pain.

Encouraged by this result, I also started to refactor my blockchain implementation in my personal C# project, too:

These tests were written one by one, and each test was followed by the simplest implementation that pass the test. This is the spirit of TDD: baby steps, simple implementation, explicit description of expected behavior.

I want to execute simple code chain, using a virtual machine. The programs are usually called smart contracts. I adopted the Ethereum virtual machine design (read Ethereum Yellow Paper). Some classes:

A DataWord represents a number using 32 bytes. I implemented simple arithmetic operations using System.Numerics.BigInteger internally. I can create DataWord from integers or from byte arrays.

The Stack is an stack of DataWords, manipulated by the Machine. There is an enumeration, named Bytecodes, with the values of byte codes to be executed in the machine implementation. A program consists of a series of bytecodes, contained in a byte array. The Machine.Execute method is the place where the bytecodes are executed, manipulating the stack. Excerpt:

I will add Storage and Memory to the execution of a program. The Storage will be associated and persisted with the contract account. Each contract is an account, with address, balance, but with code and storage state. The Memory will be created and used during the execution of the program, but it won’t be persisted: it is a temporary memory, used only at execution time.

I have a simple bytecode compiler, and a line compiler, to facilitate the creation of programs.

I decided to implement the Ethereum way: having an account state, instead of inputs and outputs. The only property is Balance, but I will add more data. Notice that negative balances are rejeceted. Then, I added a TransactionProcessor:

If the transaction is processed, a new account state trie is generated, and ExecuteTransaction returns true. If the transaction is rejected, the initial accout state trie still has the original values. A typical test:

implementing a blockchain in C#, using TDD (Test-Driven Development) workflow. One element I need to implement, is the store of account states (initially their balances). The balance of an account should be retrieved by account id (a hash). But in many use cases, I need to know the balances of accounts at a given time. It is not enough to have the LATESTS balances. So, I implemented a data structure, a trie, but with immutable properties.

A trie is a tree where the non-leaf nodes stores part of the key:

In the above example, the value V1 is associated with key AAA, and the value V5 is associated with key ACA. When I change the value associated with key ABC from V4 to V7, part of a new trie is created, so the original one persists without modification:

I can access the original key/values using the “old root”, and then, I can switch to use the “new root”, at any moment. If any node/root is not reachable from a variable, garbage collector will release the memory associated with that node.

My idea is to use Trie<AccountState> as a store for account balances. At the end of a block (with transactions) processing, there is an account state trie. Ant the end of the next block processing, another trie will be generated. At any moment, I could retrieve the account balances at time “block 1”, and at time “block 2”, using the generated tries.

As usual, I followed TDD (Test-Driven Development) workflow, pursuing simplicity, doing baby steps. In the last post I mentioned the use of a DSL (Domain Specific Language) to specify some block processing tests that have complex setup.

Each command is an string, with verb and arguments. The block g0 is the genesis block. The verb “chain” enumerates a list of blocks to be created, each block is child of its previous block. The verb “send” sends the created blocks to the block processor. The verb “top” checks if the specified block is the top block in the current blockchain.

March 28, 2016

In the past days, I wrote a first implementation of a blockchain, using TDD (Test-Driven Development). Simplicity guides my decision: the blockchain is in-memory, and blocks are identified by number and a hash. Two blocks with number 42 are differente blocks if they have different hashes. A block has the parent block hash (the parent block number is deduced, it’s the block number minus one). A genesis block has number 0, and null parent hash. In this way, I have all the ingredients to build a blockchain, from genesis block to best block, chaining the blocks using their numbers and hashes.

I have a class BlockChain (yes, I prefer the mixed case name). There is a method to add a new block to the blockchain. It controls the new block has as parent the current best block in the blockchain.

But there are other nodes, that send other blocks, that can or cannot be added to the blockchain. How to process those blocks? I collect the blocks that cannot be added to the current blockchain in other objects, I call them blockbranch:

In the second blockbranch, parent block for block 41 is unknown, so, the blockbranch is waiting for the arrival of that block. The first blockbranch is connected to blockchain, as an alternative branch. But is has a height less than blockchain height.

A block branch has one or more consecutive blocks. The blocks are not part of the current blockchain. But they are proto-blockchain.

In the second blockbranch of the above figure, parent block for block 41 is unknown, so, the blockbranch is waiting for the arrival of that block. The first blockbranch is connected to blockchain, as an alternative branch. But is has a height less than blockchain height.

When I have sufficient blocks in a block branch (maybe, connect the bottom of the blockbranch to an existing block in another blockbranch or in the current blockchain), and I can build a list of blocks from genesis to the block in block branch, the branch is a candidate to be the new blockchain. Suppose a new block arrives:

The new block can be added to the second block branch, and it has an existing parent in the current blockchain. So, the block branch now has a complete path of block from genesis.

If the block branch is valid (applying their blocks to a known valid state at end of main block 39), and its height is greater than the current blockchain, the block branch is promoted to be the new blockchain:

The process works even if the new blocks arrives in random order. To manage the creation, growth, and promotion of blockchain and related blockbranches, I have a separate object, of class BlockProcessor, in charge of all this orchestration. The processor receives the new blocks, and send them to the corresponding blockchain or branch. Then, it can detect any new connection between branches, and the formation of branches that can be promoted to blockchains.

In the next post: details of a DSL (Domain Specific Language) I’m using to test different scenarios for the block processor.