Last week I spent some time collecting certain statistics (e.g., the average
number of transactions performed and blocks created per month) over a vast
(~20GB) Bitcoin blockchain dataset. The hardest part for me was picking the
right tool to parse the raw blockchain data. First, I searched Google with the
“parse bitcoin blockchain” keywords. Unfortunately, the returned results
(bitcointools, blockchain, blockparser, etc.) point to almost undocumented
projects, some of which do not even appear to work. (As a side note,
bitcointools requires a running BitcoinQt/bitcoind process in the background,
which I find pretty amusing.) Next, I checked some papers on Google Scholar to
find out how other people solved the problem. One paper led me to the
BitcoinArmory project, which requires a dozen manual interventions to get
installed. (I did not even attempt to install it.) Then it occurred to me to
add a “java” keyword to the search phrase, which led me to the bitcoinj
project.

bitcoinj is by far the best Bitcoin blockchain parser library that I have
ever come across. It has rich documentation and a developer-friendly (and
fully documented) API, and it works out of the box. It ships as a single JAR
with no other requirements or stupid hassles. In addition, its IRC channel on
FreeNode is packed with real people who provide instant support on any
Bitcoin-related question.

Enough with the talk! Let’s get our hands dirty with the code. I first included
the bitcoinj Maven dependency in my pom.xml as follows:

Here comes the simplest part: the Java code. Below, I calculate the average
number of transactions per block per month.

    import com.google.bitcoin.core.Block;
    import com.google.bitcoin.core.NetworkParameters;
    import com.google.bitcoin.params.MainNetParams;
    import com.google.bitcoin.utils.BlockFileLoader;

    import java.io.File;
    import java.text.SimpleDateFormat;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Arm the blockchain file loader.
    NetworkParameters np = new MainNetParams();
    List<File> blockChainFiles = new ArrayList<>();
    blockChainFiles.add(new File("/tmp/bootstrap.dat"));
    BlockFileLoader bfl = new BlockFileLoader(np, blockChainFiles);

    // Data structures to keep the statistics.
    Map<String, Integer> monthlyTxCount = new HashMap<>();
    Map<String, Integer> monthlyBlockCount = new HashMap<>();

    // Formats a block timestamp as its month key, e.g. "2013-11".
    // Created once and reused for every block.
    SimpleDateFormat monthFormat = new SimpleDateFormat("yyyy-MM");

    // Iterate over the blocks in the dataset.
    for (Block block : bfl) {

        // Extract the month keyword.
        String month = monthFormat.format(block.getTime());

        // Make sure there exists an entry for the extracted month.
        if (!monthlyBlockCount.containsKey(month)) {
            monthlyBlockCount.put(month, 0);
            monthlyTxCount.put(month, 0);
        }

        // Update the statistics.
        monthlyBlockCount.put(month, 1 + monthlyBlockCount.get(month));
        monthlyTxCount.put(month, block.getTransactions().size() + monthlyTxCount.get(month));
    }

    // Compute the average number of transactions per block per month.
    Map<String, Float> monthlyAvgTxCountPerBlock = new HashMap<>();
    for (String month : monthlyBlockCount.keySet())
        monthlyAvgTxCountPerBlock.put(
                month, (float) monthlyTxCount.get(month) / monthlyBlockCount.get(month));

That’s it! To appreciate the hassle-free simplicity of the bitcoinj
interface, you ought to spend a couple of hours with the other tools first.
(As for performance: on a sample blockchain dataset of 4.7 GB, the above code
snippet completes in less than 2 minutes on my 2.4 GHz GNU/Linux notebook
without any JVM flags. Pretty zippy!)
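
If you want to play with the aggregation step on its own, without downloading a
blockchain file or pulling in bitcoinj, here is a minimal sketch of the same
per-month bookkeeping using only the JDK. Everything in it is an assumption made
for illustration: `MonthlyTxStats` and `BlockSummary` are hypothetical names
(the latter stands in for a parsed block and keeps just its timestamp and
transaction count), and the sample numbers are made up. It also differs from the
snippet above in two small ways: it uses `Map.merge` instead of the
check-then-put dance, and it pins the month format to UTC, since
`SimpleDateFormat` otherwise uses the JVM's default time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.TimeZone;
import java.util.TreeMap;

public class MonthlyTxStats {

    /** Hypothetical stand-in for a parsed block: timestamp and transaction count only. */
    public static final class BlockSummary {
        final Date time;
        final int txCount;

        public BlockSummary(Date time, int txCount) {
            this.time = time;
            this.txCount = txCount;
        }
    }

    /** Maps each month ("yyyy-MM", in UTC) to its average number of transactions per block. */
    public static Map<String, Double> averageTxPerBlock(List<BlockSummary> blocks) {
        SimpleDateFormat monthFormat = new SimpleDateFormat("yyyy-MM");
        monthFormat.setTimeZone(TimeZone.getTimeZone("UTC"));

        // TreeMap keeps the months sorted, which is handy when printing.
        Map<String, Integer> blockCount = new TreeMap<>();
        Map<String, Long> txCount = new TreeMap<>();
        for (BlockSummary block : blocks) {
            String month = monthFormat.format(block.time);
            // merge() inserts the value if the key is absent, otherwise adds to it.
            blockCount.merge(month, 1, Integer::sum);
            txCount.merge(month, (long) block.txCount, Long::sum);
        }

        Map<String, Double> average = new TreeMap<>();
        for (Map.Entry<String, Integer> e : blockCount.entrySet()) {
            average.put(e.getKey(), (double) txCount.get(e.getKey()) / e.getValue());
        }
        return average;
    }

    public static void main(String[] args) {
        // Made-up sample data (epoch milliseconds, UTC); not real blockchain figures.
        List<BlockSummary> blocks = Arrays.asList(
                new BlockSummary(new Date(1388534400000L), 10),  // 2014-01-01
                new BlockSummary(new Date(1389139200000L), 20),  // 2014-01-08
                new BlockSummary(new Date(1391212800000L), 7));  // 2014-02-01
        for (Map.Entry<String, Double> e : averageTxPerBlock(blocks).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

To hook this up to real data, you would build one `BlockSummary` per block
inside the loop over the loader, from `block.getTime()` and
`block.getTransactions().size()`, which are the same two calls used in the
snippet above.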