blockparser

Who wrote it ?
--------------
Author:
znort987@yahoo.com
Tip here if you find it useful:
1ZnortsoStC1zSTXbW6CUtkvqew8czMMG
I've also been cherry-picking changes I found useful from various github forks.
Credits for these:
git log | grep Author | grep -iv Znort
Canonical source code repo:
---------------------------
git clone github.com:znort987/blockparser.git
License:
--------
Code is in the public domain.
What is it ?
------------
A barebone C++ block parser that parses the entire block chain from scratch
to extract various types of information from it.
The code chews "linearly" through the block chain and calls "user-defined"
callbacks when it hits on certain "events" in the chain. Here:
"events" essentially means that the parser is starting to assemble a new
blockchain data structure (a block, a tx, an input, etc ...), or that the
parser has just completed a data structure, in which case it will usually
run the callback with the completed data structure. The blockchain data
structure level of granularity at which these "events" happen is somewhat
arbitrary. For example you won't get called every time a new byte is seen.
"user-defined" means that if you want to extract new types of information
from the chain, you have to add your own C++ piece of code to those already
in directory "cb". Your C++ code will get called by the main parser at
"events" of your choosing.
"linearly" is a bit of an abuse because the parser code often has to jump
back to previously seen parts of the blockchain to provide user callbacks
with fully complete data structures. The parser code also has to walk the
blockchain a few times to compute the longest (valid) chain. But the user
callbacks get a fairly linear view of it all.
Blockparser was designed for bitcoin but works on most altcoins that were
derived from the bitcoin code base.
What it is not:
---------------
Blockparser is *not* a verifier. It assumes a clean blockchain, as typically
constructed and verified by the bitcoin core client. blockparser does not
perform any kind of verification and will likely crash if applied to an unclean
chain.
Blockparser is not very efficient if you want to perform repetitive tasks on
the block chain: the basic idea/premise of blockparser is that it's going to
chew through the *entire* block chain, *every* time. Given the size of the
blockchain these days, that's not something you want to do every 5 minutes.
Blockparser is not lean and mean. It used to be, when the blockchain was still
relatively small. Now that we are inching towards the 100's of gigabytes, the
very proposition that it has to chew through the entire chain by design implies that
it's going to take quite a while, whichever way you slice it. Also, the entire
index is built on the fly and kept in RAM. At current sizes, this is not a very
smart choice. This might get addressed in the near future.
Why write this ?
----------------
It started as an exercise for me to get a "close to the metal" understanding of
how bitcoin works. The quality and state of the original bitcoin codebase made
this damn near impossible (it's clear to me that Satoshi, albeit clearly a genius, was
not a professional software engineer. Also, things have vastly improved since then).
It then grew into a fun hobby project.
The parser code is minimal and very easy to follow. If someone wants to quickly
understand "for real" how the block chain is structured, it is a good place to
start.
It has also slowly grown into an altcoin zoo. It is very far from being a
compendium (there's so many of the darn things these days), but adding your
fave alt is very easy.
Talking about zoo, I've also started to track and document "weird" TXO's
in the chain (comments, p2sh, multi-sigs, bugs, etc ...). Not a complete
compendium yet, but getting there.
A side goal was also to build something that can independently (as in : the
codebase is *very* different from that of bitcoin core) verify some of the
conclusions of other bitcoin implementations, such as how many coins are
attached to an address.
Another thing that blockparser is really nice for is to easily reconstruct
"snapshots" of the state of the blockchain from a specific time (e.g. the -a
option of the "allBalances" command).
How do I build it ?
-------------------
You'll need a 64-bit Unix box (because of RAM consumption, blockparser won't work
inside a 32-bit address space).
If you are unfortunate enough to still have to use Windows, there is a port floating
somewhere on github.
I have also heard rumors of it working on OSX.
You'll need a block chain somewhere on your hard drive. This is typically created
by a Satoshi bitcoin client such as this one: https://github.com/bitcoin/bitcoin.git
Install dependencies:
sudo apt-get install libssl-dev build-essential g++ libboost-all-dev libsparsehash-dev git-core perl
Get the source:
git clone https://github.com/znort987/blockparser.git
Build it:
cd blockparser
make
It crashes
----------
At this point, blockparser uses a *lot* of memory (20+ Gig is typical). This
can cause all sorts of woes on an under-dimensioned box, chief amongst which:
- box goes into heavy swapping, and parser takes forever to complete its task
- parser eats up all RAM and all SWAP and crashes. Here's a possible remedy:
http://askubuntu.com/questions/178712/how-to-increase-swap-space
How does blockparser deal with multi-sig transactions ?
--------------------------------------------------------
AFAIK, there are two types of multi-sig transactions:
1) Pay-to-script (which is in fact more general than multisig). This one is
   easy, because it pays to a hash, which can readily be converted to an
   address that starts with the character '3' instead of '1'.
2) Naked multi-sig transactions. These are harder, because the output of
   the transactions does not neatly map to a specific bitcoin address. I
   think I have found a neat work-around: I compute:
       hash160(M, N, sortedListOfAddresses)
   which can now be properly mapped to a bitcoin address. To mark the fact
   that this address is neither a "pay to script" (type '3') nor a
   "pay to pubkey or pubkeyhash" (type '1'), I prefix them with '4'.
   Note : this may be worthy of an actual BIP. If someone writes one,
   I'll happily adjust the code.
   Note : this trick is only a blockparser thing. This means that these
   new address types starting with a '4' won't be recognized by other
   bitcoin implementations (such as blockchain.info).
Examples
--------
. Show all supported commands
./parser help
. Show help for a specific command
./parser allBalances --help
. Compute simple blockchain stats
./parser simple
. Extract all transactions for a very popular address 1dice6wBxymYi3t94heUAG6MpG5eceLG1
./parser transactions 06f1b66fa14429389cbffa656966993eab656f37
. Compute the closure of an address, that is the list of addresses that very probably belong to the same person:
./parser closure 06f1b66fa14429389cbffa656966993eab656f37
. Compute and print the balance for all keys ever used since the beginning of time:
./parser all >all.txt
. See how much of the BTC 10K pizza tainted all the subsequent TX in the chain
(chances are you have some dust coming from that famous TX lingering on one
of your addresses)
./parser taint >pizzaTaint.txt
. See all the block rewards and fees:
./parser rewards >rewards.txt
. See a greatly detailed dump of the famous pizza transaction
./parser show
. Track all mined blocks with unspent reward:
./parser pristine
. Show the first valid "pay to script hash (P2SH)" transaction in the chain:
./parser showtx 9c08a4d78931342b37fd5f72900fb9983087e6f46c4a097d8a1f52c74e28eaf6
. Show the first valid naked multi-sig transaction in the chain (it's a 1-of-2 multi-sig)
./parser showtx 60a20bd93aa49ab4b28d514ec10b06e1829ce6818ec06cd3aabd013ebcdc4bb1
NOTE: the general syntax is:
./parser <command> <option> <option> ... <arg> <arg> ...
NOTE: use "parser help <command>" or "parser <command> --help" to get detailed
help for a specific command.
NOTE: <command> may have multiple aliases and can also be abbreviated. For
example, "parser tx", "parser tr", and "parser transactions" are equivalent.
NOTE: whenever specifying a list of things (e.g. a list of addresses), you can
instead enter "file:list.txt" and the list will be read from the file.
NOTE: whenever specifying a list file, you can use "file:-" and blockparser
will read the list directly from stdin.
Caveats:
--------
. You will need an x86-64 Ubuntu box and a recent version of GCC (>= 4.4), a recent version of
boost and openssl-dev. You may be able to compile on other platforms, but the code wasn't
really designed for those.
. As of this writing, it needs a lot of RAM to work, typically upwards of 25 Gigs. I will switch
to an on-disk hash table at some point, but for now you'll just need that if you ever hope to
see it finish in a reasonable amount of time (or at all if your swap space is too small).
. The code could be cleaner and better architected. It was just a quick and dirty way for me
to learn about bitcoin. There really isn't much in the way of comments either :D
. OTOH, it is fairly simple and short. If you want to understand how the blockchain data structures
work, the code in parser.cpp is a solid way to start.
Hacking the code:
-----------------
. parser.cpp contains the generic parser that reads the blockchain, parses it and calls
"user-defined" callbacks as it hits interesting bits of information. It typically calls
out when it begins reading or finishes assembling a data structure.
. util.cpp contains a grab-bag of useful bitcoin related routines. Interesting examples include:
showScript
getBaseReward
solveOutputScript
decompressPublicKey
. blockparser comes with a bunch of interesting "user callbacks".
. cb/allBalances.cpp : code to compute the balance of all addresses.
. cb/closure.cpp : code to compute the transitive closure of an address
. cb/dumpTX.cpp : code to display a transaction in very great detail
. cb/help.cpp : code to dump detailed help for all other commands
. cb/pristine.cpp : code to show all "pristine" (i.e. unspent) blocks
. cb/rewards.cpp : code to show all block rewards (including fees)
. cb/simpleStats.cpp : code to compute simple stats.
. cb/sql.cpp : code to produce an SQL dump of the blockchain
. cb/taint.cpp : code to compute the taint from a given TX to all TXs.
. cb/transactions.cpp : code to extract all transactions pertaining to an address.
. You can very easily add your own custom command. You can use the existing callbacks in
directory ./cb/ as a template to build your own:
cp cb/allBalances.cpp cb/myExtractor.cpp
Add to Makefile
Hack away
Recompile
Run
. You can also read the file callback.h (the base class from which you derive to implement your
own new commands). It has been heavily commented and should provide a good basis to pick what
to overload to achieve your goal.
. The code makes heavy use of the google dense hash maps. You can switch it to use sparse hash
maps (see Makefile, search for: DENSE, undef it). Sparse hash maps are slower but save quite a
bit of RAM.