Notes to self.

Bitcoin Transaction Hash in Pure PostgreSQL

In theory, Postgres should be able to verify transactions and blocks,
as well as do a lot of other things that are currently only done by
full nodes. For this to be performant, it will most likely require an
extension written in C, but I’m curious how far we can get with bare
bones Postgres.

More importantly, would that actually be useful? A node is really
just a database, a very efficient one for a very specific purpose, but
would leveraging the full power of Postgres be somehow more beneficial
than just running Bitcoin-Qt or btcd, for example?

To get to the bottom of this would be a lot of work, and potentially a
lot of fun. It would also be a great blockchain learning exercise. (If
you’re working on a PG extension for Bitcoin or more generally
blockchain, please do let me know!)

Random Thoughts

The structure of the Bitcoin blockchain is relatively simple. We have
transactions, which in turn have inputs and outputs and belong
to blocks. Four tables, that’s it.

I’ve been able to import the whole blockchain with some fairly basic
Go code into my old Thinkpad running Linux overnight. The Go code
needs some more polishing and is probably worthy of a separate write
up, so I won’t get into it for now. Below is the schema I used. I
intentionally left out referential integrity and indexes to keep it
simple and avoid premature optimization.

There are a couple projects out there that keep the blockchain in a
database, most notably
Abe. I haven’t studied
the code very carefully, but my initial impression was that Abe tries
to use standard SQL that would work across most big databases, which
is philosophically different from my objective of going 100% Postgres
and leveraging all that it can do for us.

Bitcoin uses a lot of uint32’s. A Postgres INT is the correct size,
but it is signed, which means we have to use the next larger type,
BIGINT. It seems like it might be a waste to use 64 bits for a 32-bit
value, but I couldn’t think of anything better than a BIGINT. For the
binary stuff it seems like BYTEA is the best match.

So what can we do with this? There is no easy way to create or verify an
Elliptic Curve signature
in Postgres, but with the help of the pgcrypto
extension, we should be able to at least generate the correct SHA256
digest which is used in the signature. As a side note, EC signature math is actually
remarkably simple and could probably be implemented
as a PG function, but I’m too lazy. Here it is in a
few lines of Python.

For the purposes of this exercise, I’d just be happy with a value that
matches, even if the code does not fully comply with the Bitcoin rules.

One problem I ran into was that, because Bitcoin blockchain is
little-endian except for where it isn’t, you often need a way to
reverse bytes in a BYTEA. Strangely, Postgres does not provide a way
to do that, unless I’m missing something. But thanks to
stackoverflow,
here is one way to do this:

The particular transaction we are looking at is
this.
It happens to have id of 37898 in my database. In case you’re
wondering, for this example I used a subset of the blockchain which
only has the first 182,000 blocks. On the full blockchain and without
indexes, this statement would have taken an eternity to execute.

What makes this particular transaction interesting is that it has two
inputs, which is slightly trickier, because to spend them, there need to
be two different signatures of the same transaction. This is because
before signing, the input scriptSig needs to be replaced with the
output’s scriptPubKey (the oversimplified version). This is reflected in the SQL
in the use of LATERAL and CASE.

You do not have to take my word that the two hashes are correct, we
can verify them fairly easily with a bit of help from the Python ecdsa
library. Here is the code to verify the second hash. The key and the
signature are in the
transaction itself.