libbitcoin is a community of developers building the open source library, tools and implementation necesary for a free, independent and vibrant Bitcoin. In this way we are helping to build a better future.

libbitcoin believes in the revoltionary promise of Satoshi's original protocol.

The libbitcoin development project aims to create an extendable, scalable and configurable architecture, along with useful software. Making Bitcoin super-pluggable, highly configurable and easy to interact with.

Our Values:

Privacy: Bitcoin should always remain as private as possible for its users.

Scalability: Bitcoin built today with the future in mind.

Integrity: No individual or group should have enough power over the network to compromise its original aims.

By having a bitcoin library, it enables everybody to build apps really quickly using a common stable codebase. This project aims to create that architecture to be extendable, scalable and configurable in the future.

rewrite bitcoin, make it super-pluggable, very easy to do and hack everything at every level, and very configurable

The Zen of libbitcoinReadability over speed.Beauty over convenience.Simplicity over complexity.Architected, not hacked.Flat, not nested.Explicit, not implicit.Errors should be loud.Never is better than right now.Now is better than never.Be flexible and configurable.Build houses from bricks, software from modules.

libbitcoin is licensed as Lesser AGPL:- When used in the p2p network, you only need to provide changes on demand (LGPL).- When used on a webserver, you must proactively provide sourcecode for any changes you have made to libbitcoin.- Applications can link against libbitcoin, and they do not need to release their changes publically.Thanks to Aaron Williamson of the SFLC and Richard Stallman of the FSF for helping draft up this license.

This is the inner core which consists of a bunch of subsystems co-operating over clearly defined interfaces.

As an example one such message could command the block_chain_manager to create a transaction. It then sends the new transaction to the network component to actualise it.

The inner core is very difficult to use so we abstract it's usage to a more usable outer core. Think of the inner core like X-windows and outer core an easy to use GUI toolkit like Qt (by analogy).

Into the core we can switch in and out different components. In this example we choose to replace a MySQL serialiser/de-serialiser with an SQLite one. Maybe there could also be in-memory serialisers (never store to disk) or whatever.

A serialiser takes in a data object and then stores it. The type of component decides where + how it looks when stored. When reading the object it de-serialises in reverse.

The whole point is not to just dump binary blobs into MySQL rows but to put them there in a readable format so an admin can dissect them in case of problems. We also will only create a MySQL serialiser for the time being unless it's possible to use standard SQL and simultaneously support MySQL, PostgresSQL, SQLite.

Above the diagram for the outer core indicated an exposed C api with auto-generated Swig bindings.

We try to keep the core very simple and focused; no mining, no accounts, no GUI, no RPC server. These things can all be implemented in Python. Already genjix has Spesmilo as a Qt GUI using bitcoind that we can easily use since it has a special interface that can be re-implemented for whatever Bitcoin implementation. Accounts can be provided as a Python module or special plugin in C or C++.

As another example, by being able to switch the networking component, we can replace it with a dummy networking component that is used for debugging.

Or use one networking component for multiple instances of Bitcoin or other currencies.

We try to split everything up as much as possible to help developers long term. It makes it easier for new contributors to get started. That's the free-software way: lots of components and clear separation to allow maintainers to work on their piece of turf.

Callbacks

A good plugin+component system is a must for this project. The core will probably have to be thread-safe if we use threads for networking. Otherwise an event system could be a bottleneck since we get the worst of both worlds.

If we make a block-chain verifier subsystem, is there an object to making it one-shot (verify once or every X blocks the block chain in one go) or does it need to be after every block is downloaded?

In any case, it would be nicer to be able to hook the verifier system to a generic callback as opposed to calling it manually in the code.

boost::signals2 seems like a good choice to be considered, although it has to be seen whether there is any bottleneck. Also any callback system would have to be exposed in Python somehow

Random notes:

* Our main target is large scale deployments. Bitcoin Enterprise (utilising our libbitcoin). The Red Hat of Bitcoin, not the Ubuntu. * Use boost::property_map? * boost::asio is a good choice for networking. * Externally we expose a C api for people that cannot program C++ but use C++ internally. A C ABI is also nice since it doesn't get mangled by the compiler. * Status during initialisation for debugging/showing progress in GUI client.

* Blocks are mostly sequential. * Blocks contain a list of transactions. * Transactions contain a list of inputs + outputs. * Blocks can be free floating (orphan blocks). * Transactions can be free floating (not yet in a block). * There is one sequence of events that we accept as the truth at any one time.

From big to small:

* Blocks * Transactions * Inputs * Outputs * Scripts * Operations

Block-chain

The block-chain is a series of blocks chained on in a linear sequence. The block-chain is an append only database of all transactions in the network.

What's accepted as true, is the chain with the highest difficulty. Difficulty is computed using a function from the 'bits' field of the block. Given an entire chain, we can calculate the difficulty using:

Code:

SELECT SUM(COMPUTE_DIFFICULTY(bits)) FROM our_chain;

Occasionally this chain of events may be conflicting when the chain splits in half because there are two next rival blocks trying to append to the current chain. Always, the chain with the highest cumulative difficulty is preferred over the others.

It can happen that more blocks build on a chain that previously had lower difficulty and so take over the main chain. This is termed a block-chain re-organisation. But as more blocks build off the main chain, that becomes exponentially less likely until eventually it becomes a near impossibility. Branches off the main block-chain can be pruned once they reach a certain depth that it's near unlikely they will ever become the main branch again.

In the diagram above blocks are represented by black dots. The block_id is a unique identifier for each new block inserted into the database. Each of the dots above would have a unique block_id.

Depth is the vertical number on the left, and is synonymous with a block's block-number.

Span left and right indicates the range of chains that a block exists in. A block with a span_left of 0, and a span_right of 3, would exist in chains 0, 1, 2 and 3. Using a span, we can easily select a particular chain:

Code:

SELECT * FROM blocks WHERE span_left=0 AND span_right=0;

Or select the common point between a series of chains:

Code:

SELECT MAX(depth) FROM blocks WHERE span_left>=0 AND span_right<=2;

Once span_right starts to exceed a value (say 10) and the depth since the chain forked exceeds a certain value, then that chain can be pruned:

Code:

DELETE FROM blocks WHERE span_left=10 AND span_right=10;

The total span range of all blocks should equal all the number of stored chains. If we have a span of [0, 10] then there should be 11 spans from 0 to 10. A table exists to keep track of all the chains.

Code:

blockchains: chain_id max_depth last_block_id total_difficulty

Transaction

Nothing unusual happens here, except that transactions can be included more than once in multiple chains. A seperate table exists to track the parents for a transaction.

SELECT * FROM operations WHERE script_id=45 ORDER BY operation_id ASC;

Broadcasted inventories

This will be stored in memory for now as there are potential attack vectors to serialising inv items. An attacker could cause many disk read/writes by spamming a host potentially. More thought needs to go into this later.

Main block-chain

A convenience table is provided for the end user to read the block-chain transaction history. It contains the currently accepted version of the truth. It provides a way for the user to quickly query transactions and see funds in addresses.

The inputs and outputs are joined in a view. All the outputs which are final and have not been spent, would have no equivalently joined on input.

SELECT * FROM operations WHERE script_id=45 ORDER BY operation_id ASC;

I wonder here, what is the advantage to decomposing the scripts into separate database rows, instead of storing it as a binary blob?

I mean, the most common operation is "fetch the entire script and execute the opcodes". By decomposing the data structure this deep this becomes a more expensive query.

Bitcoin Core developer [PGP]Warning: For most, coin loss is a larger risk than coin theft. A disk can die any time. Regularly back up your wallet through File → Backup Wallet to an external storage or the (encrypted!) cloud. Use a separate offline wallet for storing larger amounts.

I wonder here, what is the advantage to decomposing the scripts into separate database rows, instead of storing it as a binary blob?

I mean, the most common operation is "fetch the entire script and execute the opcodes". By decomposing the data structure this deep this becomes a more expensive query.

For a node, maybe. What about for data processing? Say you just want to quickly find all scripts with the same destination address. The performance difference will be negligible for the amount of flexibility gained. Returning data like this is what RDBMS' are optimized for.

The performance difference will be negligible for the amount of flexibility gained. Returning data like this is what RDBMS' are optimized for.

Maybe; I wouldn't be so quick to call it 'negligible', as I do know that for the current client, database accesses are by far the biggest source of performance issues.

Anyway, don't do too much premature optimisation, it might not be useful yet to discuss in this phase.

Bitcoin Core developer [PGP]Warning: For most, coin loss is a larger risk than coin theft. A disk can die any time. Regularly back up your wallet through File → Backup Wallet to an external storage or the (encrypted!) cloud. Use a separate offline wallet for storing larger amounts.

Great work, I just have a question about the span. I don't understand how it works, because if you fork the block chain, won't all the existing span numbers have to be incremented?

I've been trying to implement a sqlite backend (so far just dump to sqlite which is mostly working), and it seems to be helpful to distinguish between dynamic data and data which is immutable such as the successor relationship between blocks, transaction containment within a block, etc.

Whether a block is in the best chain and whether a txout has been spent in the best chain is dynamic, so I have to implement queries that more or less correspond to ConnectBlock and DisconnectBlock. I would love to be able to eliminate it using your span mechanism or something similar but I'm not understanding how it works.

Great work, I just have a question about the span. I don't understand how it works, because if you fork the block chain, won't all the existing span numbers have to be incremented?

I've been trying to implement a sqlite backend (so far just dump to sqlite which is mostly working), and it seems to be helpful to distinguish between dynamic data and data which is immutable such as the successor relationship between blocks, transaction containment within a block, etc.

Whether a block is in the best chain and whether a txout has been spent in the best chain is dynamic, so I have to implement queries that more or less correspond to ConnectBlock and DisconnectBlock. I would love to be able to eliminate it using your span mechanism or something similar but I'm not understanding how it works.

I'm still working on ProcessBlock (verifying the blocks) and everything below it (ConnectBlock/DisconnectBlock). You can see an example of how the span works in src/storage/postgresql_storage.cpp:reorganize_block_chain()... That function will be moved internally and within verify_block under include/bitcoin/verify.hpp

Indeed you're correct that all the spans of the parents would have to be incremented if you fork the block chain. This is OK since it can be done in a simple single UPDATE, inserting a new block is an infrequent operation (doesn't have to be as fast as SELECTs) and it's not a big deal if adding a new block isn't immediate.

Far more important is the select speed. If you run the poller and postgres benchmark:

Code:

$ ./bin/examples/poller dbname dbuser dbpass localhost:8333... wait to download all the blocks$ make psql$ time ./bin/tests/psql... run benchmark to fetch 100 blocks

With my current setup it takes 10 secs to fetch 100 blocks or 0.1 secs for 1 block. That means 4 hours to fetch the entire block-chain of 140k blocks. Using a rule of thumb that verifying the block-chain will be 10 times as slow, that might mean 40 hours to verify all 140k blocks. ATM I plan to just bundle the first 120k pre-verified blocks in the code, then later see about optimising this.

But I am running it over a VPS on rackspace, so that also could be a reason for the speed too.

It should be possible for you to easily modify the current storage engine I'm using to enable sqllite... The library supports multiple backends. See the constructor in src/storage/postgresql_storage.cpp:postgresql_storage() ... The sql_ object is being explicitly initialised with postgresql parameters. I don't mind making this a generic sql_storage backend if you're able to get it to work

The current way of joining the blocks using reorganize_block_chain() will be removed and done as part of the block-chain verification- that's the best way I see currently to do it... But I am doing more research of the current bitcoin sourcecode to make sure I understand everything fully. The current bitcoin is very inefficient/obtuse and there's a whole bunch of ways to do things quicker or nicer.

Ok that makes sense. I am a bit concerned that even though the update is a single statement, it must touch a lot of rows, but it should be infrequent enough to not be a problem.

My next question is regarding the very common operation of determining whether an output has been spent. Each output could be spent in a single transaction that belongs to multiple blocks (in separate forks) or by multiple transactions (also in separate forks).

So it seems that to determine if an output has been spent in a particular chain, you will have to query for all the inputs that spend the output, find (JOIN) the transactions that contain those inputs, find (JOIN) the blocks that contain those transactions, and see if any of those blocks have a span range that contains the chain in question. I know databases are supposed to be good at this kind of thing but it sure sounds like a lot of work. On the other hand, most of the transactions will not have a double-spend, so all the joining shouldn't make a difference in the most common case where there are no inputs that spend a txout. All the joining might slow it down a bit when the blockchain forks, but a few extra indexed lookups shouldn't be that big a deal I guess.

I'm thinking I'll switch to your schema with the spans, because it bugs me how my version has dynamic fields marking whether a block is part of the 'best chain' and whether a txout has been spent in the current best chain. It is a real headache to add a block to a fork other than the current best.

Ok that makes sense. I am a bit concerned that even though the update is a single statement, it must touch a lot of rows, but it should be infrequent enough to not be a problem.

My next question is regarding the very common operation of determining whether an output has been spent. Each output could be spent in a single transaction that belongs to multiple blocks (in separate forks) or by multiple transactions (also in separate forks).

So it seems that to determine if an output has been spent in a particular chain, you will have to query for all the inputs that spend the output, find (JOIN) the transactions that contain those inputs, find (JOIN) the blocks that contain those transactions, and see if any of those blocks have a span range that contains the chain in question. I know databases are supposed to be good at this kind of thing but it sure sounds like a lot of work. On the other hand, most of the transactions will not have a double-spend, so all the joining shouldn't make a difference in the most common case where there are no inputs that spend a txout. All the joining might slow it down a bit when the blockchain forks, but a few extra indexed lookups shouldn't be that big a deal I guess.

I'm thinking I'll switch to your schema with the spans, because it bugs me how my version has dynamic fields marking whether a block is part of the 'best chain' and whether a txout has been spent in the current best chain. It is a real headache to add a block to a fork other than the current best.

I plan to add a flag indicating whether an output is_final (and the parent transaction is_final - if all child outputs all have is_final set) and an output_address and for transaction a transaction_type (standard, generated and other) fields.

Also will be another table with chain_id, cumulative_difficulty... and cumulative difficulty in blocks.

For now still working on the block chain verification specifics (which algorithm is best and whether to organize then verify, pre-verify then organize then verify, or verify then link and organize). There are various pros and cons for all the different ways.