It's not that the dumps would be different, but the file offset stored in the datadir table would probably not be at a block boundary in the local block file. The solution is to reset the pointer. Abe should have a command-line option for this or even do it automatically, but currently we do it with:

Code:

UPDATE datadir SET blkfile_number = 1, blkfile_offset = 0;

The next run will spend a few(?) minutes scanning the block file, skipping blocks already loaded via the dump.

Yup, or someone with time to spare might write export and import functions, dumping and loading the data in a Bitcoin-specific, DB-neutral format. If that runs fast enough, write a translator from block files to that format, and it might approach the speed of torrent+mysql for the initial load. The main thing, I suspect, is to create indexes after the tables have data.
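The last point can be illustrated with a small, self-contained sketch (using SQLite here purely for illustration; the same principle applies to MySQL): bulk-load the rows first, then build the index in one pass instead of maintaining it on every insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (tx_id INTEGER, tx_hash TEXT)")

# Bulk-load with no index in place: each insert is a cheap append.
rows = [(i, "%064x" % i) for i in range(10000)]
conn.executemany("INSERT INTO tx (tx_id, tx_hash) VALUES (?, ?)", rows)

# Build the index once, after the data is in; this is a single sort
# instead of thousands of incremental B-tree updates.
conn.execute("CREATE INDEX x_tx_hash ON tx (tx_hash)")
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM tx").fetchone()[0]
print(count)  # 10000
```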

John, will Abe check database consistency after startup from an initial DB import and verify that the DB and the local blockchain in bitcoind are the same? I mean, isn't blindly using an export from some unknown entity a potential attack vector?

At least the torrent file has a checksum, so anybody who trusts me can trust the torrent download, too. But it would be nice to know that Abe checks it by itself...

John, will Abe check database consistency after startup from an initial DB import and verify that the DB and the local blockchain in bitcoind are the same? I mean, isn't blindly using an export from some unknown entity a potential attack vector?

Abe verifies proof of work and, as of 0.6, transaction Merkle trees on import. Yes, an export/import tool like this should come with caveats about trust. There's a verify.py script (possibly out of date) that verifies the Merkle roots already loaded, and it would be simple to add proof-of-work checks there or as part of an import tool. Of course, if it is part of a system for fast loading of a local, known good block chain, it's not so vulnerable.

Edit: By "verifies proof of work" I do not mean checking hashes against the target or difficulty, just verifying that the "previous block hash" is indeed the hash of the previous block's header. Adding a target check would be nice, though challenging for alternative chains that represent target and proof differently.
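The prev-hash check described here is easy to sketch with nothing but hashlib. The genesis header bytes below are the well-known mainnet values; the "next" header is a fabricated placeholder just to demonstrate the linkage test, not a real block.

```python
import hashlib

def block_hash(header):
    """Double SHA-256 of the 80-byte serialized block header."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()

# Mainnet genesis block header (80 bytes, little-endian fields).
genesis = bytes.fromhex(
    "01000000" + "00" * 32 +
    "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a" +
    "29ab5f49" + "ffff001d" + "1dac2b7c")

h = block_hash(genesis)
# Displayed block hashes are byte-reversed.
print(h[::-1].hex())
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f

# The check Abe performs: the next header's hashPrev field
# (bytes 4..36) must equal the hash of the previous header.
def links_to(prev_header, next_header):
    return next_header[4:36] == block_hash(prev_header)

# Fabricated follow-up header whose hashPrev is set correctly:
next_hdr = bytes.fromhex("01000000") + h + bytes(44)
print(links_to(genesis, next_hdr))  # True
```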

Today I tried to understand Abe's source code, and although I'm still confused, I may understand it a bit more than before. From what I see, Abe parses the blockfile and reconstructs blockchains and transactions in SQL with many checks. What happens when a block stored in the blockfile is orphaned, or when the blockchain is forked? Does Abe handle such issues correctly? AFAIK the blockfile is just a dumb store of block structures, so Abe itself should already do all the validation.

Why I'm asking:

I don't like that Abe needs the blockchain stored locally; it makes it far less flexible. For example, running a full Abe installation (bitcoind + database + application) on a VPS is pretty problematic because of memory consumption and the required disk I/O (bitcoind itself uses the disk a lot, plus Abe keeps the disk busy with database writes). For running Stratum servers (where I want to use Abe internally, at least for the initial implementation), I need as small a footprint as possible, to make it possible to run a Stratum server on a cheap VPS too.

I already have some experience with communicating over the Bitcoin P2P protocol, so my idea is to patch Abe to load blocks and transactions directly from the network. In this case, Abe will only need a (trusted?) bitcoin node to connect to on port 8333. Unfortunately, my networking code does not do any block/transaction validation; it just receives messages and parses them into Python objects. So my question is related to this: when I feed Abe this deserialized data from the P2P network, will Abe check everything necessary to keep a consistent index in the database?

I don't like that Abe needs the blockchain stored locally; it makes it far less flexible. For example, running a full Abe installation (bitcoind + database + application) on a VPS is pretty problematic because of memory consumption and the required disk I/O (bitcoind itself uses the disk a lot, plus Abe keeps the disk busy with database writes).

Hmm, I run it on a (good) VPS. Granted, it uses quite a bit of disk space, but other than that it works fine. The Abe block viewer is a bit slow, but I don't think that would change just by moving the bitcoind files off the server?

You could split it across several servers, but of course that would not make it cheaper. Abe just needs the files; it should not care whether bitcoind is in fact running on the same machine, i.e. a network-mounted share would do. And Abe will connect to a MySQL server other than localhost without problems.

Hmm, I run it on a (good) VPS. Granted, it uses quite a bit of disk space, but other than that it works fine. The Abe block viewer is a bit slow, but I don't think that would change just by moving the bitcoind files off the server?

The key is RAM; it's pretty slow for you because the database doesn't fit in RAM, so every request spins the HDD a lot. In an ideal world, Abe's full database would be loaded into memory, which is pretty hard to achieve on a VPS (the MySQL database is actually around 4.5 GB), but at least the database indexes should fit into memory (around 1.5 GB), which is doable. A server with less memory will give poor performance, exactly as you're reporting. Moving bitcoind off the machine can save around 200 MB of RAM and a significant portion of the disk I/O.

Your idea of mounting the blockfile over the network would probably work, you're right. But it is still more of a hack than a real solution; you still need disk access to the blockchain, and handling failover of an NFS mount is much harder than providing a pool of trusted P2P nodes to connect to. If John confirms that my idea of feeding from the P2P network will work, I'll try it. Otherwise I'll set up NFS mounts...

Today I tried to understand Abe's source code, and although I'm still confused, I may understand it a bit more than before. From what I see, Abe parses the blockfile and reconstructs blockchains and transactions in SQL with many checks. What happens when a block stored in the blockfile is orphaned, or when the blockchain is forked? Does Abe handle such issues correctly? AFAIK the blockfile is just a dumb store of block structures, so Abe itself should already do all the validation.

Abe has logic to attach orphaned blocks and reorganize a forked chain. As far as I know it works, but it is the area I would most like to test when I have time. Relevant code: adopt_orphans and _offer_block_to_chain in Abe/DataStore.py.

I don't like that Abe needs the blockchain stored locally; it makes it far less flexible. For example, running a full Abe installation (bitcoind + database + application) on a VPS is pretty problematic because of memory consumption and the required disk I/O (bitcoind itself uses the disk a lot, plus Abe keeps the disk busy with database writes). For running Stratum servers (where I want to use Abe internally, at least for the initial implementation), I need as small a footprint as possible, to make it possible to run a Stratum server on a cheap VPS too.

I already have some experience with communicating over the Bitcoin P2P protocol, so my idea is to patch Abe to load blocks and transactions directly from the network. In this case, Abe will only need a (trusted?) bitcoin node to connect to on port 8333. Unfortunately, my networking code does not do any block/transaction validation; it just receives messages and parses them into Python objects. So my question is related to this: when I feed Abe this deserialized data from the P2P network, will Abe check everything necessary to keep a consistent index in the database?

Abe does not validate blocks beyond what's needed to "checksum" a chain up to a trusted current-block hash. Complete block validation is very hard and not on my priority list, though I might add hooks to use external logic. (Wrapping Abe.DataStore.import_block with a subclass might suffice.)

I don't expect a problem feeding Abe deserialized data. You would need a structure like that created by Abe.deserialize.parse_Block with one extra element: 'hash' whose value is the block header hash as a binary string. The structure is based on Gavin's BitcoinTools. You would pass that structure "b" to store.import_block(b, frozenset([1])). (chain_id 1 = main BTC chain) Abe.DataStore.import_blkdat does this for every block in blk0*.dat that was not previously loaded.
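A minimal sketch of that structure, assuming BitcoinTools-style field names ('version', 'hashPrev', 'hashMerkleRoot', 'nTime', 'nBits', 'nNonce', 'transactions' are my assumption; check Abe.deserialize.parse_Block for the exact keys). The only extra work over the parsed fields is computing the 'hash' element from the 80-byte serialized header:

```python
import hashlib
import struct

def double_sha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def make_abe_block(version, hashPrev, hashMerkleRoot, nTime, nBits, nNonce, txs):
    """Build a parse_Block-like dict plus the 'hash' element Abe needs.
    Field names are assumed from Gavin's BitcoinTools; verify against
    Abe.deserialize.parse_Block before relying on them."""
    header = (struct.pack("<I", version) + hashPrev + hashMerkleRoot +
              struct.pack("<III", nTime, nBits, nNonce))
    return {
        "version": version,
        "hashPrev": hashPrev,
        "hashMerkleRoot": hashMerkleRoot,
        "nTime": nTime,
        "nBits": nBits,
        "nNonce": nNonce,
        "transactions": txs,
        "hash": double_sha256(header),  # binary string, as Abe expects
    }

# Example with the well-known mainnet genesis header values:
b = make_abe_block(
    1, bytes(32),
    bytes.fromhex("3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"),
    1231006505, 0x1d00ffff, 2083236893, [])
print(b["hash"][::-1].hex())
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```

The resulting structure would then be passed to store.import_block(b, frozenset([1])) as described above.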

One way to shrink the footprint would be to add support for binary SQL types. Abe supports the SQL 1992 BIT type and tests for it at installation, but the only database that passes the test is SQLite, which is unsuitable for large servers. On MySQL and all the others, Abe falls back to binary_type=hex and stores scripts and hashes in hexadecimal, doubling their size. Relevant code is in DataStore: configure_binary_type, _set_sql_flavour (beneath the line "val = store.config.get('binary_type')"), and _sql_binary_as_hex, where Abe translates DDL from standard BIT types to CHARs of twice the length.
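The overhead is easy to quantify; hex encoding doubles every stored hash and script (a sketch of the encoding itself, not Abe code):

```python
import binascii

tx_hash = bytes(range(32))          # a 32-byte binary transaction hash
as_hex = binascii.hexlify(tx_hash)  # what the binary_type=hex fallback stores

# The hex form takes exactly twice the bytes of the raw form.
print(len(tx_hash), len(as_hex))  # 32 64
```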

Another improvement would be to remove unneeded features (or, ideally, make them optional) such as the Coin-Days Destroyed calculation (block_tx.satoshi_seconds_destroyed etc.) and the unused pubkey.pubkey column.

I tried the SQL file from the torrent today, since the VPS was having a hard time catching up. It didn't work out too well: after importing the SQL, one needs to manually recreate all views, because they contain a SECURITY DEFINER pointing to an invalid user.

But then Abe can import; however, once it's done skipping rows, it fails because it tries to insert a block at id 1. If I rerun it, it tries id 2, 3, 4, etc.

I assume this value is 'mysql' in your case. The 'mysql' sequence implementation associates with each sequenced table an otherwise empty helper table with a single auto_increment column. For example, the next `block_seq`.`id` becomes the next `block`.`block_id`. Apparently, the dump/load process did not preserve the tables' internal counters.

This script might fix things for you.

Code:

INSERT INTO block_seq (id) SELECT MAX(block_id) FROM block;
DELETE FROM block_seq;
INSERT INTO magic_seq (id) SELECT MAX(magic_id) FROM magic;
DELETE FROM magic_seq;
INSERT INTO policy_seq (id) SELECT MAX(policy_id) FROM policy;
DELETE FROM policy_seq;
INSERT INTO chain_seq (id) SELECT MAX(chain_id) FROM chain;
DELETE FROM chain_seq;
INSERT INTO datadir_seq (id) SELECT MAX(datadir_id) FROM datadir;
DELETE FROM datadir_seq;
INSERT INTO tx_seq (id) SELECT MAX(tx_id) FROM tx;
DELETE FROM tx_seq;
INSERT INTO txout_seq (id) SELECT MAX(txout_id) FROM txout;
DELETE FROM txout_seq;
INSERT INTO pubkey_seq (id) SELECT MAX(pubkey_id) FROM pubkey;
DELETE FROM pubkey_seq;
INSERT INTO txin_seq (id) SELECT MAX(txin_id) FROM txin;
DELETE FROM txin_seq;
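The INSERT-then-DELETE pairs work because the auto_increment counter survives deletion of the row. SQLite's AUTOINCREMENT keyword behaves the same way, so the mechanism can be demonstrated locally (an illustration only, not Abe code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE block_seq (id INTEGER PRIMARY KEY AUTOINCREMENT)")

# Bump the counter to the current MAX(block_id) ...
conn.execute("INSERT INTO block_seq (id) VALUES (170000)")
# ... then remove the row; the counter itself persists.
conn.execute("DELETE FROM block_seq")

# The next generated id continues after the old maximum.
cur = conn.execute("INSERT INTO block_seq DEFAULT VALUES")
print(cur.lastrowid)  # 170001
```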