Common standards

Hashes

Usually, when a hash is computed within bitcoin, it is computed twice. Most of the time SHA-256 hashes are used; however, RIPEMD-160 is also used when a shorter hash is desirable (for example when creating a bitcoin address).
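As a minimal sketch of the double hashing described above, using only Python's standard hashlib module (the helper name double_sha256 is ours, not part of any Bitcoin library):

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Hash twice with SHA-256; the second round hashes the raw 32-byte digest."""
    first_round = hashlib.sha256(data).digest()
    return hashlib.sha256(first_round).digest()


if __name__ == "__main__":
    # Print the double hash of an arbitrary example message.
    print(double_sha256(b"hello").hex())
```

Note that the second round takes the binary digest of the first round as input, not its hexadecimal text form.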

Note: Hashes in the Merkle tree displayed in the Block Explorer are in little-endian notation. For some implementations and calculations, the byte order needs to be reversed before the data is hashed, and again after the hashing operation.

Signatures

Public keys (in scripts) are given as 04 <x> <y>, where x and y are 32-byte big-endian integers representing the coordinates of a point on the curve, or in compressed form as <sign> <x>, where <sign> is 0x02 if y is even and 0x03 if y is odd.
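The two layouts can be sketched as follows. This is an illustration only: encode_pubkey is a hypothetical helper, and it does not check that (x, y) actually lies on the curve.

```python
def encode_pubkey(x: int, y: int, compressed: bool = False) -> bytes:
    """Serialize a public key point as 04 <x> <y> or as <sign> <x>."""
    x_bytes = x.to_bytes(32, "big")  # 32-byte big-endian coordinate
    if compressed:
        # The sign byte records the parity of y: 0x02 for even, 0x03 for odd.
        sign = b"\x02" if y % 2 == 0 else b"\x03"
        return sign + x_bytes
    return b"\x04" + x_bytes + y.to_bytes(32, "big")


if __name__ == "__main__":
    # Toy coordinates, chosen only to show the layout (not a real curve point).
    print(encode_pubkey(1, 2).hex())
    print(encode_pubkey(1, 2, compressed=True).hex())
```

The uncompressed form is 65 bytes long and the compressed form 33 bytes.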

Signatures use DER encoding to pack the r and s components into a single byte stream (this is also what OpenSSL produces by default).
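A rough sketch of how r and s are packed with DER follows. The function names are ours, and the sketch only handles the short length form (content under 128 bytes), which is an assumption that holds for 256-bit r and s values. It does not append any extra bytes that a script may carry after the signature.

```python
def der_integer(value: int) -> bytes:
    """Encode a non-negative integer as a DER INTEGER (tag 0x02)."""
    if value < 0:
        raise ValueError("r and s must be non-negative")
    body = value.to_bytes((value.bit_length() + 7) // 8 or 1, "big")
    if body[0] & 0x80:
        # A set high bit would read as negative, so DER requires a 0x00 pad.
        body = b"\x00" + body
    return b"\x02" + bytes([len(body)]) + body


def der_signature(r: int, s: int) -> bytes:
    """Pack r and s into a DER SEQUENCE (tag 0x30) of two INTEGERs."""
    body = der_integer(r) + der_integer(s)
    if len(body) >= 0x80:
        raise ValueError("long-form DER lengths are not handled in this sketch")
    return b"\x30" + bytes([len(body)]) + body


if __name__ == "__main__":
    print(der_signature(1, 0x80).hex())
```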

Transaction Verification

Transactions are cryptographically signed records that reassign ownership of Bitcoins to new addresses. Transactions have inputs - records which reference the funds from other previous transactions - and outputs - records which determine the new owner of the transferred Bitcoins, and which will be referenced as inputs in future transactions as those funds are respent.

Each input must have a cryptographic digital signature that unlocks the funds from the prior transaction. Only the person possessing the appropriate private key is able to create a satisfactory signature; this in effect ensures that funds can only be spent by their owners.

Each output determines which Bitcoin address (or other criteria, see Script) is the recipient of the funds.

In a transaction, the sum of all inputs must be equal to or greater than the sum of all outputs. If the inputs exceed the outputs, the difference is considered a transaction fee, and is redeemable by whoever first includes the transaction into the block chain.
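The fee rule above amounts to simple arithmetic. A small sketch, assuming amounts are given as integers in the smallest unit (transaction_fee is a hypothetical helper name):

```python
def transaction_fee(input_values, output_values):
    """Return the implied fee: the sum of inputs minus the sum of outputs."""
    total_in = sum(input_values)
    total_out = sum(output_values)
    if total_in < total_out:
        # Outputs may never exceed inputs in a non-coinbase transaction.
        raise ValueError("outputs exceed inputs")
    return total_in - total_out


if __name__ == "__main__":
    # Two inputs worth 70,000 in total, two outputs worth 69,000 in total.
    print(transaction_fee([50_000, 20_000], [60_000, 9_000]))
```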

A special kind of transaction, called a coinbase transaction, has no inputs. It is created by miners, and there is one coinbase transaction per block. Because each block comes with a reward of newly created Bitcoins (e.g. 50 BTC for the first 210,000 blocks), the first transaction of a block is, with few exceptions, the transaction that grants those coins to their recipient (the miner). In addition to the newly created Bitcoins, the coinbase transaction is also used for assigning the recipient of any transaction fees that were paid within the other transactions being included in the same block. The coinbase transaction can assign the entire reward to a single Bitcoin address, or split it in portions among multiple addresses, just like any other transaction. Coinbase transactions always contain outputs totalling the sum of the block reward plus all transaction fees collected from the other transactions in the same block.
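The reward arithmetic can be sketched as below. The text above only states the 50 BTC figure for the first 210,000 blocks; the halving of the reward every further 210,000 blocks and the unit of 100,000,000 base units per BTC are assumptions drawn from the well-known network rules rather than from this section. Function names are ours.

```python
BASE_UNITS_PER_BTC = 100_000_000   # assumption: smallest unit per 1 BTC
HALVING_INTERVAL = 210_000         # blocks per reward period


def block_subsidy(height: int) -> int:
    """Newly created coins for the block at `height`, in base units."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0  # the shifted value would be zero anyway
    return (50 * BASE_UNITS_PER_BTC) >> halvings


def coinbase_output_total(height: int, total_fees: int) -> int:
    """Sum the coinbase outputs carry: block reward plus collected fees."""
    return block_subsidy(height) + total_fees


if __name__ == "__main__":
    print(coinbase_output_total(0, 0))
    print(coinbase_output_total(210_000, 1_000))
```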

The coinbase transaction in block zero cannot be spent. This is due to a quirk of the reference client implementation that would open the potential for a block chain fork if some nodes accepted the spend and others did not[1].

Most Bitcoin outputs encumber the newly transferred coins with a single ECDSA private key. The actual record saved with inputs and outputs isn't necessarily a key, but a script. Bitcoin uses an interpreted scripting system to determine whether an output's criteria have been satisfied, which makes more complex operations possible, such as outputs that require two ECDSA signatures, or two-of-three-signature schemes. A typical output references a single Bitcoin address; it actually contains this information in the form of a script that requires a single ECDSA signature (see OP_CHECKSIG). The output script specifies what must be provided to unlock the funds later, and when the time comes to spend the output as an input of another transaction, that input must provide everything that satisfies the requirements defined by the original output script.

Addresses

A bitcoin address is in fact the hash of an ECDSA public key, computed this way:

Variable length integer

Integers can be encoded depending on the represented value to save space.
Variable length integers always precede an array/vector of a type of data that may vary in length.
Longer numbers are encoded in little endian.

Value            Storage length   Format
< 0xfd           1                uint8_t
<= 0xffff        3                0xfd followed by the length as uint16_t
<= 0xffffffff    5                0xfe followed by the length as uint32_t
-                9                0xff followed by the length as uint64_t

If you're reading the Satoshi client code (BitcoinQT), it refers to this as a "CompactSize". Modern BitcoinQT also has a CVarInt class, which implements an even more compact integer for the purpose of local storage (and which is incompatible with the "CompactSize" described here). CVarInt is not a part of the protocol.
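The encoding rules for variable length integers can be sketched with Python's struct module. The names encode_varint and decode_varint are ours, not taken from the reference client.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode an unsigned integer using the shortest form in the table."""
    if n < 0:
        raise ValueError("value must be non-negative")
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)   # little-endian uint16_t
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)   # little-endian uint32_t
    if n <= 0xFFFFFFFFFFFFFFFF:
        return b"\xff" + struct.pack("<Q", n)   # little-endian uint64_t
    raise ValueError("value does not fit in 64 bits")


def decode_varint(data: bytes, offset: int = 0):
    """Return (value, next_offset) for the integer starting at `offset`."""
    first = data[offset]
    if first < 0xFD:
        return first, offset + 1
    fmt, size = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[first]
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    return value, offset + 1 + size


if __name__ == "__main__":
    for n in (0xFC, 0xFD, 0x10000):
        print(n, encode_varint(n).hex())
```

Returning the next offset from the decoder makes it easy to continue parsing the array or vector that follows the integer.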

Variable length string

A variable length string can be stored using a variable length integer followed by the string itself.
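A short sketch of this layout, with the length prefix followed by the raw bytes. The helper names are ours, and the small integer encoder is repeated so that the example is self-contained.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Variable length integer, as described in the previous section."""
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def encode_varstr(data: bytes) -> bytes:
    """Length as a variable length integer, then the string bytes."""
    return encode_varint(len(data)) + data


if __name__ == "__main__":
    print(encode_varstr(b"tx").hex())
    print(encode_varstr(b"").hex())  # the empty string is a single 0x00 byte
```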

IPv6 address. Network byte order. The original client only supported IPv4 and only read the last 4 bytes to get the IPv4 address. However, the IPv4 address is written into the message as a 16 byte IPv4-mapped IPv6 address.
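The IPv4-mapped form can be sketched with Python's standard ipaddress module. The function names are ours; the layout used here, ten zero bytes, then two 0xFF bytes, then the four IPv4 bytes, is the standard IPv4-mapped IPv6 format.

```python
import ipaddress


def ipv4_to_wire(ip: str) -> bytes:
    """Return the 16-byte IPv4-mapped IPv6 form, in network byte order."""
    return b"\x00" * 10 + b"\xff\xff" + ipaddress.IPv4Address(ip).packed


def wire_to_ipv4(addr: bytes) -> str:
    """Read back the IPv4 address from the last 4 bytes, as the original client did."""
    return str(ipaddress.IPv4Address(addr[-4:]))


if __name__ == "__main__":
    wire = ipv4_to_wire("10.0.0.1")
    print(wire.hex())
    print(wire_to_ipv4(wire))
```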

Message types

version

When a node creates an outgoing connection, it will immediately advertise its version. The remote node will respond with its version. No further communication is possible until both peers have exchanged their version.

And here's a modern (60002) protocol version client advertising itself to a local peer...

The newer protocol now includes the checksum. This is from a mainline (satoshi) client during an outgoing connection to another local client; notice that it does not fill out the address information at all when the source or destination is "unroutable".

getdata

getdata is used in response to inv, to retrieve the content of a specific object, and is usually sent after receiving an inv packet, after filtering known elements. It can be used to retrieve transactions, but only if they are in the memory pool or relay set - arbitrary access to transactions in the chain is not allowed to avoid having clients start to depend on nodes having full transaction indexes (which modern nodes do not).

getblocks

Return an inv packet containing the list of blocks starting right after the last known hash in the block locator object, up to hash_stop or 500 blocks, whichever comes first.

The locator hashes are processed by a node in the order they appear in the message. If a block hash is found in the node's main chain, the list of its children is returned via the inv message and the remaining locators are ignored, regardless of whether the requested limit was reached.

To receive the next block hashes, one needs to issue getblocks again with a new block locator object. Keep in mind that some clients may provide blocks which are invalid if the block locator object contains a hash on the invalid branch.

Note that it is allowed to send fewer known hashes, down to a minimum of just one hash. However, the purpose of the block locator object is to detect a wrong branch in the caller's main chain. If the peer detects that you are off the main chain, it will send block hashes which are earlier than your last known block. So if you just send your last known hash and it is off the main chain, the peer starts over at block #1.
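One common way to choose which block heights go into a block locator is sketched below. The spacing rule used here (the ten most recent blocks, then exponentially growing steps back to the genesis block) is an assumption for illustration and is not stated in the text above; locator_heights is a hypothetical helper. The hashes at these heights, in this order, would then form the locator.

```python
def locator_heights(tip_height: int) -> list:
    """Pick heights for a block locator, densest near the tip of the chain."""
    heights = []
    step = 1
    height = tip_height
    while height > 0:
        heights.append(height)
        if len(heights) >= 10:
            step *= 2          # after ten entries, double the step each time
        height -= step
    heights.append(0)          # always finish with the genesis block
    return heights


if __name__ == "__main__":
    print(locator_heights(5))
    print(locator_heights(1000))
```

Because recent blocks are listed densely, a peer on a slightly different branch still finds a common block close to the caller's tip instead of starting over from the beginning.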

getheaders

Return a headers packet containing the headers of blocks starting right after the last known hash in the block locator object, up to hash_stop or 2000 blocks, whichever comes first. To receive the next block headers, one needs to issue getheaders again with a new block locator object. The getheaders command is used by thin clients to quickly download the block chain where the contents of the transactions would be irrelevant (because they are not ours). Keep in mind that some clients may provide headers of blocks which are invalid if the block locator object contains a hash on the invalid branch.

The SHA256 hash that identifies each block (and which must have a run of 0 bits) is calculated from the first 6 fields of this structure (version, prev_block, merkle_root, timestamp, bits, nonce), followed by standard SHA256 padding, making two 64-byte chunks in all, and not from the complete block. To calculate the hash, only two chunks need to be processed by the SHA256 algorithm. Since the nonce field is in the second chunk, the first chunk stays constant during mining and therefore only the second chunk needs to be processed. However, a Bitcoin hash is the hash of the hash, so two SHA256 rounds are needed for each mining iteration.
See Block hashing algorithm for details and an example.
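A sketch of the header hash calculation, using the well-known field values of block zero as sample input. The function name and the convention of passing hashes as displayed (byte-reversed) hex strings are our assumptions; hashlib applies the SHA256 padding internally, so only the 80 header bytes are assembled here.

```python
import hashlib
import struct


def block_header_hash(version, prev_block_hex, merkle_root_hex,
                      timestamp, bits, nonce):
    """Double-SHA256 the 80-byte header and return the hash as displayed hex."""
    header = (
        struct.pack("<I", version)
        + bytes.fromhex(prev_block_hex)[::-1]    # displayed hex is byte-reversed
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )
    if len(header) != 80:
        raise ValueError("a block header must be exactly 80 bytes")
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()                    # reverse again for display


if __name__ == "__main__":
    # Field values of block zero (the genesis block).
    print(block_header_hash(
        1,
        "00" * 32,
        "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
        1231006505,
        0x1D00FFFF,
        2083236893,
    ))
```

In a mining loop only the nonce (and occasionally the timestamp) changes, which is why the SHA256 state after the first 64-byte chunk can be reused across iterations; this sketch does not attempt that optimization.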

headers

The headers packet returns block headers in response to a getheaders packet.

Note that the block headers in this packet include a transaction count (a var_int, so there can be more than 81 bytes per header) as opposed to the block headers which are sent to miners.

getaddr

The getaddr message sends a request to a node asking for information about known active peers to help with finding potential nodes in the network. The response to receiving this message is to transmit one or more addr messages with one or more peers from a database of known active peers. The typical presumption is that a node is likely to be active if it has sent a message within the last three hours.

No additional data is transmitted with this message.

mempool

The mempool message sends a request to a node asking for information about transactions it has verified but which have not yet confirmed. The response to receiving this message is an inv message containing the transaction hashes for all the transactions in the node's mempool.

No additional data is transmitted with this message.

It is specified in BIP 35. Since BIP 37, only transactions matching the filter are included in the reply.

checkorder

This message was used for IP Transactions. As IP transactions have been deprecated, it is no longer used.

submitorder

This message was used for IP Transactions. As IP transactions have been deprecated, it is no longer used.

reply

This message was used for IP Transactions. As IP transactions have been deprecated, it is no longer used.

ping

The ping message is sent primarily to confirm that the TCP/IP connection is still valid. An error in transmission is presumed to be a closed connection and the address is removed as a current peer.

Payload:

Field Size   Description   Data type   Comments
8            nonce         uint64_t    random nonce

pong

The pong message is sent in response to a ping message. In modern protocol versions, a pong response is generated using a nonce included in the ping.

Payload:

Field Size   Description   Data type   Comments
8            nonce         uint64_t    nonce from ping
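The ping and pong payloads can be sketched as follows. Only the 8-byte payload is built here, not the surrounding message header. The function names are ours, and little-endian byte order for the nonce is an assumption based on how the protocol encodes its other integers.

```python
import os
import struct


def make_ping_payload(nonce=None) -> bytes:
    """Build the 8-byte ping payload; a random nonce is drawn if none is given."""
    if nonce is None:
        nonce = int.from_bytes(os.urandom(8), "little")
    return struct.pack("<Q", nonce)


def make_pong_payload(ping_payload: bytes) -> bytes:
    """Answer a ping by echoing back the nonce it carried."""
    (nonce,) = struct.unpack("<Q", ping_payload)
    return struct.pack("<Q", nonce)


if __name__ == "__main__":
    ping = make_ping_payload()
    pong = make_pong_payload(ping)
    print(ping.hex(), pong.hex())
```

Matching the nonce in the pong against the one sent lets a node tell which ping is being answered.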

reject

The reject message is sent when messages are rejected.

Payload:

Field Size   Description   Data type   Comments
1+           message       var_str     type of message rejected
1            ccode         char        code relating to rejected message
1+           reason        var_str     text version of reason for rejection

CCodes

Value   Name                     Description
0x01    REJECT_MALFORMED
0x10    REJECT_INVALID
0x11    REJECT_OBSOLETE
0x12    REJECT_DUPLICATE
0x40    REJECT_NONSTANDARD
0x41    REJECT_DUST
0x42    REJECT_INSUFFICIENTFEE
0x43    REJECT_CHECKPOINT
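A sketch of serializing the three reject fields listed above, using the ccode values from the table. The helper names are ours, ASCII text is assumed for both strings, and the sketch covers only the fields shown in this section.

```python
import struct

# ccode values from the CCodes table
REJECT_CODES = {
    "REJECT_MALFORMED": 0x01,
    "REJECT_INVALID": 0x10,
    "REJECT_OBSOLETE": 0x11,
    "REJECT_DUPLICATE": 0x12,
    "REJECT_NONSTANDARD": 0x40,
    "REJECT_DUST": 0x41,
    "REJECT_INSUFFICIENTFEE": 0x42,
    "REJECT_CHECKPOINT": 0x43,
}


def encode_varstr(text: str) -> bytes:
    """var_str: variable length integer, then the bytes (short forms only)."""
    data = text.encode("ascii")
    if len(data) < 0xFD:
        return struct.pack("<B", len(data)) + data
    if len(data) <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", len(data)) + data
    raise ValueError("string too long for this sketch")


def encode_reject(message: str, code_name: str, reason: str) -> bytes:
    """Payload layout: message (var_str), ccode (1 byte), reason (var_str)."""
    ccode = REJECT_CODES[code_name]
    return encode_varstr(message) + bytes([ccode]) + encode_varstr(reason)


if __name__ == "__main__":
    print(encode_reject("tx", "REJECT_INVALID", "bad").hex())
```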

filterload, filteradd, filterclear, merkleblock

These messages are related to Bloom filtering of connections and are defined in BIP 0037.

The filterload command is defined as follows:

Field Size   Description   Data type   Comments
?            filter        uint8_t[]   The filter itself is simply a bit field of arbitrary byte-aligned size. The maximum size is 36,000 bytes.
4            nHashFuncs    uint32_t    The number of hash functions to use in this filter. The maximum value allowed in this field is 50.
4            nTweak        uint32_t    A random value to add to the seed value in the hash function used by the bloom filter.
1            nFlags        uint8_t     A set of flags that control how matched items are added to the filter.

See below for a description of the Bloom filter algorithm and how to select nHashFuncs and filter size for a desired false positive rate.

Upon receiving a filterload command, the remote peer will immediately restrict the broadcast transactions it announces (in inv packets) to transactions matching the filter, where the matching algorithm is specified below. The flags control the update behaviour of the matching algorithm.
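A sketch of choosing filter parameters and serializing a filterload payload. The two limits come from the field descriptions above. The sizing formulas are the usual Bloom filter estimates as we understand them from BIP 0037, and the variable length integer placed before the filter bytes is likewise an assumption, since the field size is given only as "?". All names are ours.

```python
import math
import struct

MAX_FILTER_BYTES = 36_000   # maximum filter size from the field description
MAX_HASH_FUNCS = 50         # maximum nHashFuncs from the field description


def bloom_parameters(n_elements: int, fp_rate: float):
    """Estimate (filter size in bytes, number of hash functions), clamped."""
    size = int(-1 / (math.log(2) ** 2) * n_elements * math.log(fp_rate) / 8)
    size = max(1, min(size, MAX_FILTER_BYTES))
    n_hash = int(size * 8 / n_elements * math.log(2))
    n_hash = max(1, min(n_hash, MAX_HASH_FUNCS))
    return size, n_hash


def encode_varint(n: int) -> bytes:
    """Variable length integer; two short forms cover filters up to 36,000 bytes."""
    if n < 0xFD:
        return struct.pack("<B", n)
    return b"\xfd" + struct.pack("<H", n)


def encode_filterload(filter_bytes: bytes, n_hash_funcs: int,
                      n_tweak: int, n_flags: int) -> bytes:
    """Serialize filter, nHashFuncs, nTweak and nFlags in field order."""
    if len(filter_bytes) > MAX_FILTER_BYTES:
        raise ValueError("filter exceeds 36,000 bytes")
    if n_hash_funcs > MAX_HASH_FUNCS:
        raise ValueError("nHashFuncs exceeds 50")
    return (encode_varint(len(filter_bytes)) + filter_bytes
            + struct.pack("<IIB", n_hash_funcs, n_tweak, n_flags))


if __name__ == "__main__":
    size, n_hash = bloom_parameters(1000, 0.001)
    payload = encode_filterload(bytes(size), n_hash, n_tweak=0, n_flags=1)
    print(size, n_hash, len(payload))
```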

The filteradd command is defined as follows:

Field Size   Description   Data type   Comments
?            data          uint8_t[]   The data element to add to the current filter.

The data field must be smaller than or equal to 520 bytes in size (the maximum size of any potentially matched object).

The given data element will be added to the Bloom filter. A filter must have been previously provided using filterload. This command is useful if a new key or script is added to a client's wallet whilst it has connections to the network open: it avoids the need to re-calculate and send an entirely new filter to every peer (though doing so is usually advisable to maintain anonymity).

The filterclear command has no arguments at all.

After a filter has been set, nodes don't merely stop announcing non-matching transactions; they can also serve filtered blocks. A filtered block is carried by the merkleblock message, which is defined like this:

Field Size   Description   Data type   Comments
4            version       uint32_t    Block version information, based upon the software version creating this block
32           prev_block    char[32]    The hash value of the previous block this particular block references
32           merkle_root   char[32]    The reference to a Merkle tree collection which is a hash of all transactions related to this block
4            timestamp     uint32_t    A timestamp recording when this block was created (Limited to 2106!)
4            bits          uint32_t    The calculated difficulty target being used for this block
4            nonce         uint32_t    The nonce used to generate this block… to allow variations of the header and compute different hashes

alert

An alert is sent between nodes to send a general notification message throughout the network. If the signature confirms that the alert came from the core development group of the Bitcoin software, it is suggested that the message be displayed to end-users. It is also suggested that attempts to perform transactions, particularly automated transactions through the client, be halted. The text in the Message string should be relayed to log files and any user interfaces.

Alert format:

Field Size   Description   Data type   Comments
?            payload       uchar[]     Serialized alert payload
?            signature     uchar[]     An ECDSA signature of the message

The developers of Satoshi's client use this public key for signing alerts: