Forensics and Bitcoin

This article does not attempt to provide a beginner's guide to Bitcoin, nor an in-depth thesis on Bitcoin forensics. Rather, it is an overview of the opportunities available to digital forensic and traditional investigators to obtain evidence for attributing transactions or holdings to a specific person and for (legally) seizing those funds.

I will discuss academic work that has been undertaken in this area, the precautions a security-aware user may take and the issues those precautions introduce. I will also discuss BTCscan, an open-source Python tool created to accompany this article, which carves out bitcoin addresses, private keys and other Bitcoin artifacts.

This article may be of interest to persons conducting investigations for criminal, civil, personal or business reasons. Some elements may be of limited relevance to agencies without powers of seizure and/or subpoena.

Whilst I have endeavoured to make this document as easy as possible for the beginner to understand, this is a highly technical area, and readers may need to carry out further research of their own to understand all of the information contained herein.

Bitcoin was not designed to be anonymous; it was designed to retain a high degree of privacy for its users. In the words of its creator, Satoshi Nakamoto: “The public can see that someone is sending an amount to someone else, but without information linking the transaction to anyone”. Every Bitcoin transaction is recorded in the publicly accessible block chain and can be viewed by anyone. For this reason, Bitcoin has been described as pseudo-anonymous. [1]

There is no requirement to reveal personally identifying information, including an IP address, in order to receive, hold or transfer bitcoins, unless it is imposed upon you by a third party whose use is optional. There is also no central person, issuing authority, designated intermediary, organization or country in charge of Bitcoin, so there is no single entity that can be served with a subpoena or court order to determine user account or ownership details.

An oft-repeated rule among advanced users of Bitcoin is: if you don't control the private keys, then you don't own the bitcoins. This reflects the fact that Bitcoin allows you to retain and manage the security of your own private keys, but doesn't require you to do so. A third party service can manage your private keys for you; however, you must then trust that it will take all necessary security precautions and will not steal your bitcoins. This has led security-conscious bitcoin owners to provide their own security for their private keys, rather than relying on third party services. Storing bitcoins without third parties is discussed in more detail below.

As a result, ownership of bitcoins can be surprisingly hard to prove. Ownership can be thought of as merely knowing, or being able to recreate, the private key for the bitcoin address in which the bitcoins currently reside. To add to the investigator's difficulties, a private key can be stored (and hidden) in a number of ways. The same private key can be represented in several different formats, including a Brain Wallet (so called because it is constructed from text that can be memorised), WIF, WIF Compressed, Hex, Base64 and a QR code of the WIF. [2]
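To illustrate how one key can take several forms, the following Python sketch renders a single 32-byte private key as Hex, WIF, WIF Compressed and Base64. It assumes Bitcoin mainnet (WIF version byte 0x80, with a trailing 0x01 marking the compressed variant) and the simple, insecure brain wallet scheme of hashing a passphrase once with SHA-256; other brain wallet schemes exist. The QR code is omitted as it is simply an image of the WIF string. The function names are my own, the key is a widely published example key and the passphrase is arbitrary; neither should ever hold funds.

```python
import base64
import hashlib

# Base58 alphabet: digits and letters minus 0, O, I and l
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58check_encode(payload: bytes) -> str:
    """Append a 4-byte double SHA-256 checksum, then Base58-encode."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    num = int.from_bytes(data, "big")
    encoded = ""
    while num:
        num, rem = divmod(num, 58)
        encoded = ALPHABET[rem] + encoded
    # Each leading zero byte is written as a leading '1'
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def key_formats(key: bytes) -> dict:
    """Render one 32-byte private key in several common formats."""
    return {
        "hex": key.hex(),
        "wif": b58check_encode(b"\x80" + key),  # 0x80 = mainnet private key
        "wif_compressed": b58check_encode(b"\x80" + key + b"\x01"),
        "base64": base64.b64encode(key).decode("ascii"),
    }


# Simple (and insecure) brain wallet: private key = SHA-256(passphrase)
brain_key = hashlib.sha256(b"correct horse battery staple").digest()
print(key_formats(brain_key)["wif"])

example = key_formats(bytes.fromhex(
    "0C28FCA386C7A227600B2FE50B7CAE11EC86D3BF1FBE471BE89827E19D72AA1D"))
for name, value in example.items():
    print(f"{name:15} {value}")
```

Uncompressed WIF strings produced this way start with a 5, and compressed ones with a K or L, which is a useful first check when a suspicious string is found.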

Third party websites such as wallet services or exchanges can purchase, hold, trade and store bitcoins, and perform bitcoin transactions, on behalf of their users. The third parties used by a person of interest can be determined by examining their internet history or by examining the block chain. The latter requires knowledge of bitcoin addresses owned by both the person of interest and the third party service, which can then be used to identify the transactions between the two. Only direct movements of funds between these known addresses, without passing through intermediate addresses, would be of evidential value on their own; with any intermediate hops, it could be argued that the bitcoins had moved out of the control of the original owner. Third party services do not typically document the addresses they use, so assumptions may have to be made. [3,4,5]

De-anonymising Bitcoin users

A recent academic paper described a way in which investigators can relatively cheaply de-anonymise up to 60% of Bitcoin clients on the network. The method fingerprints users based on the connections they have to other nodes on the Bitcoin peer-to-peer (p2p) network; these connections are randomised and should therefore be different for each connected user. When a user connects to another node, their IP address is advertised to that node. If an attacker is connected to enough nodes, these announcements can be watched and the fingerprinting carried out. [6]
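The matching step at the heart of this idea can be sketched in Python. This is a heavy simplification and not the paper's actual procedure: it assumes the investigator already knows a client's set of entry nodes (its fingerprint) and has recorded, for each transaction, which of the investigator's peers relayed it first. The function name, data layout and `min_overlap` threshold are hypothetical.

```python
def match_fingerprint(fingerprint, observations, min_overlap=3):
    """Return txids whose first relaying nodes overlap a known client's
    entry-node fingerprint by at least min_overlap nodes.

    fingerprint  -- set of node addresses the client is connected to
    observations -- {txid: set of nodes that first relayed that txid}
    """
    return sorted(
        txid
        for txid, relayers in observations.items()
        if len(fingerprint & relayers) >= min_overlap
    )


# Hypothetical data using documentation-only IP addresses (192.0.2.0/24)
client = {"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"}
seen = {
    "tx_a": {"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.9"},
    "tx_b": {"192.0.2.5", "192.0.2.6", "192.0.2.7"},
    "tx_c": {"192.0.2.4", "192.0.2.3", "192.0.2.2", "192.0.2.1"},
}
print(match_fingerprint(client, seen))  # ['tx_a', 'tx_c']
```

Raising `min_overlap` reduces false links at the cost of missing transactions that reached the investigator through fewer of the client's entry nodes.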

If the user is connecting through a VPN, TOR or a web wallet, or is behind NAT and not receiving inbound connections, then the IP address obtained by the investigator will not be that of the user. Even in these cases, however, the investigator would be able to link separate and distinct transactions to the same user.

The drawback of this method is that it is not targeted, and the more successful you wish it to be (up to the maximum of 60%), the noisier and more obvious to Bitcoin users it becomes.

Users may use Bitcoin over TOR believing that it increases privacy, but a later paper and a real-life attack show that doing so exposes the user to man-in-the-middle attacks if they are not using specific .onion addresses. Additionally, attackers may be able to ban all TOR exit nodes from connecting to the Bitcoin network by abusing Bitcoin's spam protections, thus forcing users to connect via more traditional means. [7,8]

The de-anonymising attack above works by performing traffic analysis on the Bitcoin network. This differs from transaction graph analysis, which has been the subject of the bulk of academic research in this area. In transaction graph analysis, the public ledger of bitcoin transactions (the block chain) is analysed to find patterns or other artifacts which can assist the investigator. The three major factors that reduce a user's privacy and are exploitable through transaction graph analysis are address reuse, change addresses and the merging of outputs. Block chain analysis will most likely not provide the real-life details of the owner, but it may reveal the use of third parties, against whom subpoenas can be lodged. [9]

Address reuse

Address reuse is treating a bitcoin address like a bank account, where a single address is used for multiple transactions. Bitcoin addresses are not designed to be used this way; that they can be is an accident rather than a design feature. There is no restriction on the number of bitcoin addresses one person can use, and the intended design is that a new address is created for each transaction. This is not considered wasteful because of the extremely large number of addresses available: there are 1.46 × 10^48 possible bitcoin addresses, enough to give every person on Earth 2.05 × 10^38 different addresses. [10]
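The figures above follow from the fact that a standard bitcoin address encodes a 160-bit hash of a public key. The quick Python check below reproduces them; the world population of roughly 7.13 billion is my assumption, chosen because it is the value the per-person figure implies.

```python
# A standard address encodes a 160-bit hash (RIPEMD-160 of SHA-256 of a public key)
possible_addresses = 2 ** 160

# Assumption: world population of about 7.13 billion
world_population = 7.13e9

print(f"{possible_addresses:.3g}")                     # 1.46e+48
print(f"{possible_addresses / world_population:.3g}")  # 2.05e+38
```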

When an address is reused, all other transactions involving that address can be seen by examining the block chain. If you know of a transaction made by a person of interest, and it comes from the same bitcoin address at which that person receives all their payments, then their earnings can easily be determined. You will also be able to look back through the history of that address, following the chains of transactions, to ascertain what other information can be extracted.
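As a sketch of why reuse is so revealing, the Python below collects every transaction touching one address and totals what it received. It works over hypothetical, simplified records rather than real block chain data: inputs are listed by the address they spend from (real inputs reference previous outputs), amounts are in satoshis, and all names are placeholders.

```python
# Hypothetical, simplified records: inputs are listed by the address they
# spend from, outputs as (address, amount in satoshis). Names are placeholders.
transactions = [
    {"txid": "a1", "inputs": ["CUSTOMER_1"],
     "outputs": [("POI_ADDR", 50_000_000), ("CUSTOMER_1_CHANGE", 10_000_000)]},
    {"txid": "a2", "inputs": ["CUSTOMER_2"],
     "outputs": [("POI_ADDR", 25_000_000)]},
    {"txid": "a3", "inputs": ["POI_ADDR"],
     "outputs": [("SHOP_ADDR", 70_000_000), ("POI_ADDR", 5_000_000)]},
]


def address_history(txs, address):
    """Return (total received in satoshis, txids touching the address)."""
    received = 0
    txids = []
    for tx in txs:
        paid_in = sum(amount for addr, amount in tx["outputs"] if addr == address)
        if paid_in or address in tx["inputs"]:
            txids.append(tx["txid"])
            received += paid_in
    return received, txids


print(address_history(transactions, "POI_ADDR"))  # (80000000, ['a1', 'a2', 'a3'])
```

Because the person of interest reused one address, a single lookup exposes both customers' payments and the later spend to the shop.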

Change addresses

Bitcoins are spent by creating a transaction that transfers the funds from one address to another. Every transaction has one or more inputs and one or more outputs, which means bitcoins can be sent to more than one address in a single transaction. Each input must be a complete Unspent Transaction Output (UTXO) from a previous transaction; UTXOs cannot be partially spent. This means that if you receive 10 BTC in a single transaction, you must spend the entire 10 BTC if you wish to spend any of it. To receive the change you are owed, two outputs are created: one to the person you are paying and another back to a different address owned by the sender. This is referred to as a change address. If you were to buy a 1.5 BTC item with a 10 BTC UTXO, it would require a transaction with two outputs: 1.5 BTC going to the seller and 8.5 BTC back to the buyer as change (ignoring the small transaction fee, which would in practice reduce the change). [11]

This is of interest to an investigator because it can be assumed that one of the bitcoin addresses receiving an output of the transaction is also owned by the creator of the transaction.
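Working out which output is the change is not always possible. The Python below is a simplified sketch of one heuristic used in academic work: if exactly one output pays an address that has never appeared in the block chain before, guess that it is the change. The function name and inputs are hypothetical, and the guess fails when, for example, the recipient also uses a fresh address.

```python
def guess_change_address(outputs, previously_seen):
    """Simplified one-time-change heuristic: if exactly one output pays an
    address never seen before in the block chain, guess it is the change.

    outputs         -- list of (address, amount in satoshis)
    previously_seen -- set of addresses that appeared in earlier transactions
    """
    fresh = [addr for addr, _ in outputs if addr not in previously_seen]
    return fresh[0] if len(fresh) == 1 else None


# The 1.5 BTC purchase from the example above (fee ignored); placeholder names
outputs = [("SELLER_ADDR", 150_000_000), ("NEW_ADDR", 850_000_000)]
print(guess_change_address(outputs, {"SELLER_ADDR"}))  # NEW_ADDR
print(guess_change_address(outputs, set()))            # None (ambiguous)
```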

Merging of outputs

If a transaction pools a number of UTXOs to make up the total input required, it can be assumed that all of the addresses merged to create the input to the transaction are owned by the same person.
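Applied across many transactions, this assumption lets addresses be grouped into clusters that probably belong to one owner. A minimal Python sketch using a union-find structure is shown below, assuming the input addresses of each transaction have already been extracted; the function name and placeholder addresses are hypothetical. Transactions deliberately built from several users' inputs, as mixing wallets do, break the assumption and would wrongly merge clusters.

```python
def cluster_addresses(input_sets):
    """Group addresses that were spent together as inputs (union-find).

    input_sets -- one list of input addresses per transaction
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in input_sets:
        find(inputs[0])  # register single-input transactions too
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(c) for c in clusters.values())


# Placeholder addresses: A+B spent together, then B+C, so A, B and C cluster
print(cluster_addresses([["A", "B"], ["B", "C"], ["D"], ["E", "F"]]))
# [['A', 'B', 'C'], ['D'], ['E', 'F']]
```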

Mixers

Mixing services (also called laundries or tumblers) exchange a set of bitcoins perceived to be tainted for another set believed to be taint-free, for a small fee. The aim is to stop investigators from following the block chain to determine the current ownership of bitcoins in which they have a specific interest. If a mixing service operates as advertised, the bitcoins the user ends up with have no relation to those deposited into the system. The user generally has to trust that the service has enough customers to effectively mix the amount of bitcoins deposited, and that it does not retain any logs of the mixing.

Some research has indicated that, at least in some cases, mixing services may not be as secure as they claim to their users, with the services' transactions able to be picked out of the block chain, or the deposited funds being stolen outright. [12,13]

Bitcoin mixing services may soon be replaced by privacy-enhanced wallets such as Dark Wallet, which mixes every transaction it conducts and also makes use of privacy-enhancing stealth addresses.

Another method for anonymising bitcoins can be to deposit them in a popular wallet service or exchange and withdraw them some time later – although the service used may retain records. If the exchange supports other crypto-currencies such as litecoin, ripple, dogecoin, namecoin etc., then funds could be converted to a different crypto-currency before being traded back or sold to obfuscate the transaction history across another block chain. [14]

Any movement or splitting of bitcoins can be a potential issue for investigators, as it can be extremely difficult, if not impossible, to determine what has occurred. Have the bitcoins been moved from one address owned by the person of interest to another, moved into an exchange or a wallet service, or sold? That being said, for large amounts of bitcoins it is easier to obtain plausible deniability of ownership than full anonymity; for smaller sums, either can be obtained with little effort and a bit of knowledge.

The analysis of the block chain to determine which bitcoin addresses are related to others is called taint analysis. If a person of interest has been performing taint analysis on addresses, it may indicate that they have used bitcoin mixing and wished to check that it was successful. [15]
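There is no single standard definition of taint. The Python below sketches one common approach, sometimes called the "haircut" method, in which every output inherits the value-weighted share of tainted value among its transaction's inputs. It assumes transactions are supplied in chronological order and that each address is used only once; the function name, data layout and placeholder amounts are hypothetical.

```python
def propagate_taint(transactions, tainted):
    """Haircut taint: each output inherits the value-weighted fraction of
    tainted value among the transaction's inputs.

    transactions -- chronological list of {"inputs": [(addr, sats)],
                    "outputs": [(addr, sats)]}; each address used once
    tainted      -- addresses treated as 100% tainted at the start
    """
    taint = {addr: 1.0 for addr in tainted}
    for tx in transactions:
        total_in = sum(sats for _, sats in tx["inputs"])
        tainted_in = sum(sats * taint.get(addr, 0.0) for addr, sats in tx["inputs"])
        share = tainted_in / total_in if total_in else 0.0
        for addr, _ in tx["outputs"]:
            taint[addr] = share
    return taint


# Placeholder addresses and amounts
txs = [
    {"inputs": [("DIRTY", 100), ("CLEAN", 300)], "outputs": [("OUT1", 200), ("OUT2", 200)]},
    {"inputs": [("OUT1", 200), ("OTHER", 200)], "outputs": [("OUT3", 400)]},
]
result = propagate_taint(txs, {"DIRTY"})
print(result["OUT2"], result["OUT3"])  # 0.25 0.125
```

Each hop through mixed inputs dilutes the taint, which is exactly the effect a mixing service relies on.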

Bitcoin user de-anonymisation, particularly block chain analysis, is an area which I foresee being of ongoing interest to academia and commercial communities.

Advanced security Bitcoin

While the best-case scenario for an investigator having seized a computer would be to find private keys in plain text, this is increasingly unlikely due to the additional security precautions required to keep bitcoin storage safe from potential thieves and (depending on who is holding the bitcoins) investigators.

An advanced user who has decided to provide their own security and to not trust third parties is likely to store the private keys protecting the majority of their holdings off-line, either digitally or physically. This is called cold-storage.

One way of storing private keys is with a paper wallet; these commonly contain two QR codes, one showing the bitcoin address and the other the private key for spending. These may be folded so that the spending QR code is not visible, with only the Bitcoin address showing on the outside. They may also be sealed with holographic tamper-resistant tape and/or sealed in water-resistant bags.

Recently created paper wallets may be BIP-0038 protected, meaning the private keys are encrypted so that a password is required to access the funds. If a BIP-0038 paper wallet is obtained, the password will also be required in order to move the bitcoins. BIP-0038 encrypted private keys start with 6P, whereas unencrypted private keys in WIF start with a 5 (or with a K or L for the compressed variant). [16]

Be aware that paper wallets and private keys (encrypted or not) can be duplicated and stored in multiple locations. For example, someone may store a copy of their BIP-0038 encrypted private key on their local computer, another in a document stored in the cloud and a printout in a safe, with a friend holding a further copy. As all of these copies hold the same private key, any one of them would be sufficient to move the bitcoins (potentially away from seizure).

Another method is to store the private keys on a low-cost computer, such as a Raspberry Pi or a cheap laptop, which is kept air-gapped from all networks. To access the bitcoins, transactions are signed offline using the private keys stored on the air-gapped computer; the signed transactions are then transported to an internet-connected computer on a USB flash drive and sent, anonymously if desired, to the Bitcoin network as raw transactions. Alternatively, a Linux live CD or USB can be used to boot into a known-safe environment, with the private keys kept on a separate, possibly encrypted, USB flash drive. This environment could also be kept permanently offline, or used only with a security-focused distribution such as Tails for enhanced network security.

Bitcoin addresses with enhanced security

Bitcoins can be stored in addresses with built in redundancy called m-of-n or multi-sig. Whilst these have similar results and are sometimes referred to interchangeably, they utilise different technologies – one operating outside of the block chain, the other working explicitly within it. They both allow access to funds within a bitcoin address to be split up between a group of people or different locations to enhance security.

M-of-n utilises Shamir's Secret Sharing (SSS) algorithm. SSS splits the private key into a number (n) of pieces, at least a set number (m) of which must be brought together to reconstruct the private key. For example, an address split 2-of-3 has three parts, any two of which are required to recreate the original private key. These parts may be kept by the same person in different locations, or handed to three different people. [17]
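The following Python sketch shows the mathematics behind a 2-of-3 split: the key becomes the constant term of a random polynomial over a prime field, each share is a point on that polynomial, and any m points recover the constant term by Lagrange interpolation. It is an illustrative implementation of textbook SSS, not compatible with any particular Bitcoin tool (real tools use their own share encodings); the function names and the choice of the Mersenne prime 2^521 - 1 are my assumptions. It needs Python 3.8+ for `pow(x, -1, p)`.

```python
import secrets

# Mersenne prime 2**521 - 1: a field large enough for any 256-bit private key
PRIME = 2**521 - 1


def split_secret(secret, m, n):
    """Split `secret` into n shares, any m of which can recover it."""
    # Random polynomial of degree m-1 whose constant term is the secret
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]

    def poly(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, poly(x)) for x in range(1, n + 1)]


def combine_shares(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = int("0C28FCA386C7A227600B2FE50B7CAE11EC86D3BF1FBE471BE89827E19D72AA1D", 16)
shares = split_secret(key, m=2, n=3)
print(combine_shares([shares[0], shares[2]]) == key)  # True
```

Note that reconstruction brings the whole private key together on one machine, which is the weakness of SSS compared with multi-sig.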

Multi-sig addresses are similar in concept, but they utilise the scripting capabilities of Bitcoin transactions to lock a bitcoin address with multiple private keys, again requiring a certain number of them to unlock the funds. Of the two, multi-sig is considered the more secure and powerful way of performing key splitting because, unlike SSS, the separate private keys never need to come together in one place (a potential point of failure) in order to unlock the funds. [18]

Although they offer increased security, multi-sig addresses are little used – in fact none of the 100 addresses that contain the largest amount of bitcoins are multi-sig. [19]

Bitcoin hardware

A device called Trezor is currently the most common hardware bitcoin wallet and is used to securely store Bitcoin private keys. It is a custom-made, single-purpose device and is therefore considered more secure than a computer for storing bitcoins: general purpose computers are more susceptible to malware, whereas the hardware wallet cannot be connected directly to the internet and cannot be accessed without entering the correct PIN. To sign a transaction with the stored private keys, the transaction is created using an app or website compatible with the hardware wallet, and the signing is then confirmed on the hardware wallet itself.

When a Trezor is set up, a recovery seed is created. This consists of a list of words which, if obtained, allows the private keys to be replicated on any other Trezor device. A recent firmware update has added the requirement for the recovery seed to be PIN protected as an additional layer of security.

While the Trezor is by far the most popular at the moment, other hardware security devices are beginning to enter the market. Other devices can look like thick credit cards with buttons and a small LCD display or small USB devices – potentially kept on a key ring. Other Bitcoin related USB devices are 2FA (Two Factor Authentication) devices, allowing the user to authenticate themselves to websites (typically wallet services or exchanges). These may mimic USB flash drives in appearance or be much smaller and they may or may not have a button on them. Alternatively, 2FA can be supplied by receiving SMS messages or automated telephone calls, an app such as Google Authenticator or Authy on a smartphone or pre-printed one-time codes.

Additional artifacts

As discussed above, address reuse reduces privacy; the way to avoid it is to create a new bitcoin address for each transaction. If a wallet service or app is used, the wallet will manage the multiple addresses for the user. In the early days of Bitcoin this was an issue, as newly created addresses would not necessarily be recoverable from older backups of the user's wallet. This has been solved by HD (hierarchical deterministic) wallets, which can create a practically unlimited number of bitcoin addresses from a single seed. As these addresses are created in a predictable manner, only the seed needs to be backed up, without having to worry about the backup going out of date. A BIP-0039 compliant HD wallet (such as the Trezor discussed above) represents the seed as a mnemonic of 12 or more common English words. [20]

Internally, HD wallets derive private and public node keys (the extended keys of BIP-0032) from the seed. These can be separated, so that a computer holding only the public node key can create addresses for an HD wallet without being able to spend the funds itself.
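The first two steps of this process, from mnemonic to seed (BIP-0039) and from seed to master node (BIP-0032), use only standard hash functions and can be sketched in Python as below. This is a partial sketch: it does not validate the words against the BIP-0039 wordlist or checksum, and it stops at the master private key and chain code, since deriving child keys, public node keys and addresses requires elliptic-curve arithmetic not shown here. The function names are my own; the mnemonic is a public test phrase and must never be used for real funds.

```python
import hashlib
import hmac
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """BIP-0039: stretch the mnemonic into a 64-byte seed with PBKDF2."""
    words = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", words, salt, 2048)


def master_node(seed: bytes):
    """BIP-0032: derive the master private key and chain code from a seed."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return digest[:32], digest[32:]


# Public BIP-0039 test mnemonic; never use it for real funds
phrase = " ".join(["abandon"] * 11 + ["about"])
seed = mnemonic_to_seed(phrase)
private_key, chain_code = master_node(seed)
print(seed.hex())
```

For an investigator, the practical point is that the optional passphrase changes the seed completely, so recovering the words alone may not be enough.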

Another artifact, relating to the blockchain.info web wallet, is the wallet identifier, which is used to log on to the site. It has an appearance similar to that shown below:

a8c1022a-34ef-4f9b-976a-1b06280726ec
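Identifiers in this 8-4-4-4-12 hexadecimal layout can be carved from raw data with a simple pattern. The Python sketch below is an assumption-laden starting point: it searches ANSI text only, the function name is hypothetical, and any UUID will match (Windows GUIDs, for instance), so every hit needs corroboration before being treated as a wallet identifier.

```python
import re
import uuid

# Any 8-4-4-4-12 hex pattern; will also match unrelated GUIDs (e.g. Windows ones)
UUID_RE = re.compile(
    rb"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")


def find_uuid_candidates(data: bytes):
    """Return normalised (lower-case) UUID strings found in raw bytes."""
    return [str(uuid.UUID(m.group().decode("ascii"))) for m in UUID_RE.finditer(data)]


sample = b"...wallet a8c1022a-34ef-4f9b-976a-1b06280726ec..."
print(find_uuid_candidates(sample))  # ['a8c1022a-34ef-4f9b-976a-1b06280726ec']
```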

A redeem script is a section of hex which is required to validate and to spend bitcoins stored in a multi-sig address. The same script is given to each holder of a part of a multi-sig address.
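A standard multi-sig redeem script has the layout OP_m, then n public keys, then OP_n, then OP_CHECKMULTISIG, so a recovered script reveals how many keys exist and how many are needed to spend. The Python sketch below parses that standard layout only; non-standard scripts will be rejected. The function name is hypothetical and the example keys are placeholders, not valid public keys.

```python
OP_CHECKMULTISIG = 0xAE


def parse_multisig_redeem_script(script_hex: str):
    """Return (m, n, public keys as hex) for a standard m-of-n multisig script:
    OP_m <pubkey 1> ... <pubkey n> OP_n OP_CHECKMULTISIG
    """
    script = bytes.fromhex(script_hex)
    if len(script) < 4 or script[-1] != OP_CHECKMULTISIG:
        raise ValueError("not a CHECKMULTISIG script")
    # OP_1 .. OP_16 are the opcodes 0x51 .. 0x60
    if not (0x51 <= script[0] <= 0x60 and 0x51 <= script[-2] <= 0x60):
        raise ValueError("missing OP_m / OP_n")
    m, n = script[0] - 0x50, script[-2] - 0x50
    pubkeys, i = [], 1
    while i < len(script) - 2:
        length = script[i]  # direct push: 33 (compressed) or 65 (uncompressed)
        if length not in (33, 65):
            raise ValueError("unexpected push length")
        pubkeys.append(script[i + 1:i + 1 + length].hex())
        i += 1 + length
    if i != len(script) - 2 or len(pubkeys) != n or m > n:
        raise ValueError("malformed multisig script")
    return m, n, pubkeys


# Placeholder 33-byte "keys" (not real curve points) in a 2-of-3 script
fake_keys = ["02" + "11" * 32, "03" + "22" * 32, "02" + "33" * 32]
script_hex = "52" + "".join("21" + k for k in fake_keys) + "53ae"
m, n, keys = parse_multisig_redeem_script(script_hex)
print(m, n)  # 2 3
```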

A final potential opportunity is to examine any available internet history, to ascertain if popular Bitcoin websites have been viewed and if particular bitcoin addresses have been looked up in the block chain.

File / Memory forensics

A bitcoin address is between 26 and 35 characters long (usually 34), and a regular expression or grep search can be constructed to find them. Shown below are ANSI and Unicode (UTF-16LE) variants of these searches:

ANSI: 1[a-km-zA-HJ-NP-Z1-9]{25,34}

Unicode: 1\x00([a-km-zA-HJ-NP-Z1-9]\x00){25,34}

Interestingly, these do not appear to work correctly within EnCase 6.19.7.2; this may be because the use of brackets is undocumented in the EnCase manual. Testing in Python shows that the expressions are correct.
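A minimal Python version of such a test is shown below, applying both patterns to raw bytes. The only change to the expressions is that the group in the Unicode variant is made non-capturing so that each match returns the whole address. As in the text, these patterns only look for addresses starting with a 1, not P2SH addresses starting with a 3. The function name and sample data are my own.

```python
import re

# The two patterns from the text, as byte regexes (non-capturing group added)
ANSI_RE = re.compile(rb"1[a-km-zA-HJ-NP-Z1-9]{25,34}")
UNICODE_RE = re.compile(rb"1\x00(?:[a-km-zA-HJ-NP-Z1-9]\x00){25,34}")


def find_address_candidates(data: bytes):
    """Return candidate addresses found as ANSI or UTF-16LE text."""
    hits = [m.group().decode("ascii") for m in ANSI_RE.finditer(data)]
    hits += [m.group().decode("utf-16-le") for m in UNICODE_RE.finditer(data)]
    return hits


address = "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"  # well-known genesis block address
blob = b"junk " + address.encode("ascii") + b" junk" + address.encode("utf-16-le")
print(find_address_candidates(blob))  # the same address, found twice
```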

If these regular expressions are used to search files for bitcoin addresses, a large number of false positives will be returned, so each address found needs to be checked for correctness. Luckily, the format in which bitcoin addresses are encoded, called Base58Check, has built-in error checking. There are two parts to this encoding. Firstly, Base58 is a binary-to-text encoding scheme created specifically for Bitcoin. It uses only the characters 0-9, a-z and A-Z, excluding 0 (zero), O (capital o), l (lowercase L) and I (capital i); these are omitted because they can cause visual ambiguity, leaving 58 characters, hence Base58. Secondly, the check part provides error detection: the last four bytes of the decoded data are the first four bytes of the double SHA-256 digest of the preceding data. By checking the validity of the Base58Check encoding of any strings found by the above regular expression or grep search, we can determine which are false positives and, more importantly, which are not. [22]
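The check described above can be sketched in a few lines of Python using only the standard library. The function names are my own; the decoder treats each leading '1' as a zero byte, as Base58 requires, and rejects any string containing a character outside the Base58 alphabet.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode Base58 text to bytes; raises ValueError on an invalid character."""
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)  # str.index raises ValueError
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    leading_ones = len(text) - len(text.lstrip("1"))  # each '1' is a zero byte
    return b"\x00" * leading_ones + body


def is_valid_base58check(text: str) -> bool:
    """True if the last 4 decoded bytes match the double SHA-256 checksum."""
    try:
        raw = b58decode(text)
    except ValueError:
        return False
    if len(raw) < 5:
        return False
    payload, checksum = raw[:-4], raw[-4:]
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] == checksum


print(is_valid_base58check("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))  # True
print(is_valid_base58check("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNb"))  # False
```

A single mistyped or corrupted character is almost always caught, which is what makes this check effective at discarding false positives from a regex sweep.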

Handily, bitcoin addresses are not the only items in the Bitcoin ecosystem that use Base58Check encoding. Others include Bitcoin P2SH addresses, BIP-0038 encrypted private keys, private keys in WIF (Wallet Import Format) for both uncompressed and compressed public keys, and the public and private node keys of BIP-0032 HD wallets.
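Once a string passes the checksum, its decoded version bytes and length indicate which of these items it is. The Python sketch below classifies mainnet items only (testnet and other coins use different version bytes and will be reported as unknown); the function names and label strings are my own, and it is not BTCscan's actual code.

```python
import hashlib
from typing import Optional

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58check_payload(text: str) -> Optional[bytes]:
    """Return the payload (checksum stripped) or None if not valid Base58Check."""
    if not text or any(ch not in ALPHABET for ch in text):
        return None
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)
    raw = b"\x00" * (len(text) - len(text.lstrip("1")))
    raw += num.to_bytes((num.bit_length() + 7) // 8, "big")
    if len(raw) < 5:
        return None
    payload, checksum = raw[:-4], raw[-4:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return payload if digest[:4] == checksum else None


def classify(text: str) -> Optional[str]:
    """Guess which mainnet Bitcoin item a Base58Check string represents."""
    p = b58check_payload(text)
    if p is None:
        return None
    if len(p) == 21 and p[0] == 0x00:
        return "P2PKH address"
    if len(p) == 21 and p[0] == 0x05:
        return "P2SH address"
    if len(p) == 33 and p[0] == 0x80:
        return "WIF private key (uncompressed)"
    if len(p) == 34 and p[0] == 0x80 and p[-1] == 0x01:
        return "WIF private key (compressed)"
    if len(p) == 39 and p[:2] in (b"\x01\x42", b"\x01\x43"):
        return "BIP-0038 encrypted private key"
    if len(p) == 78 and p[:4] == bytes.fromhex("0488b21e"):
        return "BIP-0032 public node key (xpub)"
    if len(p) == 78 and p[:4] == bytes.fromhex("0488ade4"):
        return "BIP-0032 private node key (xprv)"
    return "unknown Base58Check item"


print(classify("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))  # P2PKH address
print(classify("5HueCGU8rMjxEXxiPuD5BDku4MkFqeZyd4dZ1jvhTVqvbTLvyTJ"))
```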

BTCscan

BTCscan is a Python script I have written which automates the extraction, from any file(s) it is run over, of Base58Check encoded strings matching the formats of the Bitcoin items mentioned above. If directed at a folder, the script will iterate over the files and folders within that directory, searching each one in turn. BTCscan is open-source, has no dependencies beyond the Python 3 standard library and is available here: https://gist.github.com/chriswcohen/7e28c95ba7354a986c34

BTCscan is, for the most part, simplistic: it will not look in forensic areas (unallocated clusters, file slack, ADS etc.) and will not find relevant items that are not stored as ANSI or Unicode text. It will scan DD images of a drive, but only as flat files. It is neither optimised nor of high quality, but it works (at least on Windows); I have had success running it over a memory image, and items have been recovered from previously deleted files within a DD image of a drive. BTCscan may recover items from files associated with Bitcoin software and from cache files of Bitcoin related websites. Be aware that some false positives can creep in, for example if a Bitcoin address contains within it a correctly formatted Base58Check string that is reported as an unknown P2SH address.

[3] A web wallet, eWallet or online wallet is a bitcoin wallet hosted on the internet by a third party.

[4] Although the website http://www.walletexplorer.com/ appears to do this, the methodology by which it obtains its data is unknown. It is likely to be similar to that described in the paper A Fistful of Bitcoins: Characterizing Payments Among Men with No Names, in which the researchers used services themselves to determine some of the addresses those services use; these were then analysed, along with all addresses in the block chain, using heuristics and the methods mentioned in this article, to attribute other addresses to the services and to unknown users.

[10] Address reuse also reduces the security of the bitcoins stored in those addresses. Signing a transaction requires a 256-bit random number (the nonce, from which the signature's r-value is derived) so that the private key cannot be reverse engineered. If this number is not truly random, for example if the same value is used for two signatures (which shows up as a repeated r-value), then the private key can be calculated and used to sign other transactions for that bitcoin address. This attack can be negated by not reusing addresses: once a transaction has been signed from a bitcoin address, that address is left empty.
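The algebra behind this attack is short enough to show in Python. An ECDSA signature satisfies s = k^-1 (z + r·d) mod N, where d is the private key, k the nonce, z the transaction hash and N the secp256k1 group order; two signatures sharing k give k = (z1 - z2) / (s1 - s2) and then d = (s1·k - z1) / r. In this sketch r is a made-up value, since computing it properly (the x-coordinate of k·G) needs elliptic-curve arithmetic not shown here, and the low-S normalisation applied by real signers is ignored (it can require trying s or N - s). The function names and numbers are illustrative only.

```python
# Order of the secp256k1 group used by Bitcoin
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def ecdsa_s(d, k, z, r):
    """ECDSA s = k^-1 (z + r*d) mod N, taking r as given."""
    return pow(k, -1, N) * (z + r * d) % N


def recover_private_key(r, s1, z1, s2, z2):
    """Recover d from two signatures that share the same nonce (same r)."""
    k = (z1 - z2) * pow((s1 - s2) % N, -1, N) % N
    return (s1 * k - z1) * pow(r, -1, N) % N


# Illustrative values only: r would really be the x-coordinate of k*G mod N
d = 0x0C28FCA386C7A227600B2FE50B7CAE11EC86D3BF1FBE471BE89827E19D72AA1D
k = 0x1234567890ABCDEF  # the reused "random" nonce
r = 0x5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A
z1, z2 = 0x1111, 0x2222  # hashes of two different transactions
s1, s2 = ecdsa_s(d, k, z1, r), ecdsa_s(d, k, z2, r)
print(recover_private_key(r, s1, z1, s2, z2) == d)  # True
```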

[18] Multi-sig addresses are an example of pay to script hash (P2SH) addresses. Instead of paying into a standard bitcoin address (which always starts with a 1 and is derived from a random 256-bit number), you pay into a P2SH address (which always starts with a 3 and is derived from a script). These scripts can be considerably complex, allowing such things as smart contracts, smart property and escrow. The scripting engine isn't Turing complete, as loops have been purposely omitted.