Let's go through them and see if we can increase our understanding of IPFS a bit!

Identities: name those nodes

IPFS is a P2P network of clients; there is no central server. These clients are the nodes of the network and need a way to be identified by the other nodes. If you just number the nodes 1,2,3,... anyone can add a node with an existing ID and claim to be that node. To prevent that some cryptography is needed. IPFS does it like this:

All this is done during the init phase of a node: ipfs init > the resulting keys are stored in ~/.ipfs/config and returns the NodeId.

When two nodes start communicating the following happens:

exchange public keys

check if: hash(other.PublicKey) == other.NodeId

if so, we have identified the other node and can e.g. request for data objects

if not, we disconnect from the "fake" node

The actual hashing algorithm is not specified in the white paper, read the note about that here:

Rather than locking the system to a particular set of function choices, IPFS favors self-describing values. Hash digest values are stored in multihash format, which includes a short header specifying the hash function used, and the digest length in bytes.

Example:

\\\

This allows the system to (a) choose the best function for the use case (e.g. stronger security vs faster performance), and (b) evolve as function choices change. Self-describing values allow using different parameter choices compatibly.

These multihashes are part of a whole family of self-describing hashes, and it is brilliant, check it out: multiformats.

Network: talk to other clients

The summary is this: IPFS works on top of any network (see the image above).

Interesting here is the network addressing to connect to a peer. IPFS uses multiaddr formatting for that. You can see it in action when starting a node:

Swarm listening on /ip4/127.0.0.1/tcp/4001

Swarm listening on /ip4/172.17.0.1/tcp/4001

Swarm listening on /ip4/185.24.123.123/tcp/4001

Swarm listening on /ip6/2a02:1234:9:0:21a:4aff:fed4:da32/tcp/4001

Swarm listening on /ip6/::1/tcp/4001

API server listening on /ip4/127.0.0.1/tcp/5001

Gateway (read-only) server listening on /ip4/0.0.0.0/tcp/8080

Routing: announce and find stuff

The routing layer is based on a DHT, as discussed in the previous episode, and its purpose is to:

announce that this node has some data (a block as discussed in the next chapter), or

find which nodes have some specific data (by referring to the multihash of a block), and

if the data is small enough (=< 1KB) the DHT stores the data as its value.

The command line interface and API don't expose the complete routing interface as specified in the white paper. What does work:

ipfs put and ipfs get only work for ipns records in the API. Maybe storing small data on the DHT itself was not implemented (yet)?

Exchange: give and take

Data is broken up into blocks, and the exchange layer is responsible for distributing these blocks. It looks like BitTorrent, but it's different, so the protocol warrants its own name: BitSwap.

The main difference is that wherein BitTorrent blocks are traded with peers looking for blocks of the same file (torrent swarm), in BitSwap blocks are traded cross-file. So one big swarm for all IPFS data.

BitSwap is modeled as a marketplace that incentivizes data replication. The way this is implemented is called the BitSwap Strategy, and the white paper describes a feasible strategy and also states that the strategy can be replaced by another strategy. One such a bartering system can be based on a virtual currency, which is where FileCoin comes in.

Of course, each node can decide on its own strategy, so the generally used strategy must be resilient against abuse. When most nodes are set up to have some fair way of bartering it will work something like this:

when peers connect, they exchange which blocks they have (have_list) and which blocks they are looking for (want_list)

to decide if a node will actually share data, it will apply its BitSwap Strategy

this strategy is based on previous data exchanges between these two peers

when peers exchange blocks they keep track of the amount of data they share (builds credit) and the amount of data they receive (builds debt)

this accounting between two peers is kept track of in the BitSwap Ledger

if a peer has credit (shared more than received), our node will send the requested block

if a peer has debt, our node will share or not share, depending on a deterministic function where the chance of sharing becomes smaller when the debt is bigger

a data exchange always starts with the exchange of the ledger, if it is not identical our node disconnects

So this is set up kind of cool I think: game theory in action! The white paper further describes some edge cases like what to do if I have no blocks to barter with? The answer is simply to collect blocks that your peers are looking for, so you have something to trade.

Now let's have a look how we can poke around in the innards of the BitSwap protocol.

The command-line interface has a section blocks and a section bitswap; those sound relevant :)

To see bitswap in action, I'm going to request a large file Qmdsrpg2oXZTWGjat98VgpFQb5u1Vdw5Gun2rgQ2Xhxa2t﻿ which is a video (download it to see what video!):

We see a couple of differences there, but let's not get into that. Both outputs follow the IPFS object format from the white paper. One interesting bit is the "Cid" that shows up; this refers to the newer Content IDentifier.

Another feature that is mentioned is the possibility to pin objects, which results in storage of these objects in the file system of the local node. The current go implementation of ipfs stores it in a leveldb database under the ~/.ipfs/datastore directory. We have seen pinning in action in a previous post.

The last part of this chapter mentions the availability of object level encryption. This is not implemented yet: status wip (Work in Progress; I had to look it up as well). The project page is here: ipfs keystore proposal.

The ipfs dag command hints to something new...

Intermission: IPLD

If you studied the images at the start of this post carefully, you are probably wondering, what is IPLD and how does it fit in? According to the white paper, it doesn't fit in, as it isn't mentioned at all!

My guess is that IPLD is not mentioned because it was introduced later, but it more or less maps to the Objects chapter in the paper. IPLD is broader, more general, than what the white paper specifies. Hey Juan, update the white paper will ya! :-)

But if you don't feel like reading/watching more: IPLD is more or less the same as what is described in the "Objects" and "Files" chapters here.

Moving on to the next chapter in the white paper...

Files: uh?

On top of the Merkle DAG objects IPFS defines a Git-like file system with versioning, with the following elements:

blob: there is just data in blobs and it represents the concept of a file in IPFS. No links in blobs

list: lists are also a representation of an IPFS file, but consisting of multiple blobs and/or lists

tree: a collection of blobs, lists and/or trees: acts as a directory

commit: a snapshot of the history in a tree (just like a git commit).

Now I hear you thinking: aren't these blobs, lists, and trees the same things as what we saw in the Mergle DAG? We had objects there with data, with or without links, and nice Unix-like file paths.

I heard you thinking that because I thought the same thing when I arrived at this chapter. After searching around a bit I started to get the feeling that this layer was discarded and IPLD stops at the "objects" layer, and everything on top of that is open to whatever implementation. If an expert is reading this and thinks I have it all wrong: please let me know, and I'll correct it with the new insight.

Now, what about the commit file type? The title of the white paper is "IPFS - Content Addressed, Versioned, P2P File System", but the versioning hasn't been implemented yet it seems.

Naming: adding mutability

Since links in IPFS are content addressable (a cryptographic hash over the content represents the block or object of content), data is immutable by definition. It can only be replaced by another version of the content, and it, therefore, gets a new "address".

The solution is to create "labels" or "pointers" (just like git branches and tags) to immutable content. These labels can be used to represent the latest version of an object (or graph of objects).

In IPFS this pointer can be created using the Self-Certified Filesystems I described in the previous post. It is named IPNS and works like this:

The root address of a node is /ipns/<NodeId>

The content it points to can be changed by publishing an IPFS object to this address

By publishing, the owner of the node (the person who knows the secret key that was generated with ipfs init) cryptographically signs this "pointer".

This enables other users to verify the authenticity of the object published by the owner.

Just like IPFS paths, IPNS paths also start with a hash, followed by a Unix-like path.

This chapter in the white paper also describes some methods to make addresses more human-friendly, but I'll leave that in store for the next episode which will be hands-on again. We gotta get rid of these hashes in the addresses and make it all work nicely in our good old browsers: Ten terrible attempts to make IPFS human-friendly