Confidentiaility regarding which private folders have data, and which
are empty.

Confidentiality for quota data.

In other words, an adversary who has access to Keybase's server data could:
learn about who is communicating with whom via KBFS; would have the same
usage details we would eventually use for billing; and would be able to recover data encrypted
for lost or decomissioned devices, if they additionally recover those devices.

Of course we are mere mortals and don't believe we can indefinitely stave off
conserted attacks against our infrastructure, so we encourage our users to only
depend on our end-to-end security claims.

Additionally, we don't guarantee availability of data or metadata in the
case of server compromise.

System Overview

Sibling keys and subkeys.
Each user in Keybase keeps two keypairs per device; one keypair
for signing, and one pair for encryption. Secret keys never leave the
device. The signing keys are known as "siblings" since they
all are equally powerful. Any active sibling key can provision or revoke
another device. Subkeys are signed by an active sibling key but can't
delegate to other keys. The first of a series of sibling keys is
called an "eldest" key and can be revoked without affecting its other
live siblings.

Private Personal Directories
Every keybase user gets a private personal directory, keyed so that all
of her devices can read and write it.

Public Directories
Groups of one-or-more users can create public directories. Items in
public directories are signed by one of the devices of one of the directory's
writers; they are not encrypted. Public directories are good places to release
software or authenticated manifestos.

Private Group Directories
The most interesting case, a group of people with read/write permission
get together with some other people who have read-only permission to share
a directory. This directory should use authenticated encryption to
guarantee that only someone in the group could have written to it. It
should further use some mechanism (currently vanilla signatures) to determine which writer is
responsible for which write, so that users can't put words in each other's mouths.

Forest of Merkle Trees [Unimplmented]
Keybase keeps public Merkle trees for managing: (1) each user's collection of keypairs
and public identities; and (2) the progression of file system states, so that the
server can't maliciously roll back to previous states.

Keys

Eldest Key

When a user Alice (A) joins, she starts off by generating or
importing a keypair. For our current users, this is a PGP keypair,
denoted (P​A​​,p​A​​). But there's nothing special about PGP keys,
and we are free to change the format in the future. We call this keypair
Alice's eldest keypair, since it's the first among potentially
many sibling keys.

Per-Device Keys

For each of her devices, Alice generates two per-device keypairs.
The signing key becomes a sibkey, and the encryption keypair is just a
subkey.

For Alice's ith device, the procedure is:

Generate an Ed25519Keypair at random, yielding the
pair (E​A​i​​,e​A​i​​). In Ed25519, the private key is a 64-byte
string, and the public key is a 32-byte (derived) string.

Generate a Curve25519 Keypair suitable for use
with NaCL's Box feature (or Go Crypto's box.Seal feature). Call
it (M​A​i​​,m​A​i​​). In Curve25519
(Montgomery form), the public and private key are both 32-byte strings.

Sign E​A​i​​ with a valid sibling key; and as in PGP, include a "reverse" signature
of the delegating key, signed by the new key. Push this sibkey signature
into Alice's signature chain. See our doc on key
for more specifics.

Key IDS (kids)

Keybase has an idea of a "Key ID". In the case of PGP keys, it's a hash
of the public key materials. In the case of ECC keys, it's the public key
itself. We describe here how to construct them in Version 1:

For a PGP public key P, compute the serialization s(P)
as you would to compute a PGP fingerprint. Let t be the one-byte type
of the key in question. For example, or RSA, t=1; for DSA t=17.
See RFC 4880 and RFC 6637 for a complete list.
Then:

kid(P) = 0x01 || t || SHA-256(s(P)) || 0x0a

For a NaCl encryption key, E:

kid(E) = 0x01 || 0x21 || E || 0x0a

For a NaCl EDDSA key, M:

kid(M) = 0x01 || 0x20 || M || 0x0a

Top-Level Folder (TLFs) in KBFS

A top-level folder (TLF) in KBFS has a fixed set of readers and writers, as specified by
the name of the folder. Every file and folder recursively contained in a TLF has
the same permissions as its parent. This orginization will feel a little bit different from normal
POSIX semantics, but greatly simplifies the system.

Private TLFs

We cover the case of private TLFs with multiple readers and writers first, since
it's the most general. Home directories are a simple subcase of general private
group TLFs.

Keying a Private Group TLF

Let's say Alice (A) is creating a new TLF for Bob (B)
and Charlie (C). Bob has read/write access, and Charlie only gets
read-only access. Alice keys the folder (call it f) for all other
users when she creates it. The procedure is as follows:

Generate a random 15-byte directory ID, with the suffix 0x16

Generate a per-TLF Curve25519 DH key pair for inclusion of this folder's
metadata in the site's private-data Merkle tree; call it (M​f​​,m​f​​).

generate s​u​f,0,i​​= rand(32).
This will be the server-side half of the key.

t​u​f,0,i​​=s​u​f,0,i​​⊕s​f,0​​. That is, XOR the global per-TLF key together with the per-user-per-device key to get a masked key.

S​u​f,0,i​​=(M​e​​,Box(m​e​​,M​u​i​​,t​u​f,0,i​​)).
That is, run the NaCl Box function with the ephemeral private key, with the user's public device key,
and on the message t​u​f,0,i​​.
This new key S​u​f,0,i​​ can now be published publically. It
can only be decrypted by u's private device key, and with some server
assist (via s​u​f,0,i​​). Note that we prepend the public key
M​e​​ to the output of Box. Eventually, we might need to add new keys
or replace existing keys, and we'll do so with a new ephemeral keypair. So
M​e​​ might eventually be replaced.

Publish the following metadata (md​c​​) in the clear:

A block of reader keys. In this case it's just Charlie who is read-only:

readers = [S​C​f,0,1​​,S​C​f,0,2​​,S​C​f,0,3​​]

A block of writer keys. In this case it's Alice and Bob:

writers = [S​A​f,0,1​​,S​A​f,0,2​​,S​B​f,0,1​​,S​B​f,0,2​​]

The public key M​f​​

Some of the metadata is encrypted. Construct a plaintext md​e​​ as:

The root TLF block (described more below).

The private key m​f​​.

Then, the encryption is computed as: SecretBox(s​f,0​​,md​e​​)

Note that the server sees reader and writer blocks in the clear, and
can therefore sanity check updates to them. For instance, the server
can check that the readers and writers correspond to the TLF
name, and that readers and writers aren't being dropped maliciously
during updates.

Similarly, the last writer is known to the server, since the server should
check the signatures on incoming updates.

Writers can obviously update all of the metadata and data for a TLF.
Readers can change two fields in the folder's metadata: they can push new
keys onto the end of the reader key list; and they can flip the folder's
"rekey" bit to "on." In either case, they sign these changes with any of
their valid signing sibkeys. The server and clients can check that all
updates to metadata are authorized. The server rejects unauthorized
updates, and clients ignore them.

Encrypting a Block

Once she has established keys as above, Alice can start encrypting blocks.
Take any arbitrary plaintext block like b​j​​ in TLF f.
Assume version 0 of the key. To encrypt:

Generate a random 32-byte per-block key s​j​​.

Generate a random nonce n=rand(24)

Compute: B​j​​= SecretBox(s​j​​⊕s​f,0​​,n,b​j​​)

For the initial rollout, store {n,s​j​​} along with B​j​​ on the
hosted storage provider. In the future, we hope to store s​j​​ on machines
we host to get more secure block deletion.

The idea of some sort of per-block key is one we'll want to keep and
expand upon in the future. It allows us to be quite liberal in distributing
encrypted blocks to CDNs and local caches, without fearing that they might
be never thrown away. If we apply more strict access controls on per-block
keys s​j​​, we still retain some ability to throw away blocks, even
after client key compromise.

We're not using convergent encryption or anything like it, so a block that's used twice
will be encrypted differently both times. However, clients will keep per-folder caches that
map hashes of block plaintexts to encrypted block IDs. So they shouldn't reencrypt/reupload
blocks they know to be repeats. In practice, this local lookup might be good enough, since often
data is read before it's copied, and many block copies will hit this cache. We don't
want to allow block deduplication across top-level-directories for privacy reasons.

FS Structure

The files in a TLF form a Merkle Tree as in other secure distributed file systems
(including SFSRO,
SUNDR
and Tahoe-LAFS).
However, KBFS makes the same simplification as the ORI File system,
and does away with iNode-style indirection. Thus, KBFS cannot implement hard-links
without deeper architectural changes.

Block IDs are computed as the SHA-256 hashes of encrypted blocks and their nonces. Directory
nodes are, in turn, maps of file names to Block IDs. They are packed as blocks, and get
the same encryption treatment as data blocks. This structure continues recursively up to
the TLF root.

Signing The Root Block

Readers and writers in a private shared TLF know via authenticated
encryption that someone who had access to s​f,0​​ wrote f,
but they don't know who. Alice can try to steal credit for Bob's work, or
Bob might frame Alice for saying something naughty.

A sledgehammer approach to this problem would be to have Alice sign the
root block of f on every update. But this is too strong of a
statement. If Eve later compromises Charlie's key, she can blackmail
Alice, since Alice proved to Bob, Charlie and everyone else that she wrote
particular contents to f.

We experimented with various pairwise symmetric MAC schemes, to achieve
repudiability as in OTR protocols. However, such schemes didn't work for
folders with non-writing readers. So we do the simple thing which is
just to sign the hash of the root block with the writer's per-device
EdDSA signing key. The same technique is used in all three cases:
public directories, private directories without non-writing readers,
and private directories with non-writing readers.

Public Directories

If user u updates a KBFS folder with his ith device,
he must sign his changes to the whole folder state by signing its root block.
This will in effect sign all data and metadata contained recursively in this folder.

To sign, u computes the complete metadata md
as specified above, except without encrypting md​e​​. He then signs the block:

σ(md)= Sign(e​u​i​​,md)

The Sign function outputs both the signature, and the KID of the
key used to generate it. This way, verifiers know which key to use.

Home Directories

As described above, home directories are special cases of TLF group
directories. When Alice sets up encryption keys for all of her devices,
she uses the protocol described in 4.1.1, iterating all of her per-device
keys. Alice can skip the signing step described in 4.1.3, because she is not
worried about the "confused authorship" attack, but she still needs to
sign md as for public directories to prevent the server from
spoofing rekeys.

Keeping the Server Honest

Unless we're careful, the server can selectively withhold file system updates
or can serve two different versions of the FS to two different clients. In
this section, we describe the precautions we take to prevent these sorts of
attacks.

Our general approach is that the server creates a Merkle Tree out of all
sensitive site data, then signs and publishes this tree for all to see.
Third party monitors are invited to check on the progression of the site's
Merkle trees and to check they obey the specified cryptographic consistency
constraints. Clients check their views of the file system with those
published in the Merkle trees to make sure the server isn't rolling back
state or attempting a fork.

There are three separate Merkle trees to consider: (1) the Keybase Core
Merkle tree that publicizes sibling keys, public identity proofs, and
``follower'' statements;
(2) a public-data Merkle tree that captures all of the public data on KBFS; and
(3) a private-data Merkle tree that encapsulates the state of all private data on KBFS.

The Keybase Core Merkle Tree

As described in our
Server Security document, users sign statements about their keys,
their identities and the identities of others. They commit these
signatures to monitonically growing signature chains, which are collected
into a sitewide Merkle Tree. As we describe later in this document, these
signature chains will eventually cover organizations in addition to single
users. These structure all exist outside of KBFS and should be accessible
without KBFS software.

The Public-Data Merkle Tree

All public directories on KBFS are collected into a sitewide Merkle Tree.
The leaves of this tree are dictionaries of key-value pairs. The key
is the ID of the TLF, and the value is most recent signature of
the TLF RootMetadata block. This signature contains,
recursively: the writer keys for this directory, the root directory
entry for this directory, and the hash of the previous RootMetadata
block.

Interior nodes of the merkle tree are constructed upwards, covering shorter
prefixes of the directory IDs listed in the leaves. The final root is
signed, and published periodically, maybe every hour.

The Private-Data Merkle Tree

Private data deserves the same treatment as public data, but we have to
be slightly more careful not to reveal: (1) how often private directories
are modified; or (2) even worse, who is collaborating with whom. Naive
solutions might allow an adversary to correlate device additions in public
sigchains with changes in private TLFs to correlate share membership.
We take a pretty blunt approach that should solve data leaks from the server
to the public.

When a user Alice creates private TLF f, she generates a
Curve25519 keypair (M​f​​,m​f​​). She stores M​f​​ in the folder's
public metadata, and m​f​​ in the folder's private metadata.

Every hour, the server makes a new Merkle Tree of the entire site's private data.
The algorithm is:

Generate an ephemeral Curve25519 keypair (M​S​​,m​s​​)

For each private folder f:

Perform encryption C​f​​← Box(m​s​​,M​f​​,σ​f​​), where σ​f​​
is the most-recent signature of f's metadata.

Make a mapping of the ID of f to C​f​​.

Construct a Merkle tree from these leaves as in the Public data case, and publish
the tree along with M​s​​, the public ephemeral Curve25519 key.

When Alice's client goes to fetch TLF f, she gets the most recent
root from the server, and also the most recently-published Merkle tree
from the server. She traces a path down the from the root to the leaf for f,
and then decrypts σ​f​​ using m​f​​. She finally checks
that the current root links back to the root published in the merkle tree. She might
of course need to traverse 100s or 1000s of links if there were many writes in the
last hour, but the server should batch all of the data necessary to make this
determination in a few RPC calls. Future implementations might also implement
"skip list"-style fast-forward links to speed this process for periods of heavy
writes.

Merkle Tree Integration

The roots of the two KBFS Merkle trees will be inserted into the root block of
the main Keybase Merkle Tree. Doing so establishes ordering relationships between
key manipulations and writes to KBFS. In turn, this knowledge allows
KBFS clients to determine if writes to KBFS happened before or after a key is revoked,
and therefore whether or not a corrupt server has exploited an exposed key.

Keying, Rekeying, and Revoking

Adding A New Device

Imagine Bob gets a new device. He now wants to provision it for access to all
relevant KBFS folders. Here are the high-level steps he would take:

Key generation

Key certification: sign the new key with an existing sibling key

TLF keying: add the key to the relevant reader and writer blocks

The first step, key generation, was covered above, and is the same here.
Key certification is covered in our key exchange
document.

TLF Keying

Now Bob must go through and add write access for his device on all encrypted
Keybase directories that he can write. Call his new device n and
his old device d. Device d does the following:

Encrypt s​f,0​​ using sender key m​e​​ and receiver key M​B​n​​,
which was generated on device n and securely transferred over to d
in previous steps.

Add S​B​f,0,n​​ to the writers (or readers) list for f and
sign with e​B​d​​

Bob pushes (M​e​​,S​B​f,0,n​​) to TLF f's reader or
writer list. If he is a reader, he is allowed to push onto the reader list
in the metadata. If he is a writer, he can edit f's metadata
at will.

Freezing a Device [Unimplemented]

If Alice loses a device d, the first thing to do is to "freeze" access
to the device via the server. Any device that can establish a session as the user
can do so. Once a device is frozen, the server will refuse to honor requests for
key-halves of the form s​A​f,0,d​​. Any device but d can
ask for an unfreeze.

Alice can take an additional (optional) step. She can sign a key freeze into her
signature chain for each signing key on the lost device. Other clients should refuse
to honor signatures from frozen keys during the freeze window. Alice can later
cancel the freeze using the same key that ordered the freeze. The server
can't assist here, so Alice can only freeze a specific key if she has another provisioned device.

Decommisioning a Device (And Optionally Adding a New One)

To decommision a device, Alice should first freeze it as above, issuing a final revocation
statement for the key into her sigchain. Her client should
then iterate over all TLFs that she has read or write access to, and delete the lost
keys from the keyblocks, and then rekey. She of course needs an active keypair in order to do this.
Let's say r is Alice's lost device, and d is her currently
active device. Also, Alice can optionally provision a new device n as a result
of this process:

Organizations

Other Topics

File Wipe [Unimplemented]

The server maintains a list of which block-specific keys are in use for
each folder. It doesn't know which key corresponds to which files, since
the TLF structure cannot be recreated without decryption keys.

With A Provisioned Device

Say the user wants to delete a file f.
A provisioned device can compute a set I such that for any i∈I,
b​i​​ is or was a part of f. The client then asks
the server to delete s​i​​ for i∈I. The server
honors this request as long as the user can write to the parent TLF.

Without a Provisioned Device

If a user loses all of his provisioned devices, he can ask the server to wipe
a TLF by deleting all keys for blocks in that TLF. The server keeps
a list, so can perform the task. Of course, the server will only honor
such a request if the user can write to this TLF.