FoundationDB is a distributed
key-value store. From the database perspective, both keys and values
are just bytes. Keys are sorted lexicographically and different
chunks are assigned to different servers. Because of this locality
property, the range read
operation is efficient as most of the time the client would be able to
fetch the data from a single server.

As the keys are just bytes, the application must serialize the keys
and more importantly if we need to use range operation, then the
serialization should preserve the order of the keys. Fortunately, all
the clients implement the tuple
layer
which has this property. This post attempts to explain how various
data types are encoded in a way that preserves their natural sort order.

I will represent bytes using their hex format for the rest of the blog
post. Each data type is tagged with an identifier which uniquely
identifies the type.

Null

Null is encoded with the tag 00.

Byte String

Tag 01 is used for byte string. 00 is used as
the terminator and 00 inside the byte string are escaped by replacing them
with 00FF. For example the byte string ABCDEF would be encoded as 01ABCDEF00.

It’s clear that order would be preserved for byte strings without 00
because the same constant values are prefixed and suffixed for all the
byte strings. Let’s consider all the possible cases of byte strings
with 00.

AB00DD

01AB00FFDD00

AB01DD

01AB01DD00

AB00BC

01AB00FFBC00

AB00DD

01AB00FFDD00

AB00DD

01AB00FFDD00

AB01

01AB0100

AB00

01AB00FF00

AB01DD

01AB01DD00

AB

01AB00

AB00

01AB00FF00

Because the terminator 00 has the lowest byte value and if
the compared byte string has any value other than 00 in the same
position then there is no need to consider the rest of the bytes. The
last case shown is a bit tricky and there is a reason for choosing FF as
the escape value and it will become more apparent when we consider Tuple types.

Integer

Integers upto 8 bytes are supported. Integers get encoded into
variable size which depends on minium bytes required to represent
the integer. Tags 0C to 13 are used for negative numbers and tag 14 for
zero and tags 15 to 1C are used for positive number.

For positive integers, big-endian byte
order is used
which naturally preserves the sort order. For negative integers, the
sign is removed and bit complement of big-endian byte order is used
bit-complement(big-endian(abs(n))). The bit complement operation
reverses the sort order. If two integers get encoded into different
sized bytes, then the order would be preserved by tag values. The tag
value increases as the integer value increases.

-98344948949494949

0CFEA29BCA3C69535A

-303040404040

0FB9716265B7

-20404

12B04B

-42

13D5

42

152A

20404

164FB4

303040404040

19468E9D9A48

98344948949494949

1C015D6435C396ACA5

Float

Tags 20 (32 bits) and 21 (64 bits) are used for floating point numbers.

The value is first encoded in IEEE 754 binary format. For positive
numbers just the sign bit should be flipped and for negative numbers
all the bits should be flipped. This should provide total
order
as per the IEEE specification.

Tuple

Because of the way other values are encoded, a tuple can be encoded
just by concatenating the encoded values. Note that the sort order is
preserved only between values of same types. Specially comparison
between types like Float and Integer won’t work.

(AB, 42)

01AB00152A

(AB00, 42)

01AB00FF00152A

This example illustrates why FF is chosen as the
escape character. As long as we don’t use FF as the tag for
any type, the byte string value 00 and the terminator
value 00 would not conflict and break the sort
order. Other types don’t introduce these kinds of behaviors because
they are either of fixed size or in case of Integer, the tag will
determine the order of the values if they are of different size.

Nested Tuple

Tuple type can’t be nested as there is no difference between (1, (2,
3)) and (1, 2, (3)). Both values would get encoded as the same as
the flattened tuple (1, 2, 3). Nested tuple, as the name implies
supports arbitrary nesting. [ is used to represent nested tuple.

Tag 05 and terminator 00 are used for nested
tuple. Encoded values are concatenated and any Null values are escaped
as 00FF.

[1, [2, 3]]

05150105150215030000

[1, 2, [3]]

05150115020515030000

There are other data types like unicode string, arbitrary precision
number, UUID, boolean etc, but all of them follow similar principles
discussed so far. As such, they are not considered here.