As we saw in the previous chapter, HyperDex offers support for the basic get, put, and search operations on
strings and integers. In this chapter, we explore HyperDex’s richer datatypes and atomic operations on lists, sets,
and maps. We will see how providing efficient atomic operations on these rich, native data structures greatly
simplifies the design of applications with complicated data layout requirements.

By the end of this chapter you’ll be familiar with all of the datatypes HyperDex provides, as well as a number
of the atomic operations defined on them.

Suppose we have brought up three daemons, ready to serve in the HyperDex cluster. Next, we create a space
which makes use of all three nodes in the cluster. In this example, let’s create a space that may be suitable for
storing profiles in a social network:
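A space definition along these lines would work; the attribute names below are the ones used later in the
chapter, but the exact schema details are illustrative:

```
space profiles
key username
attributes
    string first,
    string last,
    int profile_views,
    list(string) pending_requests,
    set(string) hobbies,
    map(string, string) unread_messages
```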

The basic datatype in HyperDex is the byte string. If you don’t specify the type of an attribute when creating a
space, it is automatically treated as a raw byte string. This means that you’ll have to encode and decode Unicode
strings as appropriate. For example, if John wanted to add an accent to his name on his social network page, the
code could look like:

>>> c.put('profiles', 'jsmith1', {'first': u'Jöhn'.encode('utf8')})
True

This encodes the string to raw bytes using UTF-8. When fetching his profile it is necessary to decode the
UTF-8:
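A `c.get('profiles', 'jsmith1')` call against a live cluster would return the raw bytes written above; the decode
step itself is plain Python, sketched here on the stored bytes directly:

```python
# The stored value is raw UTF-8 bytes, exactly as written by the put above.
stored = u'Jöhn'.encode('utf8')   # what c.get('profiles', 'jsmith1')['first'] would contain
first = stored.decode('utf8')     # back to a Unicode string for display
```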

As we’ve already seen, HyperDex supports get and put operations on integers. In addition to these basic operations,
HyperDex provides atomic operations to manipulate integers using basic math operations. This is useful when
implementing features such as page-view counters. Let’s add support for tracking the profile views of a page by
incrementing the counter:
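On a live cluster this could be a single call such as
`c.atomic_add('profiles', 'jsmith1', {'profile_views': 1})`. A minimal plain-Python sketch of the semantics,
with a made-up starting value:

```python
# Model of atomic_add: the server reads the current value and adds the
# given amount in one atomic step, with no client-side read-modify-write.
profile = {'profile_views': 41}   # hypothetical current object state on the server
profile['profile_views'] += 1     # effect of atomic_add(..., {'profile_views': 1})
```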

Note that this change required just one request to HyperDex. The server atomically examines the current value,
and changes it by the amount specified. In this case, the profile_views attribute is incremented by
one.

Let’s add support for friend requests, using HyperDex lists as the basis of the feature. For this we’ll use the
pending_requests attribute in the profiles space.

Imagine that shortly after joining, John Smith receives a friend request from his friend Brian Jones. Behind the
scenes, this could be implemented with a simple list operation, pushing the friend request onto John’s
pending_requests:
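This could be a single call along the lines of
`c.list_rpush('profiles', 'jsmith1', {'pending_requests': 'bjones1'})`, where `bjones1` is Brian’s hypothetical
username and `list_rpush` appends to the right end of the stored list. A plain-Python sketch of what the server
does:

```python
# Model of list_rpush: append the value to the right end of the stored list.
profile = {'pending_requests': []}             # John's object before the request
profile['pending_requests'].append('bjones1')  # effect of list_rpush(..., 'bjones1')
```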

If John Smith decides that his life’s dream is to just write code, he may decide to join a group on the social
network filled with like-minded individuals. We can use HyperDex’s intersect primitive to narrow down his
interests:
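Against a live cluster the call could look like
`c.set_intersect('profiles', 'jsmith1', {'hobbies': set([...])})`. A plain-Python sketch of the semantics, with
hypothetical hobby values:

```python
# Model of set_intersect: the stored set becomes the intersection of its
# current value and the set named in the operation.
hobbies = set(['coding', 'skiing', 'biking'])  # John's current hobbies (made up)
hobbies &= set(['coding', 'hacking'])          # effect of set_intersect with this set
# hobbies is now {'coding'}
```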

Notice how John’s hobbies become the intersection of his previous hobbies and the ones named in the
operation.

Overall, HyperDex supports simple set assignment (using the put interface), adding and removing elements with
set_add and set_remove, taking the union of a set with set_union and storing the intersection of a set with
set_intersect.
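The remaining operations follow the same pattern; a plain-Python sketch of their semantics, with made-up
values:

```python
# Model of the other set operations on a stored set attribute.
hobbies = set(['coding'])
hobbies.add('hacking')                # set_add: add a single element
hobbies.discard('hacking')            # set_remove: remove an element
hobbies |= set(['skiing', 'biking'])  # set_union: merge another set in
```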

Lastly, our social networking system needs a means for allowing users to exchange messages. Let’s demonstrate how
we can accomplish this with the unread_messages attribute. In this contrived example, we’re going to use an object
attribute as a map (aka dictionary) to map from a user name to a string that contains the message from that user,
as follows:
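On a live cluster this could be a call such as
`c.map_add('profiles', 'jsmith1', {'unread_messages': {'bjones1': 'Hi John!'}})`, with the sender and message
text made up here. A plain-Python sketch of the semantics:

```python
# Model of map_add: insert key/value pairs into the stored map attribute.
profile = {'unread_messages': {}}                    # John's object before the message
profile['unread_messages']['bjones1'] = 'Hi John!'   # effect of map_add
```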

This shows that any map operation can operate atomically on a group of map attributes at the same time. This
is fully transactional; all such operations will be ordered in exactly the same way on all replicas, and there is no
opportunity for divergence, even through failures.

HyperDex has a special set of datatypes for storing timestamps. While one could store timestamps as a simple
integer, perhaps seconds since the Unix epoch, such a strategy would not distribute the data well when the
timestamps occupy only a small portion of the integer range. HyperDex’s timestamp types use a special hash
function designed for timestamps: it spreads hash values evenly throughout the hash space while retaining a
customizable degree of locality in the data.
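A space keyed on a timestamp could be declared like this; the space and attribute names are illustrative:

```
space events
key timestamp(second) when
attributes
    string description
```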

In this space, the key space is broken into one-second intervals: timestamps that fall within the same interval
hash near to each other, while timestamps in different intervals hash farther apart in the key space.

The HyperDex timestamp type appears to our application as a normal Python datetime.datetime object, and
the Python API converts values to and from this type automatically. Here are some example events that we can
insert into our space:
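On a live cluster each event would be stored with a put into the space. The sample timestamps below are
invented so that the first two fall exactly 1 ms apart within the same second:

```python
from datetime import datetime

# Hypothetical events; microsecond values 1000 and 2000 put the first two
# exactly one millisecond apart, inside the same second.
events = [
    (datetime(2014, 4, 1, 12, 0, 0, 1000), 'login'),
    (datetime(2014, 4, 1, 12, 0, 0, 2000), 'click'),
    (datetime(2014, 4, 7, 9, 30, 0), 'logout'),
]
# With a live cluster, each pair could be stored with something like:
#   c.put('events', when, {'description': what})
```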

In this sample data, our first two events happen exactly 1 ms apart, within the same second, so they will hash
very close together in the key space, likely to the exact same server. The other events will hash far from the
first two, and will likely reside on different servers in a cluster with more than a few servers.

Once our data is loaded, we can easily search using standard HyperDex comparison predicates like
these:
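On a live cluster such a search could use comparison predicates from `hyperdex.client` (for example
`GreaterEqual` and `LessEqual`) on the timestamp attribute. A plain-Python sketch of a 24-hour window filter
over invented timestamps:

```python
from datetime import datetime

# Invented event timestamps and a 24-hour search window.
events = [datetime(2014, 4, 1, 12, 0, 0),
          datetime(2014, 4, 2, 8, 0, 0),
          datetime(2014, 4, 7, 9, 30, 0)]
start = datetime(2014, 4, 2, 0, 0, 0)
end = datetime(2014, 4, 3, 0, 0, 0)
# Equivalent of searching with GreaterEqual(start) and LessEqual(end):
hits = [t for t in events if start <= t <= end]
```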

When we created our space, we declared that we wanted the timestamp to be of type timestamp(second).
HyperDex supports six different timestamp types that spread the data across servers in different ways. The six types
are:

timestamp(second): Intervals are one second in size.

timestamp(minute): Intervals are one minute in size.

timestamp(hour): Intervals are one hour in size.

timestamp(day): Intervals are one day in size.

timestamp(week): Intervals are 7 days in size.

timestamp(month): Intervals are 28 days in size.

Each of these types will roughly map timestamps that fall within the same time interval (e.g., second, month,
etc.) nearby to each other in the key space. When deciding which timestamp type to use for our data, we should
consider the data we’ll store, and the queries we’ll perform. In general, we wish to pick a timestamp
type large enough such that range queries will fall within one unit of the interval. For example, if
we’ll be searching for objects that fall within a 24-hour window, it doesn’t make sense to pick the
second, minute, or hour types because timestamps from throughout the 24-hour window will be spread
throughout the cluster. The day, week, and month timestamps make much more sense for this query
workload.

Picking too large a timestamp interval, however, can cause problems when inserting timestamps in real time.
For example, if we were to pick the timestamp(month) type and insert one event every millisecond, the roughly
2.4 billion events inserted during each 28-day interval would all map to the same server, with writes moving
to a different server only once per interval. If, instead, we picked the timestamp(second) type, writes would be
directed to a different server each second, consuming disk space evenly across the cluster.
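A quick sanity check of the events-per-interval arithmetic:

```python
# One event per millisecond, sustained for one 28-day month interval.
events_per_interval = 28 * 24 * 3600 * 1000  # 2,419,200,000, roughly 2.4 billion
```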