Lisp in modern times


Bits of things

Most programming languages provide I/O facilities over streams of bytes. For instance, the Java abstract classes java.io.InputStream and java.io.OutputStream define interfaces for performing input and output on byte streams. These interfaces are implemented by classes dedicated to performing I/O on files, pipes and in-memory buffers. Sometimes, however, we have to deal with streams of bits. Consider the frames in an MP3 file, where the frame header is encoded as bit-fields of varying lengths. Another example is the TCP header, the layout of which is illustrated below:

Some fields in the TCP header span two or more bytes. For example, the source and destination ports are 16 bits (2 bytes) each, and the sequence and acknowledgement numbers are 32-bit values (4 bytes). Ordinary byte streams are sufficient for reading and writing these values. But some fields do not fall on byte boundaries. For instance, the data offset field is 4 bits and the reserved field is 6 bits, and the six flags that follow are 1 bit each, so together they share a single 16-bit word. As most I/O libraries treat bytes as the fundamental unit of information, special bit-twiddling code is required to encode and decode a TCP header. Writing such code can be difficult and error-prone. When we have to pack information in a space-efficient way, an abstraction that can perform I/O on streams of bits starts to look attractive!
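To make that bit-twiddling concrete, here is a small Java sketch that decodes the fields sharing bytes 12 and 13 of a TCP header. The class `TcpHeaderBits` and its methods are hypothetical, not part of the post's library. The sketch assumes the original RFC 793 layout (a 4-bit data offset, 6 reserved bits, then the six 1-bit flags URG, ACK, PSH, RST, SYN and FIN); later RFCs assign some of the reserved bits to ECN flags, which it does not model.

```java
// Hypothetical helper (not part of the post's library): decodes the fields
// packed into bytes 12 and 13 of a TCP header, assuming the RFC 793 layout of
// a 4-bit data offset, 6 reserved bits and six 1-bit flags.
final class TcpHeaderBits {
    // Flag masks within the 16-bit word formed by bytes 12 and 13.
    static final int URG = 0x20;
    static final int ACK = 0x10;
    static final int PSH = 0x08;
    static final int RST = 0x04;
    static final int SYN = 0x02;
    static final int FIN = 0x01;

    private TcpHeaderBits() {}

    // Joins bytes 12 and 13 (network byte order) into an unsigned 16-bit value.
    // The & 0xFF undoes Java's sign extension of negative bytes.
    static int controlWord(byte[] header) {
        return ((header[12] & 0xFF) << 8) | (header[13] & 0xFF);
    }

    // Bits 15..12: header length in 32-bit words.
    static int dataOffset(byte[] header) {
        return (controlWord(header) >>> 12) & 0xF;
    }

    // Bits 11..6: reserved.
    static int reserved(byte[] header) {
        return (controlWord(header) >>> 6) & 0x3F;
    }

    // Bits 5..0: URG, ACK, PSH, RST, SYN, FIN.
    static int flags(byte[] header) {
        return controlWord(header) & 0x3F;
    }

    // True if the given flag mask is set in the header.
    static boolean isSet(byte[] header, int flagMask) {
        return (flags(header) & flagMask) != 0;
    }
}
```

For a SYN-ACK whose bytes 12 and 13 are 0x50 and 0x12, `dataOffset` returns 5 (a 20-byte header) and both `ACK` and `SYN` test as set. Every shift and mask has to match the layout exactly, which is precisely where hand-written code of this kind tends to go wrong.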

In this blog post we develop such an abstraction for the JVM. In the process, we will learn how to mix high-performance Java code with Clojure. We will also see how the expressiveness of Clojure can enhance the usability of lower-level abstractions.

We will start with a simple and efficient library that allows us to read and write bits over an underlying stream, which must be an implementation of java.io.InputStream or java.io.OutputStream. As objects of these classes can do I/O only on whole bytes, the bit stream library has to maintain some local state. At a bare minimum, it needs a byte to pack together the bits seen so far and an integer counter to keep track of the current bit position. Both state variables are updated on every bit read or written, so we will implement the lower-level I/O code in Java, as two classes: bits.BitsReader and bits.BitsWriter.
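The state just described (one partially filled byte plus a bit counter) can be sketched in Java roughly as follows. `SimpleBitWriter` and `SimpleBitReader` are hypothetical names; this is not the API of the post's bits.BitsReader and bits.BitsWriter, just one plausible way to keep that state, assuming bits are packed most-significant-bit first.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch, not the API of bits.BitsWriter: packs bits MSB-first
// into a one-byte buffer and hands it to the stream every 8 bits.
final class SimpleBitWriter {
    private final OutputStream out;
    private int current;   // bits packed so far (at most 8)
    private int bitCount;  // how many bits of 'current' are filled

    SimpleBitWriter(OutputStream out) { this.out = out; }

    // Writes the low 'width' bits of 'value', most significant bit first.
    void writeBits(int value, int width) throws IOException {
        for (int i = width - 1; i >= 0; i--) {
            current = (current << 1) | ((value >>> i) & 1);
            if (++bitCount == 8) {
                out.write(current);
                current = 0;
                bitCount = 0;
            }
        }
    }

    // Pads a partial final byte with zero bits, writes it, then flushes.
    void flush() throws IOException {
        if (bitCount > 0) {
            out.write(current << (8 - bitCount));
            current = 0;
            bitCount = 0;
        }
        out.flush();
    }
}

// Hypothetical counterpart to the writer: reads bits MSB-first.
final class SimpleBitReader {
    private final InputStream in;
    private int current;   // the byte currently being consumed
    private int bitsLeft;  // unread bits remaining in 'current'

    SimpleBitReader(InputStream in) { this.in = in; }

    // Reads 'width' bits (up to 31) and returns them as a non-negative int.
    int readBits(int width) throws IOException {
        int result = 0;
        for (int i = 0; i < width; i++) {
            if (bitsLeft == 0) {
                current = in.read();
                if (current < 0) {
                    throw new EOFException("bit stream exhausted");
                }
                bitsLeft = 8;
            }
            bitsLeft--;
            result = (result << 1) | ((current >>> bitsLeft) & 1);
        }
        return result;
    }
}
```

For instance, writing a 4-bit value 5, six zero bits and then the 6-bit value 0x12 produces exactly the two bytes 0x50 0x12, and reading them back with widths 4, 6 and 6 returns the same three values. The buffer and counter live in plain mutable fields, so nothing is allocated per bit.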

Invoking these classes from Clojure is quite easy. We just need to update the Leiningen project.clj file with the path to the Java sources. In the example project, I put the `bits` package in the `src/java` folder. This path should be specified in the :java-source-paths property of the project file, as in `:java-source-paths ["src/java"]`.

Now we’re all set to move on to the interesting part. Let’s design an abstraction that will allow us to program bit streams at a much higher level. This new abstraction should enable us to specify bit-encoded data as first-class objects in Clojure. We will also write new bit-stream functions that can read and write data based on these specifications, so we won’t be calling I/O functions directly on bit streams anymore.

To understand our goal better, let us think about a simple object that can be bit-encoded. The example I have chosen is that of 16-bit colors, where the red and blue components are each encoded using 5 bits and the green component is encoded using 6 bits. We can specify the structure of 16-bit colors as a vector of field names and their bit-lengths:

(def _16bit-color-spec [:red 5 :green 6 :blue 5])
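To see what such a spec implies at the bit level, here is a Java sketch of the same idea. The `BitSpec` class and its `Field` record are hypothetical and are not the Clojure API this post goes on to build. The sketch assumes fields are packed most significant first in spec order, so red occupies bits 15 to 11, green bits 10 to 5 and blue bits 4 to 0. It uses a record, so it needs Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Java analogue of the Clojure spec [:red 5 :green 6 :blue 5]:
// an ordered list of (name, width) fields, packed most significant field first.
final class BitSpec {
    record Field(String name, int width) {}

    private final List<Field> fields;
    private final int totalWidth;

    BitSpec(List<Field> fields) {
        this.fields = List.copyOf(fields);
        int total = 0;
        for (Field f : this.fields) {
            // Each field must fit in a non-negative int.
            if (f.width() < 1 || f.width() > 31) {
                throw new IllegalArgumentException("bad width for " + f.name());
            }
            total += f.width();
        }
        // The packed result must fit in a non-negative long.
        if (total > 63) {
            throw new IllegalArgumentException("spec is wider than 63 bits");
        }
        this.totalWidth = total;
    }

    int totalWidth() { return totalWidth; }

    // Shifts each value in after the previous ones, so the first field
    // ends up in the most significant bits.
    long encode(Map<String, Integer> values) {
        long packed = 0;
        for (Field f : fields) {
            Integer boxed = values.get(f.name());
            if (boxed == null) {
                throw new IllegalArgumentException("missing value for " + f.name());
            }
            long mask = (1L << f.width()) - 1;
            long v = boxed;
            // Rejects negative values and values too wide for the field.
            if ((v & ~mask) != 0) {
                throw new IllegalArgumentException(f.name() + " does not fit in " + f.width() + " bits");
            }
            packed = (packed << f.width()) | v;
        }
        return packed;
    }

    // Walks the fields in spec order, extracting each one from the top down.
    Map<String, Integer> decode(long packed) {
        Map<String, Integer> out = new LinkedHashMap<>();
        int shift = totalWidth;
        for (Field f : fields) {
            shift -= f.width();
            long mask = (1L << f.width()) - 1;
            out.put(f.name(), (int) ((packed >>> shift) & mask));
        }
        return out;
    }
}
```

With this layout, pure red (31, 0, 0) encodes to 0xF800, the components (1, 2, 3) encode to 2115, and `decode(2115)` returns `{red=1, green=2, blue=3}`. The arithmetic is driven entirely by the spec, which is the property the higher-level Clojure abstraction is meant to provide.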

Now we want to be able to encode three integer values into a single 16-bit color, and to decode a single 16-bit color value into its red, green and blue components. The higher-level bit-stream functions that we are going to implement should enable us to do this: