Working with binary data in Node.js

Recently I spent some time working with streams of binary data in Node.js. Here are some lessons I learned as a result.

Binary length

Node.js makes it easy to work with strings because, under the hood, JavaScript stores them as sequences of 16-bit UTF-16 code units, and the length property counts those units rather than bytes. Most common characters, accented ones included, fit in a single unit. Consequently:

'Cafe'.length === 'Café'.length;
// true

When working with binary data things get a little bit more complicated:

Buffer.byteLength('Cafe') === Buffer.byteLength('Café');
// false

This is because, in UTF-8 (the default encoding for Buffer.byteLength), it takes 1 byte to represent 'e' but 2 bytes to represent 'é'. Therefore, it is important to use the byte length, rather than the string length, when encoding your data.
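
To see why this matters, here is a minimal sketch of what can go wrong when a buffer is sized from the string length. The variable names are illustrative only, and the text is written with an escape so that 'é' is guaranteed to be the single precomposed character.

```javascript
const text = 'Caf\u00e9'; // 'Café' with a precomposed é

// Sizing the buffer from the string length leaves it too small for the UTF-8 bytes.
const tooSmall = Buffer.alloc(text.length);
// write() returns the number of bytes actually written, which falls short here.
const written = tooSmall.write(text, 'utf8');

// Sizing from the byte length leaves room for every byte.
const byteLength = Buffer.byteLength(text, 'utf8');
const rightSize = Buffer.alloc(byteLength);
rightSize.write(text, 'utf8');

console.log(written < byteLength);                // true
console.log(rightSize.toString('utf8') === text); // true
```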

Encoding the data

When working with streams of binary data it is necessary to somehow identify individual messages within the stream.

The method I used was to encode the data in a length-value format (a simplified version of the type-length-value encoding format), which consists of a fixed-size length field followed by a variable-size block of data.
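
A minimal sketch of such an encoder might look like this. The width and byte order of the length field are not specified here, so the 4-byte big-endian unsigned integer is an assumption, as are the names encodeMessage and LENGTH_FIELD_SIZE.

```javascript
// Assumption: the length field is a 4-byte unsigned big-endian integer.
const LENGTH_FIELD_SIZE = 4;

// Hypothetical helper: frames a string as [length][UTF-8 payload].
function encodeMessage(text) {
  const payload = Buffer.from(text, 'utf8');
  const frame = Buffer.alloc(LENGTH_FIELD_SIZE + payload.length);
  // payload.length is the byte length of the data, not the string length.
  frame.writeUInt32BE(payload.length, 0);
  payload.copy(frame, LENGTH_FIELD_SIZE);
  return frame;
}

const frame = encodeMessage('Caf\u00e9'); // 'Café'
console.log(frame.readUInt32BE(0)); // 5
console.log(frame.length);          // 9
```

Because the length field records bytes rather than characters, a reader of the stream can tell exactly where one message ends and the next begins, whatever the payload contains.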

Chunking

You will of course have noticed the warning in the example above. In practice, once there is a reasonable amount of data, the receiver gets it in chunks. Each chunk is a buffer filled with binary data, but there is no guarantee that a chunk contains an entire message, an entire length field, or even an entire character (in the case of a multi-byte character)!
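
The multi-byte case is easy to reproduce in isolation. The sketch below splits the UTF-8 bytes of 'Café' in the middle of 'é', the way a chunk boundary might, and decodes each piece on its own. The split position is chosen by hand purely for illustration.

```javascript
const whole = Buffer.from('Caf\u00e9', 'utf8'); // 5 bytes: 43 61 66 c3 a9

// Simulate a chunk boundary that falls between the two bytes of 'é'.
const first = whole.subarray(0, 4);
const second = whole.subarray(4);

// Decoding each chunk separately cannot recover the split character.
const decodedSeparately = first.toString('utf8') + second.toString('utf8');

// Joining the bytes first and decoding once gives the original text back.
const decodedTogether = Buffer.concat([first, second]).toString('utf8');

console.log(decodedSeparately === 'Caf\u00e9'); // false
console.log(decodedTogether === 'Caf\u00e9');   // true
```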

Consider the following example in which our simulated server converts incoming chunks into a string on receipt. This is a technique frequently seen in examples of Node.js TCP servers.