Byte Order - The Endian Problem

We can now read any group of bytes in the ArrayBuffer in any of the available formats, but we have one last problem. When you store a multibyte data type in a byte array, there are two ways of doing it.

Consider for a moment a 16-bit, i.e. 2-byte, integer. The 16-bit integer is composed of two groups of eight bits - the low order byte and the high order byte. Different systems opt to store these bytes either with the high order byte first, i.e. Big-endian, or with the low order byte first, i.e. Little-endian.

To be more precise:

Big-endian order stores high order bytes at lower addresses or lower index positions.

Little-endian order stores low order bytes at lower addresses or lower index positions.

If you find this confusing follow the next example.

Suppose we have a Uint8Array, i.e. essentially a byte array, and we want to use the first two bytes to store a 16-bit integer, i.e. a Uint16, say. If you want to store the value 255 in the 16-bit integer then all you need to know is its binary representation in 16 bits, i.e.:

0000000011111111

Now to work with this as two bytes it has to be split into two groups of 8-bits:

00000000 11111111

The zeros to the left are the high order byte and the ones to the right are the low order byte. In hex these are 0x00 and 0xFF. Now let's store these in the byte array:

var bytes = new Uint8Array(10);
bytes[0] = 0xFF;
bytes[1] = 0x00;

You may already have noticed that we have exercised a choice without really thinking about it.

Why should we store the low byte in bytes[0] and not in bytes[1]?

If you now try to use the two bytes as if they were a 16-bit integer then you are in for a surprise:
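A minimal sketch of that read, reusing the byte values stored above and assuming the names datav and uint that the text uses a little later:

```javascript
// Store 255 with the low order byte first, as in the text.
var bytes = new Uint8Array(10);
bytes[0] = 0xFF;
bytes[1] = 0x00;

// Wrap the byte array's underlying ArrayBuffer in a DataView.
var datav = new DataView(bytes.buffer);

// No endian argument is given, so the two bytes are read as Big-endian: 0xFF00.
var uint = datav.getUint16(0);
console.log(uint); // 65280
```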

Notice all we have done is to associate a DataView with the ArrayBuffer behind the byte array and read the two bytes as if they were an unsigned 16-bit integer. The result isn't 255 but 65280, i.e. 0xFF00.

The reason is, of course, that the default for the DataView object is to work with Big-endian data. That is, it assumes that the high order byte is stored first, i.e. at the lower array index.

There is an optional parameter that you can specify for all of the multibyte get/set methods to say if the byte order is Big-endian or Little-endian. If you set the parameter to false or leave it out then you get Big-endian. If you set it to true you get Little-endian.

So to make our example work we could set the endian parameter to true as in:

var uint=datav.getUint16(0,true);

and following this you will see 255 as the value of uint.

Of course the alternative would be to store the bytes in Big-endian order:

bytes[0]=0x00;bytes[1]=0xFF;

and then the standard DataView methods work by default.
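The same optional parameter applies to the set methods. As a rough sketch (the names buffer, view and raw are chosen here purely for illustration), writing 255 in each byte order and then inspecting the raw bytes shows the difference:

```javascript
var buffer = new ArrayBuffer(4);
var view = new DataView(buffer);
var raw = new Uint8Array(buffer); // byte-level view of the same buffer

view.setUint16(0, 255);       // Big-endian (default): bytes 0x00, 0xFF
view.setUint16(2, 255, true); // Little-endian: bytes 0xFF, 0x00

console.log(Array.from(raw)); // [0, 255, 255, 0]

// Reading with the matching order recovers the value.
var bigRead = view.getUint16(0);          // 255
var littleRead = view.getUint16(2, true); // 255

// Reading Little-endian data with the default order gives the swapped value.
var mismatched = view.getUint16(2);       // 65280
```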

Of course in "real life" you generally don't get to choose the byte order. When you read a file or handle a stream of data, it is generally the outside world that has already fixed the byte order in use.

In practice you will find that processors such as the Intel x86 range use Little-endian order. However, most network protocols use Big-endian order and this means that the most significant bytes are sent and received first. You also need to find out what endian convention is used for files - and it can differ according to the format.

One last problem. What order do typed arrays use?

There is no parameter optional or otherwise that allows you to set the byte order in a multibyte typed array.

What the standard says is that such arrays are to use the natural order for the hardware - which means on most systems they will use Little-endian.

If you are running the examples on an x86 machine then you can check this using:
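A minimal sketch of such a check, assuming two typed arrays that share a single two-byte buffer (the names bytes8, words16 and isLittleEndian are illustrative only):

```javascript
var buffer = new ArrayBuffer(2);
var bytes8 = new Uint8Array(buffer);   // byte-level view
var words16 = new Uint16Array(buffer); // 16-bit view of the same two bytes

// Put 255 in the first byte and zero in the second.
bytes8[0] = 0xFF;
bytes8[1] = 0x00;

// The Uint16Array reads the bytes in the platform's native order.
var isLittleEndian = (words16[0] === 255);
console.log(isLittleEndian ? "Little-endian" : "Big-endian");
```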

This simply stores 255 into the first byte of the Uint8Array and then reads it back in the default order used by a Uint16Array over the same buffer. If the result is 255 then the machine is using Little-endian order.

Notice that in many cases you don't need to deal with the byte order problem because the machine will produce data in its own default byte order and the typed arrays work with that byte order. However, for data that is read into a program, either as a file or as a download, you do have to worry about byte order and the only way to deal with it is to use a DataView object.

Unpacking the data

The standard technique is to use a DataView to unpack any values that form a structure into a suitable JavaScript object and any array data into a JavaScript typed array or an Array object.

For example, if the ArrayBuffer contains a byte count and a 16 bit size value in Big-endian order you might use something like:
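A minimal sketch, assuming the count sits at offset 0 and the size in the two bytes that follow. The property names count and size, and the sample bytes, are made up for illustration:

```javascript
// Sample buffer standing in for data read from a file:
// byte 0 is the count (3), bytes 1-2 are the size (0x0102 = 258) in Big-endian order.
var buffer = new Uint8Array([3, 0x01, 0x02]).buffer;
var datav = new DataView(buffer);

// Unpack the fields into an ordinary JavaScript object.
var header = {
  count: datav.getUint8(0),        // single byte, so no byte order issue
  size: datav.getUint16(1, false)  // false (the default) means Big-endian
};

console.log(header.count, header.size); // 3 258
```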

From this point on you can forget endian issues and just use header and its properties.

If you have a Big-endian array of two byte integers and want to work with it on a Little-endian machine you can use the equivalent idea for an array:

for (var i = 0; i < len; i++) { data[i]=datav.getUint16(i*2);}

Notice that data could be an Array object or a typed array - it all depends on what you are going to do with the data. Also notice the need to step through the ArrayBuffer in units of the data type being read. That is, we are reading two-byte integers so the offset is i*2.
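To make the loop concrete, here is a self-contained sketch that assumes a small sample buffer of Big-endian values and chooses a Uint16Array as the destination:

```javascript
// Sample Big-endian data: the 16-bit values 1, 256 and 65535.
var buffer = new Uint8Array([0x00, 0x01, 0x01, 0x00, 0xFF, 0xFF]).buffer;
var datav = new DataView(buffer);

var len = buffer.byteLength / 2; // number of two-byte integers
var data = new Uint16Array(len); // an Array object would work as well

for (var i = 0; i < len; i++) {
  // The byte offset advances by two for each element; the default order is Big-endian.
  data[i] = datav.getUint16(i * 2);
}

console.log(Array.from(data)); // [1, 256, 65535]
```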

Conclusion

We now have the ability to work with a binary buffer containing an array or a structure. Because you can assign multiple views to a single ArrayBuffer, we can even process formats that have a header which is a structure followed by the data as an array.
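As a rough sketch of the header-plus-array idea, assume a made-up layout with a two-byte header (a version byte and a count byte) followed by 16-bit values in the machine's native order. None of these field names comes from a real format:

```javascript
var buffer = new ArrayBuffer(8);

// First view: a DataView over the two header bytes.
var headerView = new DataView(buffer, 0, 2);
headerView.setUint8(0, 1); // hypothetical version field
headerView.setUint8(1, 3); // hypothetical element count

// Second view: a typed array over the data that follows the header.
// The byte offset of a Uint16Array must be a multiple of 2, which 2 is.
var count = headerView.getUint8(1);
var samples = new Uint16Array(buffer, 2, count);
samples[0] = 10;
samples[1] = 20;
samples[2] = 30;

console.log(count, Array.from(samples)); // 3 [10, 20, 30]
```

Both views address the same bytes, so nothing is copied when the header and the data are read separately.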

If you would like to see this in action see the next installment - an example project that reads in a BMP format file and displays it on a Canvas object.
