Working with Binary Data using Typed Arrays

HTML5 brings many APIs that push the envelope on user experiences involving media and real-time communications. These features often rely on binary file formats, like MP3 audio, PNG images, or MP4 video. Binary formats matter to these features because they reduce bandwidth requirements, deliver expected performance, and interoperate with existing file formats. But until recently, Web developers haven’t had direct access to the contents of these binary files or any other custom binary files.

This post explores how Web developers can break through the binary barrier using the JavaScript Typed Arrays API, and shows the API in use in the Binary File Inspector Test Drive demo.

Typed Arrays, available in IE10 Platform Preview 4, enable Web applications to use a broad range of binary file formats and directly manipulate the binary contents of files already supported by the browser. Support for Typed Arrays has been added throughout IE10: in JavaScript, in XMLHttpRequest, in the File API, and in the Stream API.

Binary File Inspector

The Binary File Inspector test drive demo highlights some of the new capabilities offered by this combination of new features. You can see the ID3 headers for music files, get a sense of the raw bytes in video files, and also see how additional file formats, like the PCX image file format, can be supported in the browser with the use of JavaScript and Canvas.

In the example above, an .mp4 video is rendered using a <video> element on the left, and the binary contents of the file are displayed on the right, both in HEX form, and as corresponding ASCII characters. In this example, you can see some characteristic elements of the MPEG file format, such as the “ftyp” of “mp4.”
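A hex-and-ASCII view like the one described here can be produced with a few lines of JavaScript once the file contents are available as an ArrayBuffer. The sketch below is not the demo's own code: the function name formatHexDump, its bytesPerRow parameter, and the sample bytes are all made up for illustration.

```javascript
// Hypothetical helper: format an ArrayBuffer as rows of hex bytes
// followed by the matching ASCII characters.
function formatHexDump(buffer, bytesPerRow) {
    bytesPerRow = bytesPerRow || 16;
    var bytes = new Uint8Array(buffer);
    var rows = [];
    for (var offset = 0; offset < bytes.length; offset += bytesPerRow) {
        var hex = [];
        var ascii = "";
        var end = Math.min(offset + bytesPerRow, bytes.length);
        for (var i = offset; i < end; i++) {
            var value = bytes[i];
            // Two upper-case hex digits per byte, zero-padded
            hex.push((value < 16 ? "0" : "") + value.toString(16).toUpperCase());
            // Printable ASCII is 0x20..0x7E; show everything else as "."
            ascii += (value >= 0x20 && value <= 0x7E) ? String.fromCharCode(value) : ".";
        }
        rows.push(hex.join(" ") + "  " + ascii);
    }
    return rows.join("\n");
}

// Made-up sample: a 4-byte length followed by the characters "ftyp"
var sample = new Uint8Array([0x00, 0x00, 0x00, 0x18, 0x66, 0x74, 0x79, 0x70]);
var dump = formatHexDump(sample.buffer, 8);
console.log(dump); // 00 00 00 18 66 74 79 70  ....ftyp
```

Non-printable bytes are shown as a period so that each row of ASCII output stays aligned with its row of hex values.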

Typed Arrays and ArrayBuffers

Typed Arrays provide a means to look at the raw binary contents of data through a particular typed view. For example, if we want to look at our raw binary data a byte at a time, we can use a Uint8Array (Uint8 describes an 8-bit unsigned integer value, commonly known as a byte). If we want to read the raw data as an array of floating point numbers, we can use a Float32Array (Float32 describes a 32-bit IEEE754 floating point value, commonly known as a single-precision floating point number). The following types are supported:

Array Type      Element size and description
Int8Array       8-bit signed integer
Uint8Array      8-bit unsigned integer
Int16Array      16-bit signed integer
Uint16Array     16-bit unsigned integer
Int32Array      32-bit signed integer
Uint32Array     32-bit unsigned integer
Float32Array    32-bit IEEE754 floating point number
Float64Array    64-bit IEEE754 floating point number
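The element sizes listed above can be checked from script, since each typed array constructor exposes a BYTES_PER_ELEMENT property. The short sketch below also shows how the element size determines how many elements fit in a buffer, and how the signed and unsigned types interpret the same byte. The variable names are illustrative only.

```javascript
// Element size in bytes for a few of the array types
var sizes = {
    Int8Array: Int8Array.BYTES_PER_ELEMENT,       // 1
    Uint16Array: Uint16Array.BYTES_PER_ELEMENT,   // 2
    Float32Array: Float32Array.BYTES_PER_ELEMENT, // 4
    Float64Array: Float64Array.BYTES_PER_ELEMENT  // 8
};

// A 16-byte buffer holds a different number of elements per view type
var buffer16 = new ArrayBuffer(16);
var counts = {
    asUint8: new Uint8Array(buffer16).length,     // 16
    asInt16: new Int16Array(buffer16).length,     // 8
    asFloat32: new Float32Array(buffer16).length, // 4
    asFloat64: new Float64Array(buffer16).length  // 2
};

// The same byte reads differently through signed and unsigned views
var oneByte = new ArrayBuffer(1);
new Uint8Array(oneByte)[0] = 0xFF;
var unsignedValue = new Uint8Array(oneByte)[0]; // 255
var signedValue = new Int8Array(oneByte)[0];    // -1
```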

Each array type is a view over an ArrayBuffer. The ArrayBuffer is a reference to the raw binary data, but it does not provide any direct way to interact with the data. Creating a TypedArray view of the ArrayBuffer provides access to read from and write to the binary contents.

The example below creates a new ArrayBuffer from scratch and interprets its contents in a few different ways:

// Create an 8 byte buffer
var buffer = new ArrayBuffer(8);

// View as an array of Uint8s and put 0x05 in each byte
var uint8s = new Uint8Array(buffer);
for (var i = 0; i < 8; i++) {
    uint8s[i] = 5; // fill each byte with 0x05
}

// Inspect the resulting array
uint8s[0] === 5; // true - each byte has value 5
uint8s.length === 8; // true - there are 8 Uint8s

// View the same buffer as an array of Uint32s
var uint32s = new Uint32Array(buffer);

// The same raw bytes are now interpreted differently
uint32s[0] === 84215045; // true - 0x05050505 == 84215045

In this way, Typed Arrays can be used for tasks such as creating floating point values from their byte-level components or for building data structures that require a very specific layout of data for efficiency or interoperation.
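As a small illustration of building a floating point value from its byte-level components, the sketch below writes the four bytes of the IEEE754 single-precision pattern for 1.0 (0x3F800000) and reads them back as a float. It uses DataView for the read, because a Float32Array view interprets bytes in the platform's native byte order, while DataView reads big-endian unless told otherwise. The variable names are illustrative only.

```javascript
// Assemble a 32-bit float from four individual bytes.
// 0x3F800000 is the IEEE754 single-precision bit pattern for 1.0.
var floatBuffer = new ArrayBuffer(4);
var floatBytes = new Uint8Array(floatBuffer);
floatBytes[0] = 0x3F;
floatBytes[1] = 0x80;
floatBytes[2] = 0x00;
floatBytes[3] = 0x00;

// DataView reads are big-endian by default, so the result does not
// depend on the byte order of the machine running the code
var view = new DataView(floatBuffer);
var one = view.getFloat32(0); // 1

// Going the other way: write a float and inspect its bytes.
// -2 has the bit pattern 0xC0000000.
view.setFloat32(0, -2);
var firstByte = floatBytes[0]; // 0xC0 (192)
```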

Typed Arrays for Reading Binary File Formats

An important new scenario enabled by Typed Arrays is to read and render the contents of custom binary file formats that are not natively supported by the browser. As well as the various array types introduced above, Typed Arrays also provide a DataView object that can be used to read and write the contents of an ArrayBuffer in an unstructured way. This is well suited to reading new file formats, which are typically made up of heterogeneous mixes of data.

The Binary File Inspector demo uses DataView to read the PCX file format and render it using a <canvas> element. Here’s a slightly simplified version of what the demo does to read the file header, which includes information like the width, height, DPI, and bits-per-pixel of color depth.

var buffer = getPCXFileContents();
var reader = new DataView(buffer);

// Read the header of the PCX file
var header = {};

// The first section is single bytes
header.manufacturer = reader.getUint8(0);
header.version = reader.getUint8(1);
header.encoding = reader.getUint8(2);
header.bitsPerPixel = reader.getUint8(3);

// The next section is Int16 values, each in little-endian
header.xmin = reader.getInt16(4, true);
header.ymin = reader.getInt16(6, true);
header.xmax = reader.getInt16(8, true);
header.ymax = reader.getInt16(10, true);
header.hdpi = reader.getInt16(12, true);
header.vdpi = reader.getInt16(14, true);

Similar code can add support for rendering a broad range of new data formats in the browser, such as custom image formats, additional video file formats, or domain-specific map data formats.
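To experiment with this kind of header parsing without a real PCX file, one option is to build a synthetic header with DataView's setter methods and read it back. The sketch below does this under a few assumptions: the function name readPCXHeader and all the field values are made up for illustration, and the image dimensions are derived using the usual PCX convention that the bounds are inclusive (width = xmax - xmin + 1). It also shows what goes wrong when the little-endian flag is left out.

```javascript
// Hypothetical helper wrapping the header reads in a function
function readPCXHeader(buffer) {
    var reader = new DataView(buffer);
    return {
        manufacturer: reader.getUint8(0),
        version: reader.getUint8(1),
        encoding: reader.getUint8(2),
        bitsPerPixel: reader.getUint8(3),
        // 16-bit fields are stored little-endian, hence the 'true' flag
        xmin: reader.getInt16(4, true),
        ymin: reader.getInt16(6, true),
        xmax: reader.getInt16(8, true),
        ymax: reader.getInt16(10, true),
        hdpi: reader.getInt16(12, true),
        vdpi: reader.getInt16(14, true)
    };
}

// Build a made-up header describing a 640x480, 8 bits-per-pixel image
var fakeFile = new ArrayBuffer(128);
var writer = new DataView(fakeFile);
writer.setUint8(0, 0x0A);      // manufacturer
writer.setUint8(1, 5);         // version
writer.setUint8(2, 1);         // encoding
writer.setUint8(3, 8);         // bits per pixel
writer.setInt16(4, 0, true);   // xmin
writer.setInt16(6, 0, true);   // ymin
writer.setInt16(8, 639, true); // xmax
writer.setInt16(10, 479, true); // ymax
writer.setInt16(12, 72, true); // hdpi
writer.setInt16(14, 72, true); // vdpi

var pcxHeader = readPCXHeader(fakeFile);
var width = pcxHeader.xmax - pcxHeader.xmin + 1;  // 640
var height = pcxHeader.ymax - pcxHeader.ymin + 1; // 480

// Omitting the little-endian flag reads the bytes 7F 02 as 0x7F02
var wrongEndian = new DataView(fakeFile).getInt16(8); // 32514, not 639
```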

Getting Binary Data with XHR and File API

Before we can use the Typed Arrays APIs to work with the contents of files, we need to use browser APIs to get access to the raw data. For accessing files from the server, the XMLHttpRequest API has been extended with a “responseType” property. Setting it to “arraybuffer” provides the contents of the requested server resource to JavaScript as an ArrayBuffer object. The “blob,” “text” and “document” response types are also supported.

function getServerFileToArrayBuffer(url, successCallback) {
    // Create an XHR object
    var xhr = new XMLHttpRequest();

    xhr.onreadystatechange = function () {
        if (xhr.readyState == xhr.DONE) {
            if (xhr.status == 200 && xhr.response) {
                // The 'response' property returns an ArrayBuffer
                successCallback(xhr.response);
            } else {
                alert("Failed to download: " + xhr.status + " " + xhr.statusText);
            }
        }
    };

    // Open the request for the provided url
    xhr.open("GET", url, true);

    // Set the responseType to 'arraybuffer' for ArrayBuffer response
    xhr.responseType = "arraybuffer";

    xhr.send();
}

In many cases files are provided by the user, for example as an attachment to an email in a Web mail application. The File API offers Web developers tools to read the contents of files provided via an <input> element, drag-and-drop or any other source that provides Blobs or Files. The FileReader object is used to read the contents of a file into an ArrayBuffer and, like the XHR object, is asynchronous to ensure that reading from the disk does not prevent the user interface from responding.

function readFileToArrayBuffer(file, successCallback) {
    // Create a FileReader
    var reader = new FileReader();

    // Register for 'load' and 'error' events
    reader.onload = function () {
        // The 'result' property returns an ArrayBuffer for readAsArrayBuffer
        var buffer = reader.result;
        successCallback(buffer);
    };

    reader.onerror = function (evt) {
        // The error code indicates the reason for failure
        if (evt.target.error.code == evt.target.error.NOT_READABLE_ERR) {
            alert("Failed to read file: " + file.name);
        }
    };

    // Begin a read of the file contents into an ArrayBuffer
    reader.readAsArrayBuffer(file);
}
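Whether the data arrives through XHR or the File API, the success callback receives an ArrayBuffer, and a common first step is to look at the leading bytes to decide how to interpret the rest. The sketch below is one possible approach, not code from the demo: the function name sniffFormat and its return strings are made up, and it checks only a few well-known signatures (the 8-byte PNG signature, the "ID3" tag at the start of many MP3 files, "ftyp" at offset 4 of MP4 files, and the 0x0A manufacturer byte that begins a PCX header). A single-byte match like the PCX one is only a hint, so it is tested last.

```javascript
// Hypothetical helper: guess a file format from its leading bytes
function sniffFormat(buffer) {
    var bytes = new Uint8Array(buffer);

    // True if 'signature' appears in the data starting at 'offset'
    function matchesAt(signature, offset) {
        if (bytes.length < offset + signature.length) {
            return false;
        }
        for (var i = 0; i < signature.length; i++) {
            if (bytes[offset + i] !== signature[i]) {
                return false;
            }
        }
        return true;
    }

    if (matchesAt([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A], 0)) {
        return "png";
    }
    if (matchesAt([0x49, 0x44, 0x33], 0)) { // "ID3"
        return "mp3";
    }
    if (matchesAt([0x66, 0x74, 0x79, 0x70], 4)) { // "ftyp"
        return "mp4";
    }
    if (matchesAt([0x0A], 0)) { // PCX manufacturer byte, a weak hint only
        return "pcx";
    }
    return "unknown";
}

// Made-up sample: the start of a file carrying an ID3 tag
var id3Sample = new Uint8Array([0x49, 0x44, 0x33, 0x03, 0x00]);
var guessed = sniffFormat(id3Sample.buffer); // "mp3"
```

A function like this could be passed as the success callback to either of the helpers above, with the result used to pick a format-specific parser.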

Conclusion

Binary data is heavily used by Web browsers. With support for Typed Arrays, XHR2 and the File API in IE10, Web applications can now work directly with binary data: manipulating byte-level data, rendering additional binary data formats, and extracting data from existing media file formats. Try out the Binary File Inspector test drive demo, and take Typed Arrays for a spin in IE10.