
The aims of this library are:

creating a C-like interface for strings (i.e., arrays of character codes, represented in JavaScript by an ArrayBufferView) based upon the JavaScript ArrayBuffer interface

creating a highly extensible library that anyone can extend by adding methods to the StringView.prototype object

creating a collection of methods for such string-like objects (hereafter: stringViews) that work strictly on arrays of numbers rather than creating new immutable JavaScript strings

working with Unicode encodings other than JavaScript's default UTF-16 DOMStrings

Introduction

As web applications become more and more powerful, adding features such as audio and video manipulation, access to raw data using WebSockets, and so forth, it has become clear that there are times when it would be helpful for JavaScript code to be able to quickly and easily manipulate raw binary data. In the past, this had to be simulated by treating the raw data as a string and using the charCodeAt() method to read the bytes from the data buffer.

However, this is slow and error-prone, due to the need for multiple conversions (especially if the binary data is not actually byte-format data, but, for example, 32-bit integers or floats).

JavaScript typed arrays provide a mechanism for accessing raw binary data much more efficiently. The StringView constructor works one level above typed arrays: it stores strings as typed arrays of character codes.

A number expressing the length of the new stringView: in codepoints if the input argument is a string or a stringView, or in raw elements if the input is a typed array, an ArrayBuffer, or any other kind of ordered object (such as an Array or another collection). If not specified, it defaults to the length of the input. It can never be greater than the length of the input. To create a stringView larger than its content, see this note.

StringView instances' properties

buffer

The ArrayBuffer shared by the stringView.rawData and stringView.bufferView views.

rawData

An arrayBufferView containing the representation of the string as an array of 8-bit, 16-bit, or 32-bit integers (depending on the chosen encoding).

bufferView

An arrayBufferView containing the representation of the whole buffer as an array of 8-bit, 16-bit, or 32-bit integers (depending on the chosen encoding).
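The relationship between these three properties can be illustrated with plain typed arrays. The following is a minimal sketch, not code from stringview.js. The variable names mirror the properties above, and the buffer size and offsets are arbitrary values chosen for the demonstration.

```javascript
// Sketch only: one ArrayBuffer backing two views, named after the
// stringView properties described above.
const buffer = new ArrayBuffer(8);
const bufferView = new Uint8Array(buffer);    // view over the whole buffer
const rawData = new Uint8Array(buffer, 2, 5); // view over 5 elements starting at byte 2

// Store the character codes of "Hello" through rawData...
const text = "Hello";
for (let i = 0; i < text.length; i++) {
  rawData[i] = text.charCodeAt(i);
}

// ...and read them back through bufferView: both views share `buffer`.
console.log(Array.from(bufferView).join(", ")); // 0, 0, 72, 101, 108, 108, 111, 0
console.log(rawData.buffer === bufferView.buffer); // true
```

Writing through one view changes what the other one sees because no data is copied. Only the byte ranges covered by each view differ.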

StringView instances' methods

makeIndex()

Syntax

stringView.makeIndex([charactersLength[, startFrom]])

Description

If the charactersLength argument is a number, it is taken as a length in codepoints, and makeIndex() returns the index, in raw elements, of that position, counting from 0. If the startFrom argument is passed, the analysis starts from that position. If the charactersLength argument is omitted, makeIndex() returns the length in codepoints (ASCII or UTF-encoded) of the stringView object.

Arguments

charactersLength (optional)

A number expressing, in codepoints, the distance from startFrom of the stringView.rawData index to be returned.

startFrom (optional)

A number expressing the position, in raw elements, from which the analysis starts; the elements before it are skipped. If omitted, it defaults to 0.

Performance note: each invocation of stringView.makeIndex() loops over all the characters in the stringView object between startFrom and startFrom + charactersLength. Don't call stringView.makeIndex() inside a loop as if it were a normal length property. For custom loops, see the note about them in the forEachChar() documentation.
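To see why makeIndex() has to scan, here is a minimal sketch of its semantics for a UTF-8 rawData array. makeIndexUTF8() and utf8SequenceLength() are hypothetical helpers, not the library's implementation. They assume well-formed input and read each sequence length from its lead byte, accepting the sequences of up to six elements described in this document.

```javascript
// Hypothetical helper: number of elements in a UTF-8 sequence, read from its
// lead byte (original UTF-8 allowed up to 6 elements). Assumes `leadByte` is
// not a continuation byte (10xxxxxx).
function utf8SequenceLength(leadByte) {
  if (leadByte < 0x80) return 1; // 0xxxxxxx
  if (leadByte < 0xe0) return 2; // 110xxxxx
  if (leadByte < 0xf0) return 3; // 1110xxxx
  if (leadByte < 0xf8) return 4; // 11110xxx
  if (leadByte < 0xfc) return 5; // 111110xx
  return 6;                      // 1111110x
}

// Hypothetical sketch of makeIndex() for a UTF-8 rawData array:
// - with charactersLength: the raw index reached after skipping that many
//   codepoints, starting at raw index startFrom;
// - without it: the number of codepoints from startFrom to the end.
function makeIndexUTF8(rawData, charactersLength, startFrom = 0) {
  const stopAt = charactersLength === undefined ? Infinity : charactersLength;
  let rawIndex = startFrom;
  let chars = 0;
  // A linear scan over the elements: this is the cost the performance note warns about.
  while (rawIndex < rawData.length && chars < stopAt) {
    rawIndex += utf8SequenceLength(rawData[rawIndex]);
    chars++;
  }
  return charactersLength === undefined ? chars : Math.min(rawIndex, rawData.length);
}

// "aé€b" encoded as UTF-8: 1 + 2 + 3 + 1 = 7 elements
const bytes = new TextEncoder().encode("a\u00e9\u20acb");
console.log(makeIndexUTF8(bytes));       // 4 (codepoints)
console.log(makeIndexUTF8(bytes, 2));    // 3 (raw index after "aé")
console.log(makeIndexUTF8(bytes, 1, 3)); // 6 (raw index after "€", starting at its lead byte)
```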

stringView.subview() returns a new stringView object that shares the same buffer. The characterOffset and charactersLength arguments are treated as in String.prototype.substr(characterOffset[, charactersLength]). To create a new stringView object that clones the content without sharing the buffer, see this table.

Arguments

characterOffset (optional)

A number expressing (in codepoints) the location at which to begin extracting characters.

charactersLength (optional)

A number expressing (in codepoints) how many characters to extract.

As explained above, characterOffset is a character index. The index of the first character is 0, and the index of the last character is 1 less than the length of the stringView. stringView.subview() begins extracting characters at characterOffset and collects charactersLength characters (unless it reaches the end of the string first, in which case it returns fewer).

If characterOffset is positive and greater than or equal to the length of the string, subview() returns an empty stringView.

If characterOffset is negative, subview() uses it as a character index from the end of the string. If characterOffset is negative and abs(characterOffset) is larger than the length of the string, subview() uses 0 as the start index.

If charactersLength is 0 or negative, subview() returns an empty stringView. If charactersLength is omitted, subview() extracts characters to the end of the string.
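The substr-style rules above can be condensed into a small function that converts the two arguments into a range of codepoint indices. substrRange() is a hypothetical helper written for illustration. It only computes the range and does not touch any buffer.

```javascript
// Hypothetical helper: converts substr-style arguments into a [start, end)
// range of codepoint indices for a string `length` codepoints long.
function substrRange(length, characterOffset = 0, charactersLength) {
  let start = characterOffset;
  if (start < 0) start = Math.max(length + start, 0);         // count from the end, never below 0
  if (start >= length) return [length, length];               // offset past the end: empty
  if (charactersLength === undefined) return [start, length]; // omitted: up to the end
  if (charactersLength <= 0) return [start, start];           // zero or negative: empty
  return [start, Math.min(start + charactersLength, length)]; // may return fewer characters
}

// "Hello world!" is 12 codepoints long.
console.log(JSON.stringify(substrRange(12, 6)));      // [6,12]  -> "world!"
console.log(JSON.stringify(substrRange(12, -6, 5)));  // [6,11]  -> "world"
console.log(JSON.stringify(substrRange(12, -20, 5))); // [0,5]   -> "Hello"
console.log(JSON.stringify(substrRange(12, 10, 5)));  // [10,12] -> "d!"
```

A real subview() would then convert these codepoint indices into raw element offsets. For a variable-length encoding, that conversion takes the same kind of linear scan that makeIndex() performs.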

Each invocation of the callback looks something like this: callback.call(thisObject, charCode, characterOffset, rawOffset, rawDataArray). If the encoding is fixed-length, characterOffset and rawOffset are the same number.

Note: stringView.forEachChar() executes a complete loop through all characters in the stringView between characterOffset and characterOffset + charactersLength. To build a custom loop through a variable-length-encoded stringView (UTF-8, UTF-16) without using stringView.forEachChar(), step through stringView.rawData yourself, advancing by the number of elements each codepoint occupies. If the encoding is fixed-length (ASCII, UTF-32, etc.), a normal loop over the stringView.rawData array is enough.
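Here is a minimal sketch of such a custom loop. It walks a UTF-16 rawData array (assumed to be a Uint16Array) and reports the same charCode, characterOffset and rawOffset values that the forEachChar() callback receives. eachUTF16Char() is a hypothetical name, not part of StringView. An unpaired surrogate is simply reported as its own element.

```javascript
// Hypothetical sketch: a hand-written loop over UTF-16 rawData (a Uint16Array)
// that passes charCode, characterOffset and rawOffset to `callback`.
function eachUTF16Char(rawData, callback) {
  let characterOffset = 0;
  let rawOffset = 0;
  while (rawOffset < rawData.length) {
    const unit = rawData[rawOffset];
    const next = rawData[rawOffset + 1]; // undefined past the end
    if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      // High surrogate + low surrogate: one codepoint stored in two elements.
      const charCode = (unit - 0xd800) * 0x400 + (next - 0xdc00) + 0x10000;
      callback(charCode, characterOffset, rawOffset);
      rawOffset += 2;
    } else {
      // Any other element (including an unpaired surrogate) is one codepoint.
      callback(unit, characterOffset, rawOffset);
      rawOffset += 1;
    }
    characterOffset++;
  }
}

// Build UTF-16 rawData from a JavaScript string, one code unit per element.
const s = "a\u{1F600}b"; // "a😀b"
const rawData = Uint16Array.from({ length: s.length }, (_, i) => s.charCodeAt(i));

const seen = [];
eachUTF16Char(rawData, (charCode, characterOffset, rawOffset) => {
  seen.push([charCode.toString(16), characterOffset, rawOffset]);
});
console.log(JSON.stringify(seen)); // [["61",0,0],["1f600",1,1],["62",2,3]]
```

After the surrogate pair, characterOffset and rawOffset diverge. This is why they coincide only for fixed-length encodings.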

JavaScript calls the stringView.valueOf() method to convert an object to a primitive value. You rarely need to invoke the stringView.valueOf() method yourself; JavaScript automatically invokes it when encountering an object where a primitive value is expected.

StringView.base64ToBytes() is a generic utility that is also useful for arbitrary binary data. If you want to pass the StringView.base64ToBytes(base64String[, regSize]).buffer property to an ArrayBufferView subclass other than Uint8Array, you should use the regSize argument (the size, in bytes, of that subclass's elements).
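The following sketch shows why regSize matters. base64ToBytesSketch() is hypothetical and, unlike the library, relies on the standard atob() function, which is available in browsers and in Node.js 16 and later. It assumes that regSize rounds the output length up to a multiple of regSize bytes, padding with zeros, so that the buffer can back a view whose elements are regSize bytes wide.

```javascript
// Hypothetical sketch of base64ToBytes(base64String[, regSize]) built on atob().
// Assumption: regSize rounds the output length up to a multiple of regSize bytes.
function base64ToBytesSketch(base64String, regSize) {
  const binary = atob(base64String); // one character per decoded byte (0-255)
  const length = regSize
    ? Math.ceil(binary.length / regSize) * regSize
    : binary.length;
  const bytes = new Uint8Array(length); // any extra trailing bytes stay 0
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

const encoded = "AQAAAAI="; // the 5 bytes 01 00 00 00 02

console.log(base64ToBytesSketch(encoded).length);    // 5
console.log(base64ToBytesSketch(encoded, 4).length); // 8

// 8 bytes can back a Uint32Array of 2 elements; 5 bytes cannot.
const words = new Uint32Array(base64ToBytesSketch(encoded, 4).buffer);
console.log(words.length); // 2
```

Without regSize, the 5-byte buffer cannot back a Uint32Array. The byte length of a Uint32Array must be a multiple of 4, so its constructor throws a RangeError.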

loadUTF8CharCode()

Syntax

StringView.loadUTF8CharCode(typedArray, index)

Description

Returns the single codepoint at the given location from an array of UTF-8-encoded elements. A UTF-8-encoded codepoint can occupy up to six elements; this function recomposes all of these parts into a single codepoint.

var myStringView = new StringView("Hello world!"); // a UTF-8 stringView...
alert(StringView.loadUTF8CharCode(myStringView.rawData, 6)); // 119, which is the character code for "w"

StringView.loadUTF8CharCode() is mainly for internal use and generally is of little utility.
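Here is a minimal sketch of how the parts of a UTF-8 sequence can be recomposed into a codepoint. loadUTF8CharCodeSketch() is a hypothetical stand-in for StringView.loadUTF8CharCode(), not its source. It assumes that index points at a lead element and that the sequence is well-formed. It accepts the original UTF-8 forms of up to six elements.

```javascript
// Hypothetical stand-in for StringView.loadUTF8CharCode(typedArray, index).
// Assumes `index` points at the lead element of a well-formed sequence.
function loadUTF8CharCodeSketch(typedArray, index) {
  const lead = typedArray[index];
  if (lead < 0x80) return lead; // single element: the value is the codepoint

  // Sequence length and the payload bits carried by the lead element.
  let length, code;
  if (lead >= 0xfc) { length = 6; code = lead & 0x01; }
  else if (lead >= 0xf8) { length = 5; code = lead & 0x03; }
  else if (lead >= 0xf0) { length = 4; code = lead & 0x07; }
  else if (lead >= 0xe0) { length = 3; code = lead & 0x0f; }
  else { length = 2; code = lead & 0x1f; }

  // Each continuation element (10xxxxxx) adds 6 more low-order bits.
  for (let i = 1; i < length; i++) {
    code = code * 64 + (typedArray[index + i] & 0x3f);
  }
  return code;
}

const hello = new TextEncoder().encode("Hello world!");
console.log(loadUTF8CharCodeSketch(hello, 6)); // 119, the character code of "w"

const mixed = new TextEncoder().encode("a\u00e9\u20acb"); // "aé€b": 61 c3 a9 e2 82 ac 62
console.log(loadUTF8CharCodeSketch(mixed, 3).toString(16)); // 20ac, the codepoint of "€"
```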

putUTF8CharCode()

Syntax

StringView.putUTF8CharCode(typedArray, charCode, index)

Description

Writes a single codepoint at the given position into a typed array. A single UTF-8-encoded codepoint can occupy several elements (up to six); this function splits it into the needed parts and writes them. Returns undefined.
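The inverse operation, splitting a codepoint into UTF-8 parts, can be sketched as follows. putUTF8CharCodeSketch() is a hypothetical stand-in, not the library's code. It assumes that the typed array has room for the whole sequence starting at index. Like the method described above, it returns undefined.

```javascript
// Hypothetical stand-in for StringView.putUTF8CharCode(typedArray, charCode, index).
// Assumes typedArray has room for the whole sequence starting at `index`.
function putUTF8CharCodeSketch(typedArray, charCode, index) {
  // Elements needed, following the original (up to 6 elements) UTF-8 ranges.
  const length =
    charCode < 0x80 ? 1 :
    charCode < 0x800 ? 2 :
    charCode < 0x10000 ? 3 :
    charCode < 0x200000 ? 4 :
    charCode < 0x4000000 ? 5 : 6;

  if (length === 1) {
    typedArray[index] = charCode;
    return;
  }

  // Fill continuation elements (10xxxxxx) from the last one back, 6 bits each.
  let rest = charCode;
  for (let i = length - 1; i > 0; i--) {
    typedArray[index + i] = 0x80 | (rest & 0x3f);
    rest = Math.floor(rest / 64);
  }
  // Lead element: `length` one-bits, a zero bit, then the remaining high bits.
  typedArray[index] = ((0xff << (8 - length)) & 0xff) | rest;
}

const out = new Uint8Array(8);
putUTF8CharCodeSketch(out, 0x61, 0);    // "a": 1 element
putUTF8CharCodeSketch(out, 0x20ac, 1);  // "€": 3 elements
putUTF8CharCodeSketch(out, 0x1f600, 4); // "😀": 4 elements
console.log(Array.from(out, (b) => b.toString(16)).join(" ")); // 61 e2 82 ac f0 9f 98 80
```

The result matches what the standard TextEncoder produces for the same three characters.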

StringView.getUTF8CharLength() is mainly for internal use and generally is of little utility.

loadUTF16CharCode()

Syntax

StringView.loadUTF16CharCode(typedArray, index)

Description

Returns the single codepoint at the given location from an array of UTF-16-encoded elements. A codepoint can occupy up to two UTF-16-encoded elements (a surrogate pair); this function recomposes these parts into a single codepoint.

var myStringView = new StringView("Hello world!", "UTF-16"); // a UTF-16 stringView...
alert(StringView.loadUTF16CharCode(myStringView.rawData, 6)); // 119, which is the character code of "w"

StringView.loadUTF16CharCode() is mainly for internal use and generally is of little utility.

putUTF16CharCode()

Syntax

StringView.putUTF16CharCode(typedArray, charCode, index)

Description

Writes a single codepoint at the given position into a typed array. A single codepoint can occupy up to two UTF-16-encoded elements; this function splits it into the needed parts and writes them. Returns undefined.
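For UTF-16, the split is a surrogate-pair computation. putUTF16CharCodeSketch() is a hypothetical stand-in, not the library's code. It assumes that charCode is at most 0x10FFFF and that the array has room for two elements when a pair is needed.

```javascript
// Hypothetical stand-in for StringView.putUTF16CharCode(typedArray, charCode, index).
// Assumes charCode <= 0x10FFFF and room for two elements when a surrogate pair is needed.
function putUTF16CharCodeSketch(typedArray, charCode, index) {
  if (charCode < 0x10000) {
    typedArray[index] = charCode; // Basic Multilingual Plane: one element
    return;
  }
  const offset = charCode - 0x10000;                 // a 20-bit value
  typedArray[index] = 0xd800 + (offset >> 10);       // high surrogate: top 10 bits
  typedArray[index + 1] = 0xdc00 + (offset & 0x3ff); // low surrogate: bottom 10 bits
}

const units = new Uint16Array(3);
putUTF16CharCodeSketch(units, 0x77, 0);    // "w"
putUTF16CharCodeSketch(units, 0x1f600, 1); // "😀" as a surrogate pair
console.log(String.fromCharCode(...units)); // w😀
```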

Glossary

Codepoint

A unique number for each Unicode character. It is represented by 1-6 uint8 elements in UTF-8, 1-2 uint16 elements in UTF-16, 1 uint32 element in UCS-4, 1 uint8 element in ASCII, and so on.
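To make these element counts concrete, the following sketch measures a few codepoints with standard APIs. TextEncoder always produces UTF-8, and the length of a JavaScript string counts UTF-16 elements. elementCounts() is a hypothetical helper. The UTF-32 count is always 1 by definition, and the modern encoder never needs more than four UTF-8 elements.

```javascript
// Hypothetical helper: elements needed to store one character's codepoint in
// UTF-8 (8-bit elements), UTF-16 (16-bit elements) and UTF-32 (32-bit elements).
function elementCounts(character) {
  return {
    codepoint: character.codePointAt(0),
    utf8: new TextEncoder().encode(character).length, // TextEncoder always emits UTF-8
    utf16: character.length, // JavaScript strings are sequences of UTF-16 elements
    utf32: 1,                // one 32-bit element per codepoint, by definition
  };
}

for (const character of ["A", "\u00e9", "\u20ac", "\u{1F600}"]) {
  const { codepoint, utf8, utf16, utf32 } = elementCounts(character);
  console.log(`U+${codepoint.toString(16).toUpperCase().padStart(4, "0")}: ` +
    `UTF-8 ${utf8}, UTF-16 ${utf16}, UTF-32 ${utf32}`);
}
// U+0041: UTF-8 1, UTF-16 1, UTF-32 1
// U+00E9: UTF-8 2, UTF-16 1, UTF-32 1
// U+20AC: UTF-8 3, UTF-16 1, UTF-32 1
// U+1F600: UTF-8 4, UTF-16 2, UTF-32 1
```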

Notes

When you include the script stringview.js in a page, no variables other than StringView itself are added to the global scope.

StringView is a highly extensible library that anyone can extend by adding methods to the StringView.prototype object.
For example, imagine you want to create a method similar to string.replace(), but for stringView objects. This requires two new algorithms: the CLikeRegExp() constructor, which builds C-like regular expression objects, and StringView.prototype.replace(), the new method able to act on stringView instances. To add them, include stringview.js in your page and implement them in a separate script.

StringView is a constructor and a collection of methods whose aim is to work strictly on arrays of numbers rather than creating new immutable JavaScript strings. Keep this in mind when you extend its prototype.
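The extension pattern itself can be shown without the real library. The sketch below defines MiniStringView, a hypothetical ASCII-only stand-in for StringView. It adds a replaceCharCode() method (also hypothetical) to the prototype, and that method rewrites rawData in place instead of building a new string. With the actual library you would attach such a method to StringView.prototype. An in-place replacement like this is only valid for fixed-length encodings.

```javascript
// Hypothetical, ASCII-only stand-in for StringView: one 8-bit element per character.
function MiniStringView(text) {
  this.rawData = Uint8Array.from(text, (character) => character.charCodeAt(0));
}

MiniStringView.prototype.toString = function () {
  return String.fromCharCode(...this.rawData);
};

// The extension: replace every occurrence of one character code with another,
// working on rawData in place instead of creating a new JavaScript string.
MiniStringView.prototype.replaceCharCode = function (oldCode, newCode) {
  for (let i = 0; i < this.rawData.length; i++) {
    if (this.rawData[i] === oldCode) {
      this.rawData[i] = newCode;
    }
  }
  return this; // allow chaining
};

const view = new MiniStringView("Hello world!");
view.replaceCharCode(0x6f, 0x30); // "o" -> "0"
console.log(view.toString()); // Hell0 w0rld!
```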

Since a stringView, unlike a C string, has a length property, there is no reason to add a NULL codepoint ('\0') at the end of a string.

StringView has been proposed as a strawman for ES6 on ECMAScript Bugs. Everyone can participate in the discussion at bug 1557 or on es-discuss.