Parsing data coming off the wire in an event-driven environment can be a
difficult proposition, with naive implementations buffering all received data
in memory until a message has been received in its entirety. Not only is this
inefficient from a memory standpoint, but it may not be possible to determine
the that a message has been fully received without attempting to parse it.
This requires a parser that can gracefully handle incomplete messages and
pick up where it left off. To make this task easier, node-strtok provides

Tokenizing primitives for common network datatypes (e.g. signed and
unsigned integers in variois endian-nesses).

A callback-driven approach well suited to an asynchronous environment (e.g.
to allow the application to asynchronously ask another party for
information about what the next type should be)

An easily extensible type system for adding support for new,
application-defined types to the core.

Very good performance; the built-in MsgPack parser performs within 30% of
the native C++ implementation.

The node-strtok library has only one method: parse(). This method takes a
net.Stream (really any EventEmitter that pumps out data events) and a
callback, which is invoked when a complete token has been read from the stream.
The callback takes a single argument: the token just read from the stream, and
is expected to return the type of token to read from the stream next (e.g.
strtok.UINT32_BE). It is this callback that ultimately implements the
application protocol, consuming the provided tokens and instructing
node-strtok the type of the next token to read. It is this inverted control
flow that allows node-strtok to function efficiently as an interruptable
parser well suited to NodeJS.

The meat of implementing an application protocol is to be found in what it does
in its callback function.

When parse() is invoked, it immediately invokes the callback with a value of
. In this way, the application can indicate the type of token to
read from the stream first. For example, the MsgPack implementation always
reads a strtok.UINT8 when it doesn't know what's coming next; it interprets
the resulting value as a type (e.g. integer, array, etc) to determine what type
of value is coming next.

Implicit in this model is that the callback itself must maintain all
protocol-specific state on its own. Often this will include what type of
structure it's attmpting to parse from the stream, and the progress made so far
in parsing that structure.

node-strtok supports a wide variety of numerical tokens out of the box:

UINT8

UINT16_BE, UINT16_LE

UINT32_BE, UINT32_LE

INT8

INT16_BE

INT32_BE

One might notice that there is no support for 64-bit tokens, since JavaScript
seems to limit value size to less than 2^64. Rather than wrapping up an
additional math library to handle this, I wanted to stick with JavaScript
primitives. Maybe this will change later if this becomes important.

There are a handful of "special" tokens which have special meaning when
returned from the protocol callback: DONE and DEFER.

The DONE token indicates to strtok.parse() that the protocol parsing loop
has come to an end, and that no more data need be read off of the stream. This
causes strtok.parse() to disengage from the stream. The callback will not be
invoked again. In addition, upon receiving DONE, strtok.parse() may have
excess data buffers which it has pulled off of the stream, but which it did not
consume while being directed by the protocol callback. Rather than dropping
this data on the floor, strtok.parse() will synthesize and emit data events
on the stream which it was operating on. The idea is to facilitate protocol
stacks which want to use node-strtok to parse some portion of the protocol
and then hand off processing to some other codepath(s). In this case, it is
expected that the protocol callback will add data event listeners to the
stream immediately before returning DONE.

The DEFER token indicates that the protocol doesn't know what type of token
to read from the stream next. Perhaps the protocol needs to consult some
out-of-process datastructure, or wait for some other event to occur. To support
this case, the protocol callback is actually invoked with 2 arguments: the
value and a defer callback. It is this second parameter, a callback, that must
be invoked with the desired token type once the protocol layer has figured this
out. Note that between the time DEFER is returned and the callback is
invoked, strtok.parse() is buffering all data received from the stream.

The token types returned from the protocol callback are simply objects with

a get() method that takes a Buffer and an offset and returns the token
value, and 2) a len field that indicates the number of bytes to consume.
Any JavaScript object that meets this criteria can be returned from the
protocol callback. node-strtok ships with two built-in types for supporting
common behavior that take advantage of this

BufferType -- consume a fixed number of bytes from the stream and
return a Buffer instance containing these bytes.

StringType -- consume a fixed number of bytes from the stream and
return a string with a specified encoding.

Implementing such types is extremely simple. The BufferType implementation
is given below:

When the callback is first invoked, we aren't in the midst of reading a
message, so we ask for a UINT32_BE to get the length of the subsequent
string. At this point, on every invocation of our callback, we use our
internal numBytes variable to determine if we're reading a number-of-bytes
value or a string value. In the former case, we're called with the
number-of-bytes value and return a StringType instance that knows to read
the specified number of bytes from the stream. In the latter case, we get the
string itself, log it, and then ask for a length again. Lather, rinse,
repeat.