Streams come to us from the
earliest days of unix
and have proven themselves over the decades as a dependable way to compose large
systems out of small components that
do one thing well.
In unix, streams are implemented by the shell with | pipes.
In node, the built-in
stream module
is used by the core libraries and can also be used by user-space modules.
Similar to unix, the node stream module's primary composition operator is called
.pipe() and you get a backpressure mechanism for free to throttle writes for
slow consumers.

Streams can help to
separate your concerns
because they restrict the implementation surface area into a consistent
interface that can be
reused.
You can then plug the output of one stream to the input of another and
use libraries that operate abstractly on streams to
institute higher-level flow control.

Streams are an important component of
small-program design
and unix philosophy
but there are many other important abstractions worth considering.
Just remember that technical debt is the enemy, and seek out the best
abstractions for the problem at hand.

why you should use streams

I/O in node is asynchronous, so interacting with the disk and network involves
passing callbacks to functions. You might be tempted to write code that serves
up a file from disk like this:
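
A sketch of that buffered approach, assuming a data.txt file sitting next to
the script:

var http = require('http');
var fs = require('fs');

var server = http.createServer(function (req, res) {
    // read the whole file into memory, then write it out in one go
    fs.readFile(__dirname + '/data.txt', function (err, data) {
        res.end(data);
    });
});
server.listen(8000);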

This code works but it's bulky and buffers up the entire data.txt file into
memory for every request before writing the result back to clients. If
data.txt is very large, your program could start eating a lot of memory as it
serves lots of users concurrently, particularly for users on slow connections.

The user experience is poor too because users will need to wait for the whole
file to be buffered into memory on your server before they can start receiving
any contents.

Luckily both of the (req, res) arguments are streams, which means we can write
this in a much better way using fs.createReadStream() instead of
fs.readFile():
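
A sketch of the streaming version, with the same assumed data.txt:

var http = require('http');
var fs = require('fs');

var server = http.createServer(function (req, res) {
    // stream the file straight into the response, one chunk at a time
    fs.createReadStream(__dirname + '/data.txt').pipe(res);
});
server.listen(8000);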

Here .pipe() takes care of listening for 'data' and 'end' events from the
fs.createReadStream(). This code is not only cleaner, but now the data.txt
file will be written to clients one chunk at a time, immediately as chunks are
received from the disk.

Using .pipe() has other benefits too, like handling backpressure automatically
so that node won't buffer chunks into memory needlessly when the remote client
is on a really slow or high-latency connection.
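
Want compression? There's a streaming module for that too. Here's a sketch
that drops the oppressor module into the pipeline (assuming it's been
npm-installed):

var http = require('http');
var fs = require('fs');
var oppressor = require('oppressor');

var server = http.createServer(function (req, res) {
    // oppressor looks at the request headers and gzips or deflates the
    // stream when the browser says it can handle it
    fs.createReadStream(__dirname + '/data.txt')
        .pipe(oppressor(req))
        .pipe(res)
    ;
});
server.listen(8000);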

Now our file is compressed for browsers that support gzip or deflate! We can
just let oppressor handle all that
content-encoding stuff.

Once you learn the stream api, you can just snap together these streaming
modules like lego bricks or garden hoses instead of having to remember how to push
data through wonky non-streaming custom APIs.

Streams make programming in node simple, elegant, and composable.

basics

There are 5 kinds of streams: readable, writable, transform, duplex, and
"classic".

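The program referred to below is a read2.js along these lines: a sketch of a
streams2 Readable that pushes one letter of the alphabet per _read() call,
with a small delay, and reports how many times _read() ran (the counter and
the timings are just illustrative):

var Readable = require('stream').Readable;
var rs = Readable();

var c = 97 - 1; // one before 'a'
var reads = 0;

rs._read = function () {
    reads ++;
    if (c >= 'z'.charCodeAt(0)) return rs.push(null);

    // the delay gives the downstream a chance to close the pipe before
    // we push more data than it asked for
    setTimeout(function () {
        rs.push(String.fromCharCode(++c));
    }, 100);
};

rs.pipe(process.stdout);

process.on('exit', function () {
    console.error('\n_read() called ' + reads + ' times');
});
process.stdout.on('error', process.exit);
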
Running this program we can see that _read() is only called 5 times when we
only request 5 bytes of output:

$ node read2.js | head -c5
abcde
_read() called 5 times

The delay is necessary because the operating system requires some time to send
us the relevant signals to close the pipe.

The process.stdout.on('error', fn) handler is also necessary because the
operating system will send a SIGPIPE to our process when head is no longer
interested in our program's output, which gets emitted as an EPIPE error on
process.stdout.

These extra complications are necessary when interfacing with external
operating system pipes but are handled automatically when we interface with
node streams the whole time.

--------
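
To make the next point concrete, here's a sketch of a classic readable stream
that emits a handful of string chunks on a timer:

var Stream = require('stream');
var stream = new Stream;
stream.readable = true;

var c = 64;
var iv = setInterval(function () {
    if (++c >= 75) {
        clearInterval(iv);
        stream.emit('end');
    }
    else stream.emit('data', String.fromCharCode(c));
}, 100);

stream.pipe(process.stdout);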

In this example the 'data' events have a string payload as the first argument.
Buffers and strings are the most common types of data to stream but it's
sometimes useful to emit other types of objects.

Just make sure that the types you're emitting as data are compatible with the
types that the writable stream you're piping into expects.
Otherwise you can pipe into an intermediary conversion or parsing stream before
piping to your intended destination.

writable

Writable streams are streams that can accept input. To create a writable stream,
set the writable attribute to true and define write(), end(), and
destroy().
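
A sketch of such a writable stream, wired up to count the bytes piped in from
stdin:

var Stream = require('stream');
var ws = new Stream;
ws.writable = true;

var bytes = 0;
var dead = false;

ws.write = function (buf) {
    bytes += buf.length;
    return true; // this sink never needs to apply backpressure
};

ws.end = function (buf) {
    // by convention, end(buf) behaves like write(buf) followed by end()
    if (arguments.length) ws.write(buf);

    ws.writable = false;
    if (!dead) console.log(bytes + ' bytes written');
};

ws.destroy = function () {
    ws.writable = false;
    dead = true;
};

process.stdin.pipe(ws);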

This writable stream will count all the bytes from an input stream and print the
result on a clean end(). If the stream is destroyed it will do nothing.

One thing to watch out for is the convention in node to treat end(buf) as a
write(buf) then an end(). If you skip this it could lead to confusion
because people expect end to behave the way it does in core.

backpressure

Backpressure is the mechanism that streams use to make sure that readable
streams don't emit data faster than writable streams can consume data.

Note: the API for handling backpressure is changing substantially in future
versions of node (> 0.8). pause(), resume(), and emit('drain') are
scheduled for demolition. The notice has been on display in the local planning
office for months.

In order to do backpressure correctly readable streams should
implement pause() and resume(). Writable streams return false in
.write() when they want the readable streams piped into them to slow down and
emit 'drain' when they're ready for more data again.

writable stream backpressure

When a writable stream wants a readable stream to slow down, it should return
false from its .write() function. This causes pause() to be called on each
readable stream source.

When the writable stream is ready to start receiving data again, it should emit
the 'drain' event. Emitting 'drain' causes the resume() function to be
called on each readable stream source.
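
A sketch of a classic writable stream that does this: it pretends each chunk
takes a while to flush, returns false from write(), and emits 'drain' once
it's ready for more (the 500ms delay is made up):

var Stream = require('stream');
var ws = new Stream;
ws.writable = true;

ws.write = function (buf) {
    // pretend the underlying resource needs 500ms per chunk
    setTimeout(function () {
        ws.emit('drain'); // upstream sources get resume()d
    }, 500);
    return false; // upstream sources get pause()d
};

ws.end = function (buf) {
    if (arguments.length) ws.write(buf);
    ws.writable = false;
};

process.stdin.pipe(ws);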

readable stream backpressure

When pause() is called on a readable stream, it means that a downstream
writable stream wants the upstream to slow down. The readable stream that
pause() was called on should stop emitting data but that isn't always
possible.

When the downstream is ready for more data, the readable stream's resume()
function will be called.
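
And a sketch of a classic readable stream that honors pause() and resume() as
best it can:

var Stream = require('stream');
var rs = new Stream;
rs.readable = true;

var paused = false;

rs.pause = function () { paused = true };
rs.resume = function () { paused = false };

// emit data on a timer, but only while nobody downstream has asked us to wait
setInterval(function () {
    if (!paused) rs.emit('data', 'beep ');
}, 100);

rs.pipe(process.stdout);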

pipe

.pipe() is the glue that shuffles data from readable streams into writable
streams and handles backpressure. The pipe api is just:

src.pipe(dst)

for a readable stream src and a writable stream dst. .pipe() returns the
dst so if dst is also a readable stream, you can chain .pipe() calls
together like:

a.pipe(b).pipe(c).pipe(d)

which resembles what you might do in the shell with the | operator:

a | b | c | d

The a.pipe(b).pipe(c).pipe(d) usage is the same as:

a.pipe(b);
b.pipe(c);
c.pipe(d);

The stream implementation in core is just an event emitter with a pipe function.
pipe() is pretty short. You should read
the source code.

terms

These terms are useful for talking about streams.

through

Through streams are simple readable/writable filters that transform input and
produce output.
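
One handy helper for writing these is the through module from npm: hand it a
write function (and optionally an end function) and it gives back a stream you
can pipe through. A sketch that upper-cases whatever flows through it:

var through = require('through');

var toUpper = through(function (buf) {
    this.queue(buf.toString().toUpperCase());
});

process.stdin.pipe(toUpper).pipe(process.stdout);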

duplex

Duplex streams are readable/writable and both ends of the stream engage
in a two-way interaction, sending back and forth messages like a telephone. An
rpc exchange is a good example of a duplex stream. Any time you see something
like:

a.pipe(b).pipe(a)

you're probably dealing with a duplex stream.

read more

the future

A big upgrade is planned for the stream api in node 0.9.
The basic apis with .pipe() will be the same, only the internals are going to
be different. The new api will also be backwards compatible with the existing
api documented here for a long time.

You can check the
readable-stream repo to see what
these future streams will look like.

built-in streams

process

process.stdin is a readable stream that contains the standard system input
stream for your program.

It is paused by default but the first time you refer to it .resume() will be
called implicitly on the
next tick.
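
So a one-line sketch like this is enough to start consuming stdin:

process.stdin.pipe(process.stdout);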

If process.stdin is a tty (check with
tty.isatty())
then input events will be line-buffered. You can turn off line-buffering by
calling process.stdin.setRawMode(true) BUT the default handlers for key
combinations such as ^C and ^D will be removed.

meta streams

state streams

scuttlebutt can be used for
peer-to-peer state synchronization with a mesh topology where nodes might only
be connected through intermediaries and there is no node with an authoritative
version of all the data.

The kind of distributed peer-to-peer network that
scuttlebutt provides is especially
useful when nodes on different sides of network barriers need to share and
update the same state. An example of this kind of network might be browser
clients that send messages through an http server to each other and backend
processes that the browsers can't directly connect to. Another use-case might be
systems that span internal networks since IPv4 addresses are scarce.
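
Here's a sketch of such a mesh using scuttlebutt's Model API: five nodes, a
through e, where a only reaches e through the intermediaries b and d:

var Model = require('scuttlebutt/model');
var am = new Model;
var bm = new Model;
var cm = new Model;
var dm = new Model;
var em = new Model;

// wire up a-b, b-c, b-d, and d-e with pairs of duplex streams
var s;
s = am.createStream(); s.pipe(bm.createStream()).pipe(s);
s = bm.createStream(); s.pipe(cm.createStream()).pipe(s);
s = bm.createStream(); s.pipe(dm.createStream()).pipe(s);
s = dm.createStream(); s.pipe(em.createStream()).pipe(s);

em.on('update', function (key, value, source) {
    console.log(key + ' => ' + value + ' from ' + source);
});

am.set('x', 555);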

Note that nodes a and e aren't directly connected, but when we run this
script:

$ node model.js
x => 555 from 1347857300518

the value that node a set finds its way to node e by way of nodes b and
d. Here all the nodes are in the same process but because
scuttlebutt uses a
simple streaming interface, the nodes can be placed on any process or server and
connected with any streaming transport that can handle string data.

Next we can make a more realistic example that connects over the network and
increments a counter variable.

Here's the server which will set the initial count value to 0 and count ++
every 320 milliseconds, printing all updates to count:
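
A sketch of what that server might look like, replicating a scuttlebutt Model
over plain TCP streams from the net module (the port and intervals are just
illustrative):

var Model = require('scuttlebutt/model');
var net = require('net');

var m = new Model;
m.set('count', '0');

m.on('update', function () {
    console.log('count = ' + m.get('count'));
});

var server = net.createServer(function (stream) {
    // every connection gets its own replication stream
    stream.pipe(m.createStream()).pipe(stream);
});
server.listen(8888);

setInterval(function () {
    m.set('count', Number(m.get('count')) + 1);
}, 320);

And a client sketch that connects, replicates the same model, and bumps the
counter by a bigger step on its own schedule:

var Model = require('scuttlebutt/model');
var net = require('net');

var m = new Model;

setInterval(function () {
    m.set('count', Number(m.get('count') || 0) + 100);
}, 600);

var s = m.createStream();
s.pipe(net.connect(8888, 'localhost')).pipe(s);

Run the server and then a client, and both sides will print a stream of count
updates as the two sets of writes get merged.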

These values are due to
scuttlebutt's
history-based conflict resolution algorithm which is hard at work ensuring that the state of the system across all nodes is eventually consistent.

Note that the server in this example is just another node with the same
privileges as the clients connected to it. The terms "client" and "server" here
don't affect how the state synchronization proceeds, just who initiates the
connection. Protocols with this property are often called symmetric protocols.
See dnode for another example of a
symmetric protocol.

The basic idea is that you just put functions in objects and you call them from
the other side of a stream and the functions will be stubbed out on the other
end to do a round-trip back to the side that had the original function in the
first place. The best thing is that when you pass functions to a stubbed
function as arguments, those functions get stubbed out on the other side!
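
A sketch with dnode's stream-based api (dnode >= 1.0, where the dnode instance
is itself a duplex stream): the server exposes a transform function over TCP
and the client calls it with a callback that gets stubbed back across the
wire:

// server.js
var dnode = require('dnode');
var net = require('net');

var server = net.createServer(function (c) {
    var d = dnode({
        transform : function (s, cb) {
            cb(s.toUpperCase());
        }
    });
    c.pipe(d).pipe(c);
});
server.listen(5004);

// client.js
var dnode = require('dnode');
var net = require('net');

var d = dnode();
d.on('remote', function (remote) {
    // remote.transform is a stub; the callback we pass gets stubbed on the
    // server side and called back over the same stream
    remote.transform('beep', function (s) {
        console.log('beep => ' + s);
        d.end();
    });
});

var c = net.connect(5004);
c.pipe(d).pipe(c);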

This approach of stubbing function arguments recursively shall henceforth be
known as the "turtles all the way down" gambit. The return values of any of your
functions will be ignored and only enumerable properties on objects will be
sent, json-style.

It's turtles all the way down!

Since dnode works in node or on the browser over any stream it's easy to call
functions defined anywhere and especially useful when paired up with
mux-demux to multiplex an rpc stream
for control alongside some bulk data streams.

test streams

power combos

distributed partition-tolerant chat

The append-only module can give us a
convenient append-only array on top of
scuttlebutt
which makes it really easy to write an eventually-consistent, distributed chat
that can replicate with other nodes and survive network partitions.

TODO: the rest

roll your own socket.io

We can build a socket.io-style event emitter api over streams using some of the
libraries mentioned earlier in this document.

First we can use shoe
to create a new websocket handler server-side and
emit-stream
to turn an event emitter into a stream that emits objects.
The object stream can then be fed into
JSONStream
to serialize the objects and from there the serialized stream can be piped into
the remote browser.
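
A server-side sketch along those lines. The event names, timer intervals, and
the '/sock' mount path are made up for illustration:

var http = require('http');
var fs = require('fs');
var shoe = require('shoe');
var emitStream = require('emit-stream');
var JSONStream = require('JSONStream');

var EventEmitter = require('events').EventEmitter;
var ev = new EventEmitter;

// some events to push at connected browsers
setInterval(function () { ev.emit('lower', 'abc') }, 750);
setInterval(function () { ev.emit('upper', 'XYZ') }, 1200);

var server = http.createServer(function (req, res) {
    // serve the page and the browserify bundle however you like
    if (req.url === '/bundle.js') {
        fs.createReadStream(__dirname + '/bundle.js').pipe(res);
    }
    else fs.createReadStream(__dirname + '/index.html').pipe(res);
});
server.listen(8000);

var sock = shoe(function (stream) {
    emitStream(ev)
        .pipe(JSONStream.stringify())
        .pipe(stream)
    ;
});
sock.install(server, '/sock');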

Meanwhile on the browser side of things just parse the json shoe stream and pass
the resulting object stream to eventStream(). eventStream() just returns an
event emitter that emits the server-side events:
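
A browser-side sketch; here the emit-stream module plays the part of the
eventStream() helper mentioned above, turning the parsed object stream back
into an event emitter (the div-appending handlers are just for illustration):

var shoe = require('shoe');
var emitStream = require('emit-stream');
var JSONStream = require('JSONStream');

var parser = JSONStream.parse([ true ]);
var ev = emitStream(shoe('/sock').pipe(parser));

ev.on('lower', function (msg) {
    var div = document.createElement('div');
    div.textContent = msg;
    document.body.appendChild(div);
});

ev.on('upper', function (msg) {
    var div = document.createElement('div');
    div.textContent = msg;
    document.body.appendChild(div);
});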

Use browserify to build this
browser source code so that you can require() all these nifty modules
browser-side:

$ browserify main.js -o bundle.js

Then drop a <script src="/bundle.js"></script> into some html and open it up
in a browser to see server-side events streamed through to the browser side of
things.

With this streaming approach you can rely more on tiny reusable components that
only need to know how to talk streams. Instead of routing messages through a
global event system socket.io-style, you can focus more on breaking up your
application into tinier units of functionality that can do exactly one thing
well.

For instance you can trivially swap out JSONStream in this example for
stream-serializer
to get a different take on serialization with a different set of tradeoffs.
You could bolt layers over top of shoe to handle
reconnections or heartbeats
using simple streaming interfaces.
You could even add a stream into the chain to use namespaced events with
eventemitter2 instead of the
EventEmitter in core.

If you want some different streams that act in different ways it would likewise
be pretty simple to run the shoe stream in this example through mux-demux to
create separate channels for each different kind of stream that you need.

As the requirements of your system evolve over time, you can swap out each of
these streaming pieces as necessary without as many of the all-or-nothing risks
that more opinionated framework approaches necessarily entail.

html streams for the browser and the server

We can use some streaming modules to reuse the same html rendering logic for the
client and the server! This approach is indexable, SEO-friendly, and gives us
realtime updates.

Our renderer takes lines of json as input and returns html strings as its
output. Text, the universal interface!

We can use brfs to inline the
fs.readFileSync() call for browser code
and hyperglue to update html based on
css selectors. You don't necessarily need to use hyperglue here; anything
that can return a string with html in it will work.
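
A sketch of such a renderer as a through stream. The row.html path and the
.who / .message selectors are made up for illustration; brfs inlines the
readFileSync() call when this file gets browserified:

// render.js
var fs = require('fs');
var through = require('through');
var hyperglue = require('hyperglue');

var html = fs.readFileSync(__dirname + '/static/row.html', 'utf8');

module.exports = function () {
    return through(function (line) {
        try { var row = JSON.parse(line) }
        catch (err) { return }

        // fill in the row template by css selector, emit the html string
        this.queue(hyperglue(html, {
            '.who': row.who,
            '.message': row.message
        }).outerHTML);
    });
};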

The server will just use slice-file to
keep everything simple. slice-file is
little more than a glorified tail/tail -f api but the interfaces map well to
databases with regular results plus a changes feed like couchdb.
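
A server sketch under those assumptions: slice-file tailing a newline-delimited
data.txt through the same renderer. Treat the sf.slice() / sf.follow() calls as
illustrative of the tail / tail -f split rather than gospel:

var http = require('http');
var sliceFile = require('slice-file');
var render = require('./render');

var sf = sliceFile(__dirname + '/data.txt');

var server = http.createServer(function (req, res) {
    res.setHeader('content-type', 'text/html');

    // send the last 5 rows up front, then keep following the file and
    // stream new rows down as they get appended
    sf.slice(-5).pipe(render()).pipe(res, { end : false });
    sf.follow(-1).pipe(render()).pipe(res, { end : false });
});
server.listen(8000);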