
Straw

Straw lets you run a Topology of worker Nodes that consume, process,
generate and emit messages.

Use Cases

Use it anywhere you need data processed in real-time.

Straw is ideal for building Flux-style reactive web apps.

You create processing nodes that pass messages to each other or the
outside world.

Straw's approach makes it easy to break your problem down into small
steps and develop iteratively.

Each step in the flow is a separate Unix process that Straw manages
for you. This automatically makes use of multiple cores and makes it
simpler to spread the load across multiple machines.

ASX Energy uses Straw to consume live
market data via a FIX feed from the exchange, deal with each different
type of message from the market, route them to historical storage,
implement a delayed feed from the live one, and stream messages to web
clients in real-time over socket.io.

Resources

Mailing List

Introduction

Each Node is run in its own process. Messages are passed in and out
of Nodes as JSON.

A simple Topology might look like this:

ping --> count --> print

Nodes can have multiple inputs and outputs. Messages can be passed out
via the default output or any number of arbitrarily named outputs.

Messages are queued between Nodes, with each Node processing one
message at a time.
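The flow above can be modelled in a few lines of plain JavaScript. This is an in-memory sketch only, not Straw's implementation: the `makePipe` helper and the three functions are hypothetical stand-ins for Nodes, and arrays stand in for the queues between them.

```javascript
// In-memory model only. A real Topology runs each Node in its own process
// and queues messages between them; every name below is hypothetical.
function makePipe() {
  var queue = [];
  return {
    // Messages travel between Nodes as JSON.
    push: function (msg) { queue.push(JSON.stringify(msg)); },
    pull: function () { return queue.length ? JSON.parse(queue.shift()) : null; }
  };
}

var pingOut = makePipe();
var countOut = makePipe();
var printed = [];
var total = 0;

// ping: generates messages.
function ping(n) { pingOut.push({ ping: n }); }

// count: takes one message at a time and attaches a running total.
function count() {
  var msg = pingOut.pull();
  if (msg === null) { return false; }
  total += 1;
  msg.count = total;
  countOut.push(msg);
  return true;
}

// print: takes one message at a time and records it.
function print() {
  var msg = countOut.pull();
  if (msg === null) { return false; }
  printed.push(msg);
  return true;
}

ping(1); ping(2); ping(3);
while (count()) {}
while (print()) {}
console.log(printed);
```

Each stage only sees the pipe in front of it and the pipe behind it, which is what lets the stages be developed and restarted independently.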

Redis is used for message passing, but Nodes are shielded from the
implementation. All you need to do is write the processing code: extend
a handler for receiving messages and call a method to output a message.

There is nothing preventing a node from receiving or sending outside the
Topology, e.g. writing to a database, or fetching or listening for
network data.

A library method is provided to inject messages into, or receive
messages from, the Topology from outside so you can play nicely with
existing infrastructure, for example having data pipe into an Express
server for publishing out via socket.io.

Live reload

If you make any changes to a node file, its process will be
terminated, re-initialized and, if it was running, restarted. This is
really handy in development. Try running the ping-count-print example,
edit examples/nodes/print/index.js (just add a space somewhere), then
save it. You will see output in the log letting you know it's been
stopped and restarted.

Topology

A Topology is a collection of Nodes. In Straw, you create a
Topology and add nodes to it, naming the pipes that connect the
nodes together.

Once you've made a Topology you can start or stop it processing, get
runtime stats from it and destroy it when done.

var straw = require('straw');
var topo = straw.create();

You can pass options into your topology if you need to tell it where
Redis is, or to define a nodes_dir.

Logging can be silenced via the logging option. By default it's enabled.

redis.prefix will be prepended to all Redis keys used by that
Topology. This is useful for partitioning Topologies on the same
server. It is also passed in to the nodes so they can use it.

Topology Methods

#add(node) adds a node.

#start() will start your topology processing. Passive nodes (that
just receive inputs) will start checking their inbound pipes for
messages. The #start method on each node will be called to initiate
any active processing.

#stop() will call the #stop() method on all nodes, and stop
messages being consumed once the current message on each node is
finished.

#purge() clears all queued messages from the topology's pipes.

#stats(callback(err,data){}) will provide real-time stats on the
nodes and pipes in the topology. For nodes, the in and out message
counts are given. For pipes, the number of messages currently queued.
See examples/stats.

You can add multiple nodes by providing an array of objects instead of
a single object.

To specify the location of a node relative to your topology code, use
__dirname + '/where/is/my/node.js'.

Normally you will want to store your node files in a folder called
'nodes' in the same location as the code that is using them.

It's fine to make a folder called nodes and reference each node's
file directly. Be sure to use an absolute path, e.g.
__dirname + '/path/to/nodes/some-node.js', in your Topology definition,
as nodes are run in their own process and will have no concept of the
directory your topology exists in.

As a convenience, if you pass in options to your Topology containing
nodes_dir: __dirname + '/path/to/nodes' you can identify your Nodes
by their filename (without an extension) and Straw will take care of
finding the files for you. The demos in the examples folder do it this
way.

input and output can either be the key of a single pipe, or an
array of pipe keys. This lets you aggregate input and branch output.
If the output field is an array, the same message will be sent to each
of them.
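A sketch of what aggregating input and branching output could look like as node definitions. The `id` and `node` field names and all pipe keys are assumptions; `input` and `output` follow the description above. The `toList` and `fanOut` helpers are hypothetical and only model the behaviour in memory.

```javascript
// Hypothetical node definitions. `id` and `node` are assumed field names.
var nodes = [
  { id: 'merge', node: 'merge', input: ['feed-a', 'feed-b'], output: 'merged' },
  { id: 'split', node: 'split', input: 'merged', output: ['to-store', 'to-web'] }
];

// Treat a single pipe key and an array of pipe keys uniformly.
function toList(pipes) {
  if (pipes === undefined) { return []; }
  return Array.isArray(pipes) ? pipes : [pipes];
}

// Model of branching output: the same message goes to every listed pipe.
function fanOut(def, message, pipes) {
  toList(def.output).forEach(function (key) {
    (pipes[key] = pipes[key] || []).push(message);
  });
}

var pipes = {};
fanOut(nodes[1], { price: 42 }, pipes);
console.log(pipes);
```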

You can provide multiple named outputs from a node. This lets you call
#output(<name>, message, callback) to send a message to a specific
output. Use this when you need to do routing based on the message
content.

Named outputs are specified as key-value pairs. The key is the name of
the output. The value can be a string (single pipe) or array (multiple
destinations for the same output).
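Content-based routing with named outputs might look something like this. The `output` function below is a local stand-in with the same argument order as #output(<name>, message, callback); it is not Straw's own method, and the output names, pipe keys and message shapes are made up for the example.

```javascript
// Hypothetical outputs map: key = output name, value = pipe key or array.
var outputs = {
  trades: 'trades-pipe',
  quotes: ['quotes-store', 'quotes-web']
};

// Pipe key -> messages, standing in for the real pipes.
var sent = {};

// Local stand-in shaped like #output(<name>, message, callback).
function output(name, message, callback) {
  var target = outputs[name];
  if (target === undefined) {
    return callback(new Error('unknown output: ' + name));
  }
  [].concat(target).forEach(function (pipe) {
    (sent[pipe] = sent[pipe] || []).push(message);
  });
  callback(null);
}

// Choose the output from the message content.
function route(message, done) {
  output(message.type === 'trade' ? 'trades' : 'quotes', message, done);
}

route({ type: 'trade', qty: 5 }, function () {});
route({ type: 'quote', bid: 10 }, function () {});
console.log(sent);
```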

Pipes

Pipes are implemented using Redis lists - lpush and brpop.

When more than one Node is connected to a given output, only one will
receive each message. This lets you easily load-balance output from a
node by connecting more than one downstream node to its output.

When a node finishes processing a message it must call the done
callback. This signals it's ready for the next message.

If you want a message to go to several nodes, create multiple outputs
and connect one node to each.

examples/busy-worker.js and examples/busy-workers.js show this in
operation.
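The load-balancing behaviour follows from how a shared list works: each pop removes the message, so only one consumer can get it. Below is an in-memory sketch with an array standing in for the Redis list. The worker names are hypothetical, and the strict alternation between workers is a simplification; with real blocking pops, whichever worker is free takes the next message.

```javascript
// In-memory stand-in for a Redis list: unshift models LPUSH (add at head),
// pop models BRPOP without the blocking (remove from tail).
var list = [];
function lpush(msg) { list.unshift(JSON.stringify(msg)); }
function rpop() { return list.length ? JSON.parse(list.pop()) : null; }

var handled = { workerA: [], workerB: [] };

// Each hypothetical worker takes one message, processes it, then calls done.
function work(name, done) {
  var msg = rpop();
  if (msg === null) { return done(false); }
  handled[name].push(msg.n);
  done(true);
}

for (var i = 1; i <= 6; i++) { lpush({ n: i }); }

// Alternate between the two workers until the list is empty.
var names = ['workerA', 'workerB'];
var turn = 0;
var more = true;
while (more) {
  work(names[turn % 2], function (got) { more = got; });
  turn += 1;
}

console.log(handled);
```

Every message is handled exactly once, and pushing at the head while popping from the tail keeps them in first-in, first-out order.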

If the purge flag is unset or set to true, pipes are cleared when the
Topology is started so unprocessed messages from previous runs are
not consumed. To retain them across restarts, set purge to false.
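The default can be expressed as a one-line check. The `shouldPurge` helper and the shape of the options object are hypothetical; the sketch only restates the rule above and does not show where Straw reads the flag.

```javascript
// Models the documented default: purge unless the flag is explicitly false.
// The options object shape is an assumption for illustration.
function shouldPurge(opts) {
  return !(opts && opts.purge === false);
}

console.log(shouldPurge());                 // flag not set
console.log(shouldPurge({ purge: true }));  // flag set to true
console.log(shouldPurge({ purge: false })); // flag set to false
```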

Tap In/Out

You can connect to a Topology from existing code. These Tap methods
behave the same as those you would write inside your nodes.