Back pressure

No, not the medical term, and not a carambolage (a highway pile-up) either, though it’s related. In the case of highway pile-ups, the solution is to add road signalling, so drivers can be warned to slow down ahead of time – before that decision is taken from them…

What I’m talking about is the software kind: back pressure is what you need in some cases to avoid running into resource limits. There are many scenarios where this can be an issue:

Sending out more data than a receiver can handle – this is why traditional serial links had “handshake” mechanisms, either software (XON/XOFF) or hardware (CTS/RTS).

On the I2C bus, hardware-based clock stretching is often supported as a mechanism for a slave to slow down the master, i.e. an explicit form of back pressure.

Sending out more packets than (some node in) a network can handle – which is why backpressure routing was invented.

Writing out more data to the disk than the driver/controller can handle – in this case, the OS kernel will step in and suspend your process until things calm down again.

Bringing a web server to its knees when it gets more requests than it can handle – crummy sites often suffer from this, unless the front end is clever enough to reject requests at some point, instead of trying to queue more and more work which it can’t possibly ever deal with.

Filling up memory in some dynamic programming languages, where the garbage collector can’t keep up and fails to release unused memory fast enough (assuming there is any memory to release, that is).
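The web server case in the list above comes down to refusing work once a fixed limit is reached, rather than queueing without bound. Here is a minimal sketch in JavaScript, assuming a hypothetical BoundedQueue class (the name and its offer/take methods are made up for illustration, they are not part of Node.js or HouseMon):

```javascript
// Hypothetical bounded work queue: refuses new work instead of growing forever.
class BoundedQueue {
  constructor(limit) {
    this.limit = limit; // maximum number of pending items
    this.items = [];
  }

  // Returns true if the item was accepted, false if the caller should
  // back off (a web front end would reply "busy" at this point).
  offer(item) {
    if (this.items.length >= this.limit) return false;
    this.items.push(item);
    return true;
  }

  // The consumer takes the oldest pending item, or undefined when idle.
  take() {
    return this.items.shift();
  }
}

const queue = new BoundedQueue(2);
const results = ['a', 'b', 'c'].map((request) => queue.offer(request));
queue.take(); // the consumer finishes one item, freeing a slot
const afterTake = queue.offer('d');
```

The return value of offer() is the back pressure signal: the producer finds out right away that it has to slow down, instead of discovering it later through exhausted memory.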

That last scenario, memory filling up faster than the garbage collector can free it, is the one that bit me recently, as I was trying to reprocess my 5 years of data from JeeMon and HouseMon, to feed it into the new LevelDB storage system. The problem arises because so much in Node.js is asynchronous, i.e. you can send off a value to another part of the app, and the call will return immediately. In a heavy loop, it’s easy to send off so much data that the callee never gets a chance to process it all.
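One way to keep such a heavy loop in check is to look at the return value of write() on a Node.js writable stream, and wait for the 'drain' event whenever it returns false. The sketch below is only an illustration: makeSlowStore and feedAll are hypothetical helpers, and the slow store merely stands in for a real storage layer such as LevelDB.

```javascript
const { Writable } = require('stream');

// Hypothetical slow consumer: handles one reading per event-loop turn.
function makeSlowStore(processed) {
  return new Writable({
    objectMode: true,
    highWaterMark: 16, // write() starts returning false once 16 readings are buffered
    write(reading, _encoding, callback) {
      setImmediate(() => {
        processed.push(reading);
        callback();
      });
    },
  });
}

// Producer loop that honours back pressure: it stops as soon as write()
// returns false and resumes only when the store emits 'drain'.
function feedAll(readings, store, done) {
  let i = 0;
  function pump() {
    while (i < readings.length) {
      const ok = store.write(readings[i++]);
      if (!ok) {
        store.once('drain', pump);
        return;
      }
    }
    store.end(done); // all readings handed over; done runs on 'finish'
  }
  pump();
}

const processed = [];
const store = makeSlowStore(processed);
const readings = Array.from({ length: 1000 }, (_, n) => ({ n }));
feedAll(readings, store, () => {
  console.log('stored', processed.length, 'readings');
});
```

Without the check on write(), all 1000 readings would be buffered in memory at once. With it, the buffer stays around the highWaterMark no matter how long the input is.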

I knew that this sort of processing would be hard in HouseMon, even for a modern laptop with oodles of CPU power and gigabytes of RAM. And even though it should all run on a Raspberry Pi eventually, I didn’t mind if reprocessing one year of log files would take, say, an entire day. The idea being that you only need to do this once, and perhaps repeat it when there is a major change in the main code.

But it went much worse than I expected: after force-feeding about 4 months of logs (a few hundred thousand converted data readings), the Node.js process RAM consumption was about 1.5 GB, and Node.js was frantically running its garbage collector to try and deal with the situation. At that point, all processing stopped with a single CPU thread stuck at 100%, and things locked up so hard that Node.js didn’t even respond to a CTRL-C interrupt.

Now 1.5 GB is a known (default) heap limit in the V8 engine used in Node.js, and to be honest it really is more than enough for the purposes and contexts for which I’m using it in HouseMon. The problem is not a lack of memory, the problem is that it keeps filling up. I haven’t solved this problem yet, but it’s clear that some sort of back pressure mechanism is needed here – well… either that, or there’s some nasty memory leak in my code (not unlikely, actually).

Note that there are elegant solutions to this problem. One of them is to stop having a producer push data and calls down a processing pipeline, and switch to a design where the consumer pulls data when it is ready for it. This was in fact one of the recent big changes in Node.js 0.10, with its streams2 redesign.
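The pull idea can be sketched without streams at all, using a generator in current JavaScript (generators were not available in Node.js 0.10, so this illustrates the principle, not how streams2 works internally). The parseLog function and the name=value log format are assumptions made up for this example:

```javascript
let produced = 0; // counts how many readings the producer has actually parsed

// Producer as a generator: nothing is parsed until the consumer asks for it.
function* parseLog(lines) {
  for (const line of lines) {
    produced++;
    const [name, value] = line.split('=');
    yield { name, value: Number(value) };
  }
}

const source = parseLog(['temp=21.5', 'humi=48', 'temp=21.7']);

// The consumer pulls exactly one reading; the other two lines stay unparsed.
const first = source.next().value;
```

Because the consumer drives the loop, the producer can never get ahead of it. Back pressure is built into the design and does not have to be added afterwards.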

Even on an embedded system, a lack of back pressure may cause trouble in software. This is why there is an rf12_canSend() call in the RF12 driver: because of that, you cannot ever feed it more packets than the (relatively slow) wireless RFM12B module can handle.

Soooo… in theory, back pressure is always needed when you have some constraint further down the processing pipeline. In practice, this issue can be ignored most of the time due to the slack present in most systems: if we send out at most a few messages per minute, as is common with home monitoring and automation, then it is extremely unlikely that any part of the system will ever get into any sort of overload. Here, back pressure can be ignored.

Thanks – yes, I’ve used it a bit too. But (as so often), writing about this has also helped me better understand the core issues. The way this can be fully dealt with in Node is to use “streams”, in particular the streams2 improvements in v0.10 – it looks like streams will make a lot of the flows in HouseMon simpler and more robust, as opposed to always firing all events through EventEmitter. It’s all a learning process (“futures” are another technique I need to get much more comfortable with).

Thanks – a very nice read. In this case, I’m moving to streams, which can automatically handle back pressure: serial.pipe(parser).pipe(decoder).pipe(store) will get the work done with everything nicely separated, and with the pipes taking care of when to call the underlying transformation functions. The nice thing is that streams can be put in object mode, i.e. with arbitrary objects flowing through them, instead of text. Also great for packetising and de-packetising.
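To make the pipe chain above concrete, here is a minimal sketch using object-mode streams from the Node.js stream module. The line format, the field names, and the in-memory store are all assumptions for illustration (the real HouseMon parser and decoder will differ), and Readable.from() needs a more recent Node.js than v0.10:

```javascript
const { Readable, Transform, Writable } = require('stream');

// Stand-in for the serial port: emits raw text lines (hypothetical format).
const serial = Readable.from(['OK 3 21', 'OK 5 48']);

// Text line in, packet object out.
const parser = new Transform({
  objectMode: true,
  transform(line, _encoding, callback) {
    const [tag, id, value] = String(line).split(' ');
    callback(null, { tag, id: Number(id), raw: Number(value) });
  },
});

// Packet object in, decoded reading out.
const decoder = new Transform({
  objectMode: true,
  transform(packet, _encoding, callback) {
    callback(null, { node: packet.id, value: packet.raw });
  },
});

// Stand-in for the storage layer: collects readings in an array.
const stored = [];
const store = new Writable({
  objectMode: true,
  write(reading, _encoding, callback) {
    stored.push(reading);
    callback();
  },
});

// pipe() pauses an upstream stage whenever the stage below it is full.
serial.pipe(parser).pipe(decoder).pipe(store);
```

Each stage only knows about its own input and output, and the pipes decide when the transformation functions get called.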

It’s always the same scenario with this stuff: after the initial hype about how cool new technologies are, it “suddenly” turns out that async-paradigm products which promised higher performance and better resource utilization are memory/CPU hogs, that NoSQL databases are hard to maintain, have random idiosyncrasies and are hogs as well, etc. etc. That’s because all these “novelties” are old stuff originally developed for niche tasks, and they require a special, really new-paradigm approach. But people, and programmers in particular, are lazy, and of course don’t want to warp their minds to follow a strange paradigm, so they do it “as usual”, with the expected results.