The new old or The "Return" to Concurrency

In order to develop a fairly complex pipeline of operations for a content management system I am developing, I found myself resorting to the old unix way of doing things. I need to process a large set of data (emails), so I set up a pipeline of coprocesses, with each message passed between processes referring to some chunk of email on disk:

cp1 | cp2 | cp3 | cp4 | cp5 .. cp12

While this may seem trivial to most people here, I was struck by how profound this classic (20-30 yr old) approach is. Yes, I know that unix (shell) pipes are limited because they are only unidirectional, but if I followed the status quo these days the implementation would have been a monolithic OO app (with cp 1-12 being objects passing messages to each other) or perhaps something more FP (with cp 1-12 being a chain of pure function calls).

Instead, here we have a truly concurrent solution that takes advantage of multiple CPUs, uses message passing, and has strict encapsulation -- all in a language neutral architecture.
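As a rough illustration of the architecture described above, here is a minimal sketch that wires independent OS processes together with pipes from Python. The two stages (UPPERCASE and NUMBER) and the helper run_pipeline are made up for the example; the original pipeline was built from awk, Tcl and C programs joined by the shell.

```python
import subprocess
import sys

# Hypothetical stages: each is an independent OS process that reads stdin and
# writes stdout. Any language would do; Python is used here only so the
# sketch is self-contained.
UPPERCASE = [sys.executable, "-c",
             "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())"]
NUMBER = [sys.executable, "-c",
          "import sys\nfor n, line in enumerate(sys.stdin, 1): "
          "sys.stdout.write('%d %s' % (n, line))"]


def run_pipeline(stages, data):
    """Connect the stages with OS pipes (like `a | b | c`) and return the output."""
    procs = []
    for cmd in stages:
        stdin = procs[-1].stdout if procs else subprocess.PIPE
        proc = subprocess.Popen(cmd, stdin=stdin, stdout=subprocess.PIPE)
        if procs:
            # Drop our copy of the upstream read end so EOF propagates.
            procs[-1].stdout.close()
        procs.append(proc)
    # Writing everything up front is fine for small inputs; a large input
    # would need a feeder thread to avoid filling the pipe buffers.
    procs[0].stdin.write(data)
    procs[0].stdin.close()
    output = procs[-1].stdout.read()
    for proc in procs:
        proc.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline([UPPERCASE, NUMBER], b"a\nb\n"))
```

Because the stages only agree on a byte stream, any one of them could be swapped for a program in another language without touching the rest.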

This came about as an experiment relating to using a severely restricted language (in this case AWK) to implement a fairly complex application. Working under Unix with minimal tools is yielding ways of thinking I haven't considered since my hardcore Unix days in the 80s.

While this may sound like just a simple workflow problem, for my app there is some conditional variability in play where some processing may need to be excluded from the workflow, but that too can be handled by traditional unix piping: if a process has nothing to do to certain data (or is instructed by the previous process not to touch certain data) it is simply passed along (untouched) to the next process.
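The pass-through rule can be sketched in a few lines. This is only an illustration under assumed conventions: the record layout (a dict with a "body" and a "skip" set naming the stages that must not touch it) and the stage names are hypothetical, and generators stand in for the real coprocesses.

```python
def stage(name, transform, records):
    """Apply transform to each record unless the record says to skip this stage."""
    for record in records:
        if name in record["skip"]:
            yield record              # passed along untouched
        else:
            yield transform(record)


def strip_signature(record):
    # Keep only the text before the conventional "-- " signature separator.
    return dict(record, body=record["body"].split("\n-- \n")[0])


def lowercase(record):
    return dict(record, body=record["body"].lower())


def run(records):
    s1 = stage("strip_signature", strip_signature, records)
    s2 = stage("lowercase", lowercase, s1)
    return list(s2)


if __name__ == "__main__":
    mail = [
        {"body": "Hi\n-- \nBob", "skip": set()},
        {"body": "Keep ME\n-- \nsig", "skip": {"lowercase"}},
    ]
    for record in run(mail):
        print(repr(record["body"]))
```

The second record travels through the lowercase stage unchanged, which is all the conditional workflow needs: every stage sees every record, and the record itself carries the instruction to leave it alone.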

Nothing mind boggling here, but it did strike me as interesting from a monolithic super language vs small language in a unix environment perspective.

I've always felt that as programmers moved to Unix-like systems from more inflexible environments, they brought their monolithic programming habits with them, neglecting the native tradition of program construction out of small, sharp tools.

Can anything be done to revive this potent tradition before it exists only in the aging memories of old farts and in the out-of-print and obsolete books of fallen Bell Labs demigods?

BTW, in addition to Unix pipelines, there is also Paul Morrison's Flow-based Programming, which uses the same general architecture in the small. FBP is interesting because it was explicitly designed with the needs of business application programming in mind.

I second the recommendation for Flow-based Programming. It's a very interesting and practical technique, and Morrison's book gives a lot of real-world examples.

As regular readers of LtU know, Unix pipes are a special case of declarative concurrency. Chapter 4 of CTM explains this programming style and gives lots of examples. I sure hope this style will not be forgotten, because it has many advantages over the usual monolithic, shared-state style. The same goes for the message passing style, which has many of the advantages of declarative concurrency. For example, Erlang uses it to build highly fault-tolerant systems and E uses it to build secure systems. Fault tolerance and security are a lot simpler to achieve with dataflow and message passing than with the usual shared state style (i.e., monitors).

One of the nice things about Erlang is how it offers a rich environment for doing concurrency: it is, in essence, on par with Unix (with its rich support for process management). In Unix, coprocessing is natural and tied to processes (rather than "threads"). There is very rich (environmental) support for managing processes.
Concurrency (process level!) is built into the core of Unix and message passing is the default paradigm. Erlang appears to mirror this but (potentially) at a much finer granularity.

Often, even when other concurrency enabled languages (C++ with Posix Threads, Concurrent ML, Haskell, etc) are used for even coarse grain concurrency, the support environment is lacking (Can I independently manage the "threads"? Can I monitor them? Can I inspect their internal state? Remotely stop them? Signal them? Restart them? Remotely change their run priority?). This is what I am bemoaning. As we move to doing more message passing concurrency in our new monolithic languages, will the rich support be there (like processes under Unix)?

Peter, I am still slugging away at CTM (great book!), does Mozart offer rich "tool" support for its concurrency?

There is both low-level support and some tool support. The threads can be managed to some degree (changing priorities, suspending and resuming, injecting exceptions, etc.). With the distributed programming abilities of Mozart, this can be done remotely. It's also easy to write abstractions that control concurrent programs (like the remotely updatable server in chapter 11 of CTM). There are some tools that work well in a concurrent setting: the Browser (visualization of data structures), the Ozcar debugger, the Profiler, the Distribution Panel (visualization of distributed execution), the Explorer (interactive constraint solving), and a few others. But there could be more tools to help programmers write concurrent programs! We would love to see people contribute some of these tools. There are also some language design issues that need to be tackled. For example, the ability to 'freeze', 'clone', and 'unfreeze' thread state would allow implementing flexible component-based programming, with first-class components that can be stopped, modified, and started. It would also allow adding continuations to Oz.

Very nice. For my content management system (a hobby/diversion), I was only able to get my arms around the design and (evening time only) implementation by breaking the problem down into discrete components. I considered Concurrent ML (too hard to debug and too verbose) and Erlang (way more concurrency support than I needed) but ended up going with the "unix way".

It had an interesting impact on my thinking. For example, before I adopted the "unix way" I had previously implemented a single component/process that parsed XHTML into a specially tagged list and then operated upon that list to do content-insertion magic. This was implemented in a language that had strong/easy XML parsing support. After my experience doing the "unix way", I would have broken the XHTML component into 2 pipe connected processes: cp1 (parse XHTML into a stream) -> cp2 (content-insertion magic).

Now, I wonder how this approach would work using a concurrency-oriented approach in a rich concurrency environment like Erlang or Mozart. It's amazing how long it can take for well-studied ideas to finally hit you with the aha!

An Erlang-ish approach would have me doing something very similar to the "unix way": lots of little encapsulated processes (that do just one thing) feeding their results into other processes. I would be able to co-develop each component independently (swapping in better implementations, debugging components by instrumenting proxies for others).

If Mozart gives me enough environment richness to support debugging this style... :-)

My apologies to the collective lambda community if I am rambling the obvious. I am just fascinated how everything seems to be "connected". I would have never thought that the way I used to program 20 years ago (i.e. "the unix way") would come full circle for me.

I came across this paper (PDF file) during a long (and ongoing) quest for CMS information/enlightenment. Perhaps you'll find some of its ideas worthwhile, such as its proposal to use the make utility to track page dependencies. Good luck on your project.

I've been working carefully through this paper and plan to post some comments soon. My quick, high-level take is that, contrary to the authors' implications, comonads are not particularly well suited to capturing dataflow semantics.

True, it can be done, and theirs is an impressive accomplishment, but I think I've come up with an approach that is perhaps simpler or at least dual to theirs. My formulation of the semantics conforms to the intuition that dataflow languages are just functional languages with stream data types. (I gather that this has historically been a contentious assertion in the Lucid community, but it seems right to me so far.)

In particular, I build an interpreter for a generic functional language assuming that the ground types support the following operations:

and make scalar and stream instances of this class to get scalar (normal functional) and stream (dataflow) semantics.

Also, and here I'm not as sure, I think there are some problems with their "distributive" semantics for synchronous dataflow. In particular, I believe their "if" is not point-wise, and should be, and their arithmetic operations do not cause an error when applied to streams with different clocks. One could excuse this latter problem by saying that their interpreter assumes type-checking has already rejected any such program, but I think a more mainstream approach to interpreters is that they turn static errors into dynamic ones, rather than turning static errors into silent failures.

SuperCollider looks like a neat language, independent of its music and audio parts (which of course make it automatically neat). My interest in dataflow is vaguely related to DSP but not to music. In general my work at the moment relates to semantics of block diagram languages, which of course often have dataflow underpinnings.

I read an interesting draft paper by McBride and Paterson recently discussing a similar abstraction ("Applicative programming with effects").
It seems like 'Ground' (or 'Applicative Functor') is a very practical abstraction that just has been overlooked for so long because of its close relation to monads.

Indeed it is very relevant. I've updated my paper and code to use their terminology. The link to my paper in my original post above now points to that new file. It is good to find that my work is related to others.

I think one of the reasons why Erlang is such a success for concurrent/scalable applications is that it models itself on the Operating System paradigm: everything is a process, an IPC primitive is in the language, processes can't interfere with each other, failing processes can be insulated from the rest of the system, and communication with the outside world is via 'ports' managed by wrappers (drivers).

(Modern) Operating Systems are designed to be multitasking and scalable, why not use these ideas that have been refined over many years in modern languages/runtimes?

Indeed, in Making Reliable Distributed Systems in the Presence of Software Errors Joe Armstrong makes a comment along the lines of 'an operating system merely presents Erlang with a set of device drivers' (poorly paraphrased from a shocking memory).

Another huge benefit from Unix piping: I can "drop in" replace any component by restarting the pipeline. (However, Erlang and Mozart can apparently replace a component without restarting.) On the other hand, Unix can "drop in" a replacement coded in any programming language.

This serves me well for prototyping. My pipeline consists of awk, Tcl, C (and other stuff). I can prototype a component in awk and replace it with C (or for that matter just keep that component in awk). Ah, mini-languages with domain specialties. I've gotten too used to big languages -- why would I want to replace a 10 line awk component with C, Perl, Tcl or XYZ?

But there are some components in my pipeline that require more complex processing... Now, if only I could get the Erlang or Oz executables (engines) to run as lightweight as awk ;-)

why would I want to replace a 10 line awk component with C, Perl, Tcl or XYZ?

Because your C program can access custom hardware that awk cannot. Or because a 10 line awk script that (given your presumably large input) takes 10 hours to run might finish in a few seconds once rewritten in C.

Or maybe because your 10 line awk script isn't doing the job right, and the limitation turns out to be awk itself, and not a bug in your script.

Use the right tool for the job. If a 10 line awk script will do the job well enough, fine. If not, find a new tool that will do the job better.

Perl was created originally because the awk/sed combination was not able to do some job that Larry Wall needed to do. So he set out to write a version of awk/sed that was able to deal with his additional needs.

Yes, that is the whole point: Use the right tool for the job. Check out the context of my statement. I think we are in agreement :-)

As an interesting aside, I recently replaced a horribly slow parser written in a proprietary language (part of an audit reduction product) with a combo of bourne shell and awk. The original took 2hrs to process 1GB of data, the bash/awk combo took 30 seconds. Now, I may be able to drop that down to fewer seconds if I code in C, but a quick "grep" through the data for one of the parse targets took around 20 seconds. I don't know if I can best the grep result and 30 seconds is probably fast enough ;-)

You might want to take a look at Microsoft's upcoming Monad shell. It seems to be trying to generalize the Unix approach from streams of characters to streams of objects, albeit untyped. Ars Technica has a nice overview here.

A language that might be interesting in the Flow-Based Programming context is OmniMark. This page, somewhat out of date, explains the basic concepts of data-flow (or streaming, as it's called in OmniMark), and this one presents recent additions to the streaming capabilities of the language.

(Disclosure: I'm a designer and developer of the OmniMark language. I'm paid to do it. I'm not paid to evangelize it.)

Concerning Unix pipes, I have a question that has bothered me for a while. Why are named pipes so notorious? ESR's online book The Art of Unix Programming even calls them "a bit of a historical relic". Compared to the socket interface, they are much easier to use, they're open to shell programming, and also more scalable. Can anybody tell me what the problem is?

I agree. They do have their quirks, but generally named pipes are great. Maybe it's just the crufty syntax of the old mknod? ;)

Actually, I always felt like the shell should give me something like named pipes by itself, and that involving the filesystem was a little dirty. The generalized process redirection features in scsh are pretty nice.

In some sense, Unix separates the concept of "filesystem" from the concept of "directory"; the persistent store can contain any number of filesystems. There is no requirement that a directory element actually refer to a file, though; it could refer to any named resource provided the directory metadata is rich enough to describe the resource. Named pipes are one such resource.

A shell could, of course, bind a local name to a resource, and that is a useful feature. But it does not solve the same problem as persistent named resources. A named pipe is very much like a TCP socket (which also has a persistent name in the form of an IP/port); server and client processes can independently attach to it, without having to be otherwise related (except that in the case of named pipes, they must be running on the same OS instance, at least on common Unix implementations.)

Keeping the persistent name in the directory structure strikes me as clean, not dirty; it allows code- and concept- reuse. For example, named pipes are subject to the same (limited) security model as files.

Other OS's (Plan 9, for example) extend this concept in interesting and useful ways.

You're right, of course. I probably should have been more clear... I've used named pipes in two ways. One is the persistent use you describe, and I agree that the filesystem makes perfect sense for that. The other use is basically to have a more flexible way of building pipelines than "a | b | c", and using the filesystem for these transient pipes often leads to nasty temporary files like /tmp/pipe.34543. To me, a file like that would rather be a local variable in the script or shell.

Of course, I could imagine an extended filesystem that makes this cleaner (perhaps by adding a fs tree which is local to this process), but then from a language perspective that sounds an awful lot like local variables. And of course you can do exactly this in any language that gives you direct access to the system calls.

As I recall, the big problem with named pipes is that you don't have anything like accept(); if three processes open the pipe for writing simultaneously, the kernel doesn't do anything to disentangle the data they write to the pipe. You can manage a client/server mechanism, because the kernel guarantees that write()s of no more than PIPE_BUF bytes won't get intermingled. The client creates named pipes based on its process ID, opens the master named pipe, writes its process ID to it, and waits for the server to open the named pipes the client created.
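The handshake described above can be sketched as follows. This is a minimal, Unix-only illustration under assumptions of my own: the pipe names ("master", "reply.<id>"), the one-line message format and the helper names are hypothetical, and a thread stands in for the separate server process.

```python
import os
import select
import tempfile
import threading


def serve_once(master_path):
    """Server side: read one client ID from the master pipe, reply on its pipe."""
    with open(master_path, "rb") as master:       # blocks until a client connects
        client_id = master.readline().strip().decode()
    reply_path = os.path.join(os.path.dirname(master_path), "reply." + client_id)
    with open(reply_path, "wb") as reply:         # blocks until the client is reading
        reply.write(b"hello " + client_id.encode() + b"\n")


def request(master_path, client_id):
    """Client side: create a private reply pipe, announce ourselves, wait."""
    reply_path = os.path.join(os.path.dirname(master_path), "reply." + client_id)
    os.mkfifo(reply_path)
    try:
        message = client_id.encode() + b"\n"
        # Writes of at most PIPE_BUF bytes are not interleaved with other writers.
        assert len(message) <= select.PIPE_BUF
        with open(master_path, "wb", buffering=0) as master:
            master.write(message)                 # a single write() call
        with open(reply_path, "rb") as reply:
            return reply.readline().decode().strip()
    finally:
        os.unlink(reply_path)                     # the cleanup chore mentioned above


def demo():
    workdir = tempfile.mkdtemp()
    master_path = os.path.join(workdir, "master")
    os.mkfifo(master_path)
    server = threading.Thread(target=serve_once, args=(master_path,))
    server.start()
    try:
        return request(master_path, str(os.getpid()))
    finally:
        server.join()
        os.unlink(master_path)
        os.rmdir(workdir)


if __name__ == "__main__":
    print(demo())
```

Even in this toy form the bookkeeping is visible: every client needs its own pipe, and someone has to remove those pipes afterwards, including after a crash, which the sketch does not handle.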

But it's a pain—among other things, you have to put in machinery for deleting the pipes when you crash. Named pipes were probably a nice hack 20 years ago, because you could create a pipeline between unrelated processes. Today, you can do the same thing with nc, which lets you plumb a pipeline over TCP.

Python can't take advantage of multiple CPUs though. The components don't run concurrently; they are interleaved. When one component needs a value, the immediate upstream component runs for a moment until a value is produced.
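A small trace makes the interleaving visible. The generator names below are invented for the illustration; the point is only the order in which the steps run.

```python
def numbers(limit, trace):
    """Upstream component: produces 0 .. limit-1, logging each step."""
    for n in range(limit):
        trace.append("produce %d" % n)
        yield n


def doubled(upstream, trace):
    """Downstream component: pulls one value at a time from upstream."""
    for n in upstream:
        trace.append("double %d" % n)
        yield n * 2


def run():
    trace = []
    for value in doubled(numbers(2, trace), trace):
        trace.append("consume %d" % value)
    return trace


if __name__ == "__main__":
    for step in run():
        print(step)
```

Each value travels the whole chain before the next one is produced, all on one thread: the upstream generator only runs when something downstream asks it for a value.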

Python can't take advantage of multiple CPUs though. The components don't run concurrently; they are interleaved. When one component needs a value, the immediate upstream component runs for a moment until a value is produced.

That's only a part of the problem with using generators as pipes. More serious is the fact that generators are not true coroutines: the consumer cannot resume a generator at an arbitrary point. The producer always drives. To see the full impact of the problem, try sketching an implementation of the diff or tee Unix utilities in Python. I don't know if Stackless Python has fixed this issue. Ruby is no better in this regard. Lua, OmniMark, and Modula-2 are the only languages with proper coroutines I know of. Continuations (delimited?) have been shown to be strong enough to implement them, so I suppose you can add Scheme to the list. Any language with support for threads can emulate coroutines as blocking threads, but this solution is prohibitively expensive.

The sad thing is that coroutines are almost a trivial thing to implement at the VM level: what they boil down to is the ability to have multiple stacks. I don't know why so few modern virtual machines bother to support them.

The first thing I look for when a new general-purpose language is released these days: support for lightweight concurrency. At least that gets us back to what we used to do all of the time under unix (via lightweight co-processes tied together with pipes, named pipes, shared memory and unix sockets).

My motivation for co-processing (with awk+unix) is that it simplified the partitioning of my application. After reading many of the responses in this forum topic, if I had the time I would have considered Oz, Erlang or Scheme48 (now with lightweight threads!). Of what I've seen so far, only Oz and Erlang support management of their co-processes at a unix equivalent level. They come with tools. My experience with Python generators (limited), Posix threading (extensive), Concurrent ML (extensive) and Lua co-routines (limited) is that you have little introspective help when things go wrong.

More serious is the fact that generators are not true coroutines: the consumer cannot resume a generator at an arbitrary point. The producer always drives.

I think you might be misunderstanding something about Python generators; or I might be misunderstanding your post.

When you say "the producer always drives", it sounds like you're talking about Ruby blocks; or worse, the "visitor pattern" often used in Java (ack). Python generators are different. A generator is more coroutine-like than you seem to think.

For example, zip(gen1(), gen2()) works in Python (but not Ruby).
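To make the point concrete, here is a small sketch (in current Python 3 syntax, not the 2.4 of this discussion) in which the consumer decides, value by value, which of two generators to advance. The function names are made up for the example, and merging two sorted streams stands in for the diff-like case of reading two inputs at different rates.

```python
def count_up(limit):
    for n in range(limit):
        yield n


def squares(limit):
    for n in range(limit):
        yield n * n


def merge_sorted(left, right):
    """Merge two sorted iterators; the consumer picks which one to advance."""
    missing = object()                 # marks an exhausted input
    a = next(left, missing)
    b = next(right, missing)
    while a is not missing and b is not missing:
        if a <= b:
            yield a
            a = next(left, missing)    # only the left producer is resumed
        else:
            yield b
            b = next(right, missing)   # only the right producer is resumed
    while a is not missing:
        yield a
        a = next(left, missing)
    while b is not missing:
        yield b
        b = next(right, missing)


if __name__ == "__main__":
    print(list(zip(count_up(3), squares(3))))
    print(list(merge_sorted(iter([1, 4, 6]), iter([2, 3, 7]))))
```

In both cases each producer runs only when the consumer asks for its next value, so the consumer is the one driving.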

To see the full impact of the problem, try sketching an implementation of the diff or tee Unix utilities in Python.

The main limitation of Python generators (at least in Python 2.4) is that they are flat: a Python generator cannot invoke a subroutine which does the yielding for it; all yielding must be done directly in the generator. The unavailability of general resumption is not a theoretical barrier (as the Lua work shows), though it may be a practical one (it's hard to mix two separate Lua controller coroutines).
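The "flat" limitation shows up as soon as a generator wants a helper to do some of the yielding. A minimal sketch with a made-up tree walker: in the first version every value produced by the recursive call has to be re-yielded by hand at each level. The second version uses `yield from`, which was added to Python much later (3.3) and shortens the delegation without changing what happens.

```python
def walk(tree):
    """Yield the leaves of a nested list, left to right."""
    for item in tree:
        if isinstance(item, list):
            # The recursive call cannot yield on our behalf: each value it
            # produces must be passed up explicitly, level by level.
            for leaf in walk(item):
                yield leaf
        else:
            yield item


def walk_delegating(tree):
    """Same traversal, written with the later `yield from` syntax."""
    for item in tree:
        if isinstance(item, list):
            yield from walk_delegating(item)
        else:
            yield item


if __name__ == "__main__":
    print(list(walk([1, [2, [3, 4]], 5])))
```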

I've written (in collaboration with a couple of others) a component system entirely based on communicating python generators. Someone else was kind enough to post a link regarding this below (Kamaelia), but there is a key point: in order to get reuse from your coroutines (which is really where you boost the concurrency and dataflow), each of the items in the pipeline (or in our case graphlines too) needs to be simple.

In practice single level generators (which you describe as flat) encourage simpler generators, and decomposition of a system into more generators. This naturally leads to a higher level of dataflow, and if you provide a mechanism for distributing generators over processes (we don't at present BTW), then you have a scalable method for using multiple CPUs.

We decouple generators by embedding the generator in a class, which provides all the generators with a default mutable object. That mutable object has inboxes and outboxes associated with it. If you take a piece of data from an inbox, you own it and can do what you like with it. If you place a piece of data in an outbox you no longer own it. (We're thinking of an analogy based on paper passing from physical in/out-boxes.)
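The following is a rough sketch of that arrangement in plain Python. It is not Kamaelia's actual API: the class names (Component, Source, Upper) and the tiny round-robin scheduler are invented for illustration, and each component gets just one inbox and one outbox.

```python
from collections import deque


class Component:
    """Hypothetical base class: a generator plus an inbox and an outbox."""

    def __init__(self):
        self.inbox = deque()
        self.outbox = deque()
        self.steps = self.main()       # the embedded generator

    def main(self):
        raise NotImplementedError


class Source(Component):
    def __init__(self, items):
        self.items = list(items)
        super().__init__()

    def main(self):
        for item in self.items:
            self.outbox.append(item)   # once placed here, no longer ours
            yield


class Upper(Component):
    def main(self):
        while True:
            while self.inbox:
                item = self.inbox.popleft()      # taken from the inbox: ours now
                self.outbox.append(item.upper())
            yield


def run_pipeline(components, rounds):
    """Step each component in turn, then carry its output to the next inbox."""
    results = []
    for _ in range(rounds):
        for i, comp in enumerate(components):
            next(comp.steps, None)     # run until the next yield (or do nothing)
            is_last = i + 1 == len(components)
            target = results if is_last else components[i + 1].inbox
            while comp.outbox:
                target.append(comp.outbox.popleft())
    return results


if __name__ == "__main__":
    print(run_pipeline([Source(["a", "b"]), Upper()], rounds=3))
```

Neither component refers to the other; only the scheduler knows how the outboxes and inboxes are joined up.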

Putting these things together seems to result in a very high level of reuse. It also means that creating new components is quite simple - you write a small stub program that does what you want - and communicate with stdin and stdout. When you're happy with the results you replace this with sending data to outboxes and receiving data from inboxes.

A stepped walkthrough making new components can be found here. A graphical runtime introspection tool is described here. A nascent command line shell is described here.

(1) Handles 2 sets of slides, and a set of 'graph' slides all of which can overlap with transparency. (allows "sub-presentations" for example, which can be skipped if desired etc).

I'm not really aiming to plug Kamaelia here, but more to draw attention to the idea that just because python's generators are single level, this doesn't mean you can't do interesting things, in a very different way from how you usually do. (And in a way that we've found novices to programming find fairly natural.)

For those who prefer C++, we've also got a nascent (utterly naively coded) C++ version in CVS here. (It wraps up some primitives based on Duff's device.) Again the aim is to point out that these techniques are more general than people think and single level yielding doesn't have to be the problem it might first appear to be - and that in fact it turns out to strengthen the system by the looks of things.

I have an implementation of Flow-Based Programming in C which does manage multiple stacks, but the problem is that I/O hangs the whole app, unless you use asynchronous I/O (e.g. POSIX.1b standard?). On the other hand, I also have a Java implementation which implements the connections between threads using the 5.0 ArrayBlockingQueue class - only the connections are blocking, and it can take advantage of multiple processors, so I feel it should perform pretty well on a multiprocessor machine. See http://www.jpaulmorrison.com/cgi-bin/wiki.pl?ExecutionEnvironments.
Feedback would be appreciated!
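For comparison, the same shape (threads joined by bounded, blocking connections) can be sketched in Python with queue.Queue standing in for Java's ArrayBlockingQueue. This is an analogue written for illustration, not the implementation mentioned above; the function names and the DONE sentinel are assumptions. Note also that CPython threads do not run Python bytecode on several CPUs at once, so this shows the structure rather than the multiprocessor speedup.

```python
import queue
import threading

DONE = object()   # sentinel marking the end of the stream


def producer(out_q, items):
    for item in items:
        out_q.put(item)        # blocks while the connection is full
    out_q.put(DONE)


def transformer(in_q, out_q, func):
    while True:
        item = in_q.get()      # blocks while the connection is empty
        if item is DONE:
            out_q.put(DONE)
            return
        out_q.put(func(item))


def run(items, func, capacity=2):
    """Run producer -> transformer -> collector over bounded connections."""
    first = queue.Queue(maxsize=capacity)
    second = queue.Queue(maxsize=capacity)
    threads = [
        threading.Thread(target=producer, args=(first, items)),
        threading.Thread(target=transformer, args=(first, second, func)),
    ]
    for thread in threads:
        thread.start()
    results = []
    while True:
        item = second.get()
        if item is DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run(range(10), lambda x: x * x))
```

Only the connections block: a fast producer is held back once its queue holds `capacity` items, which gives back-pressure without any explicit coordination between the components.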

A key aim of Kamaelia is to enable even novice programmers to create scalable and safe concurrent systems, quickly and easily.
Lego/K'nex for programmers. For people. For building things. It's about making concurrency on systems easier to use, so easy you forget that you're using it. It's been done once before, spectacularly well, so well many people forget it's there, a key example - unix pipelines. However it's been done in hardware since day 1, since that's how hardware works.

One day, I sat back and realised that network systems looked almost identical in nature to the asynchronous hardware systems, conceptually, with one major exception. In hardware, you don't know who your buffers are connected to via wires. You have a protocol for getting that information over (be it a clock, or handshake circuits) but no other knowledge.

Kamaelia was born, technology wise, from the idea "what if we developed software like hardware" - each component with no direct knowledge of any other. Similar to programs in a unix pipeline. This is proving to be a very useful approach.

Kamaelia is the result. Kamaelia is divided into 2 sections:

* Axon - this is a framework, developed by application spikes, for wrapping active objects. Specifically these are generators (mainly) and threads. The resulting library at its core is pretty simple - a novice programmer can learn python in a week and then implement their own version in about a week.
* Kamaelia - this is the toy box - a library of components you can take and bolt together, and customise. This includes components for TCP/multicast clients and servers, backplanes, chassis, Dirac video encoding & decoding, Vorbis decoding, pygame & Tk based user interfaces, visualisation tools, presentation tools, games tools...

The reason for concurrency here isn't that we're after performance, but that the problems we're facing are naturally concurrent - millions of people watching content. Therefore, the aim is to make dealing with this concurrency simple/easy, or natural/fun. Hence the lego/K'nex analogy.

What's the underlying metaphor we use?

In hardware you have pins which the hardware "talks" to. In unix shells, you have stdin and stdout. For Kamaelia we decided to use something a little more concrete.

Take a person sitting at a desk in a world pre-desktop-computing. She could have a bunch of inboxes & outboxes on her desk. Suppose that the inboxes are labelled "timesheets", "newhires", "fires", and that the outboxes are "accounts", "security", "HR".

She can work on messages she gets on inboxes, and generate messages on outboxes. A postman then performs deliveries between the people - the active objects. The postman knows where things are going, and therefore if you need to add (say) auditing you can do that without modifying the way the person/active object works.

This is precisely how Kamaelia works. It models itself on a real world system to encourage behaviours that simplify concurrency.
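The postman idea can be sketched in plain Python as below. Nothing here is Kamaelia's real API: Desk, deliver, and the link table are hypothetical names chosen to mirror the metaphor, with the box names taken from the example above.

```python
from collections import deque


class Desk:
    """Hypothetical active object with named inboxes and outboxes."""

    def __init__(self, inboxes, outboxes):
        self.inboxes = {name: deque() for name in inboxes}
        self.outboxes = {name: deque() for name in outboxes}


def clerk_work(desk):
    """The clerk only knows her own trays, never who receives the output."""
    while desk.inboxes["timesheets"]:
        sheet = desk.inboxes["timesheets"].popleft()
        desk.outboxes["accounts"].append("pay " + sheet)


def deliver(links):
    """The postman: move every waiting message along its configured links."""
    for (sender, outbox), destinations in links.items():
        tray = sender.outboxes[outbox]
        while tray:
            message = tray.popleft()
            for receiver, inbox in destinations:
                receiver.inboxes[inbox].append(message)


def demo():
    clerk = Desk(["timesheets"], ["accounts"])
    accounts = Desk(["in"], [])
    auditor = Desk(["in"], [])
    links = {(clerk, "accounts"): [(accounts, "in")]}
    # Adding auditing changes only the postman's routing table; the clerk's
    # code is untouched.
    links[(clerk, "accounts")].append((auditor, "in"))
    clerk.inboxes["timesheets"].append("alice")
    clerk_work(clerk)
    deliver(links)
    return list(accounts.inboxes["in"]), list(auditor.inboxes["in"])


if __name__ == "__main__":
    print(demo())
```

The messages here are immutable strings, so handing the same one to two receivers is harmless; with mutable data the postman would need to copy it to keep the ownership rule intact.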

Example

Suppose I want to create a simple presentation tool - where I type some text, it goes to a server. People connect to that server and can "listen" to what I'm typing in a nice display. The three main sections of that system could look like this:

You write new components in the same way as writing a small script. Start off reading/writing from stdin/stdout, until you're happy with it. You then replace inputs/outputs with inboxes/outboxes. That component can then be used with any other as long as they accept that form of python object. For example, consider exchanging ConsoleReader() & Ticker() with AlsaReader() and AlsaPlayer() to create a simple radio style system.

That's the idea in a nutshell. (did I mention all the components above run in parallel?)

Things people have done with it

At R&D we've used it for sending subtitles to mobiles, building a networked audio mixer matrix, previewing PVR content on mobiles, joining multicast islands together using application layer tunneling and also a game for small children :-)

I also use Kamaelia for all my presentations these days.

Kamaelia has been used by BBC Radio & Music to produce a record of transmission (for 8 BBC channels 24x7). This is a development box for internally monitoring what is actually broadcast vs what the EPG data says. This enables prototyping of new services (subject to all sorts of restrictions). Examples include podcasts of all of BBC radio, particular tastes or genres. That then allows people to decide if they want these things and decide how to move forward with the industry.

Kamaelia's role was to be used to build a proof of concept prototype. It did prove the concept, so they worked on a traditional style, production quality replacement. We're now working with them to work towards a second generation architecture.

Summary

And that is Kamaelia. A framework for components, a library of components, and a way of making systems quickly and easily.

So, the reason I'm talking about it at Euro OSCON is that we're finding Kamaelia itself to be useful & fun, and it also seems to make concurrency easy to work with. Hopefully this should come over in my talk!

Forking processes under modern implementations of Unix is actually a fairly lightweight thing to do, though not as lightweight as userland thread libraries, of course. Still, the days of systems when it was expensive to fork processes are pretty well done, at least for applications that don't need to handle gazillions of transactions per second.

Pipes and unix domain sockets have been optimized over the years too. The cost of sending data between processes is quite low unless you have complex (and large) data structures that need to be serialized and reconstituted.

Lightweight processes (LWP or kernel threads that share address spaces) were force-fitted into Unix. It's a foreign concept that was never truly made first class.

...but if I followed status quo these days the implementation would have been a monolithic OO app (with cp 1-12 being objects passing messages to each other)...

What status quo is this? See the Command pattern from GoF. Handlers are fully componentised, nothing monolithic about it. Also, spawning processes might be too heavyweight—particularly for long chains. As ever, use the right paradigm for the job. If scalability is an issue, factor it into the design.

However you factor the app, if you run under unix and it consists of one big executable (even if you do threading), you are a monolith (by my homegrown, non-authoritative definition! ;-)

A lot of programs (running under Unix) tend to forget (or ignore) the rich patterns within the unix environment itself. Fork is not that expensive (anymore); socket/pipes are not inherently slow IPC mechanisms; init will spawn and manage long running processes; inetd will do TCP/IP connection management for simple one->one server apps, syslog is the standard log facility; etc.

The "Unix Way" is certainly not the end-all. It has its many defects and limitations. But I've seen a lot of C++ code that goes out of its way to be a "world unto itself" (even when it was intended to be deployed under Unix).

You can reword Greenspun's tenth law of programming onto unix (with apologies): Any sufficiently complicated monolithic program (under Unix) contains an ad-hoc, informally-specified bug-ridden slow implementation of half of the Unix Environment.