There are lots of ways that different languages do concurrency, and I want to
talk about the general ways they do it, without getting bogged down in
language details.

So what is concurrency? It’s not parallelism, that’s for sure. At its
simplest, it’s the ability to do work in the background without pausing work
in the foreground. Some forms of concurrency can use parallel hardware
resources (CPU cores, etc.), but not all.

I’m going to classify low-level concurrency features (as opposed to high-level
patterns that can use multiple features at once) along the following axes:

Shared/Separate memory

If two concurrent tasks share memory, then sending data between them is
trivial, but it’s possible to corrupt data that isn’t protected somehow. The
protection can be locking of some sort, transactions, or just explicit
switching between tasks.

Allows parallelism / no parallelism

Concurrency is not parallelism, but if you have parallel hardware (multiple
cores, etc.) it can often make sense to do parallel computation with the same
abstraction you use for concurrency. However, the downside is the need for
additional synchronization, which can wash out any advantage you get from
parallelism.

Implicit / explicit task switching

If your tasks switch implicitly, you have to protect any data that can be
shared between different tasks. Explicit task switching removes that need, but
adds boilerplate and can cause global slowdowns if a single task does not yield.
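Explicit switching can be sketched with plain Python generators (a toy
illustration, not any particular library): each yield is an explicit switch
point, and a task that never yields would starve everyone else.

```python
from collections import deque

def task(name, steps):
    """A cooperative task: it runs until it explicitly yields."""
    for i in range(steps):
        yield f"{name} step {i}"  # explicit switch point

def run(tasks):
    """Round-robin scheduler: a task that never yields blocks all others."""
    queue = deque(tasks)
    log = []
    while queue:
        t = queue.popleft()
        try:
            log.append(next(t))  # run t up to its next yield
            queue.append(t)      # re-queue it for another turn
        except StopIteration:
            pass                 # task finished; drop it
    return log

print(run([task("a", 2), task("b", 2)]))
# -> ['a step 0', 'b step 0', 'a step 1', 'b step 1']
```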

Forms of concurrency

Processes

Separate memory

Allows parallelism

Implicit task switching

Processes are extremely safe to use. You can’t share data, and you can’t
freeze the system through negligence (though deadlock is always an option).
However, processes can be quite heavyweight in imperative programming (they
can be lighter weight in a functional system, where immutable data means no
copying is necessary to send messages between processes).
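As a rough sketch of the separate-memory model, here’s Python’s
multiprocessing module (the function and queue names are just illustrative):
data moves between processes only as copied, serialized messages, so neither
side can corrupt the other’s memory.

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # No shared memory: the value arrives as a pickled, copied message.
    n = inbox.get()
    outbox.put(n * n)

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    inbox.put(7)         # serialized and copied into the child process
    print(outbox.get())  # -> 49
    p.join()
```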

Examples:

OS processes (heavyweight but general; literally any language can use separate OS processes)

Erlang processes (lightweight, but tied tightly to a particular system and language)

Threads

Shared memory

Can allow parallelism (depends on language/implementation)

Implicit task switching

Moving from the safest interface to the least safe: with threads it’s
extremely easy to corrupt your memory. For this reason some languages reduce
the risk with a global lock (Python’s GIL or Ruby’s GVL). I think threads work
very badly with dynamically typed languages, because all writes are
read/writes, which makes correct locking extremely difficult. You still need
to lock any shared data.

However, threads are extremely flexible. They’re what most other types of
concurrency (including processes, inside the OS) are implemented with.
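A minimal Python sketch of why shared data needs locking: several threads
bump one shared counter, and only the lock keeps their read-modify-write
cycles from interleaving and losing updates.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # Without the lock, `counter += 1` is a read, an add, and a write;
        # two threads can interleave those steps and lose increments.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 -- reliably, only because of the lock
```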

Examples:

OS threads (supported by most languages)

Goroutines

Async functions

Shared memory

No parallelism

Explicit task switching

This is what JavaScript uses. You schedule some task (usually some form of IO)
and wait for it to complete or fail. No async tasks are completed until you
either ask for them (in lower-level languages) or all of your code has
returned (in higher-level languages, especially JavaScript).
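The poll/select style can be sketched in Python with the selectors module:
nothing completes in the background; you register interest and then explicitly
ask the OS which sources are ready.

```python
import selectors
import socket

# Register interest in a socket, then explicitly poll for readiness --
# the select/epoll/kqueue style in miniature.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
sel.register(b, selectors.EVENT_READ, data="reader")

a.sendall(b"ping")                # makes b readable
events = sel.select(timeout=1.0)  # nothing "happens" until we ask
for key, _mask in events:
    print(key.data, key.fileobj.recv(4))  # -> reader b'ping'

sel.unregister(b)
a.close()
b.close()
```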

Examples:

poll/select/epoll/kqueue

JavaScript

EventMachine/Twisted/Tornado/etc.

Why do most forms of concurrency fit one of these groupings? Let’s look at the others:

Separate memory

No parallelism

Explicit task switching

This just seems to have no benefits: you can’t share data, you can’t do
anything in parallel, and you have to explicitly switch tasks all the time. If
you’ve got separate memory there’s no reason not to allow implicit task
switching and parallelism.

Separate memory

No parallelism

Implicit task switching

This is a bit better. Erlang used to be like this (only one thread was
multiplexed between processes), but it’s really just a matter of technology to
allow parallelism. Again, if you have separate memory you might as well allow
parallelism. That said, this is a perfectly reasonable initial implementation.

Shared memory

No parallelism

Implicit task switching

Running Go with GOMAXPROCS=1 is basically this. Same with greenlets. You still
need to protect your data from access by multiple threads, but in practice
less is required; you can get away with being sloppy. It’s kind of like the
old Erlang processes: you don’t lose anything by allowing parallelism, so you
might as well do it down the line, though it’s more of a tradeoff here than a
pure win.

Variants

These general categories of concurrency features have different tradeoffs, but
those can be changed somewhat by implementation choices. The fundamentals
don’t really change, but what’s cheap or expensive can change:

Processes

Lightweight processes

If you multiplex many processes onto a small, fixed number of OS
threads/processes, you can make processes more lightweight. The tradeoff with
lightweight versus full processes is that lightweight processes generally
cannot call C code easily and directly, but they use less memory.

Threads

Lightweight threads

Lightweight threads are multiplexed onto a small number (usually equal to the
number of CPUs) of hardware threads. They have similar tradeoffs to
lightweight processes – they make interaction with the OS and hardware more
difficult, but use less memory, so more can be started.

Static verification

This is Rust’s big trick. Rust’s rules of ownership disallow data races at
compile time. In order to share data between threads you need a mutex or other
protection, and this is impossible to mess up in safe Rust. This makes more
ambitious use of threads feasible. However, it increases the complexity of the
language and can only catch a subset of concurrency problems (in Rust’s case,
only data races).

Async

Promises/Futures

Promises (or Futures) are the representation of some value that will be
available eventually. They provide a good abstraction for building async
combinators on top of, which raw callbacks do not. Callbacks are more general,
but promises are a good basis for dealing with common concurrent patterns.
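In Python terms (a sketch using the standard concurrent.futures module, not
any particular promise library), a future is a handle for a value that will
exist eventually, and you can hang combinators like callbacks off it:

```python
from concurrent.futures import ThreadPoolExecutor

def slow_square(n):
    # Stands in for some expensive or IO-bound computation.
    return n * n

with ThreadPoolExecutor(max_workers=2) as pool:
    fut = pool.submit(slow_square, 6)  # a Future: the value isn't here yet
    # A combinator attached to the future, rather than a raw callback
    # threaded through the computation itself.
    fut.add_done_callback(lambda f: print("done:", f.result()))
    print(fut.result())  # blocks until the value is available -> 36
```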

Async/await

First appearing in C#, but now spreading to many languages, async/await makes
async programming look serial while keeping all task switching explicit. It
can also be faked if you have a coroutine abstraction. The tradeoff here is
language complexity vs. development efficiency.
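A small Python asyncio sketch of the idea: the code reads top to bottom like
serial code, but every await is an explicit, visible switch point where the
event loop may run other tasks.

```python
import asyncio

async def fetch(name, delay):
    # `await` is the explicit switch point: this task is suspended here
    # and the event loop is free to run other tasks meanwhile.
    await asyncio.sleep(delay)
    return name

async def main():
    # Reads like serial code, but both fetches run concurrently;
    # gather preserves argument order in its results.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

print(asyncio.run(main()))  # -> ['a', 'b']
```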

In-depth examples

Erlang

Erlang is intended to be used in highly reliable systems. It does this by
having many processes that are isolated from each other, and a tree of
processes monitoring each other, so that lower-level processes are restarted
by higher-level processes. This leads to a lightweight process model: you
don’t want processes to have hidden dependencies on each other, because then
you can’t kill and restart them if something goes wrong, and you want to be
able to start a truly huge number of processes. Erlang is deeply affected by
this concurrency model – every type in Erlang can be efficiently serialized
and sent between processes, possibly on different machines. This makes Erlang
extremely well suited for what it was designed for – highly reliable
networking infrastructure – but less well suited for many other types of
programming.

Go

Go was designed as a reaction to C++, and draws some inspiration from Erlang;
specifically, it has goroutines, which are lightweight threads. Unlike Erlang,
however, goroutines are not prohibited from sharing memory (socially it’s
recommended to communicate by message passing, but sharing memory is allowed,
and easy to do by mistake). This takes away many of both the benefits and
drawbacks of Erlang’s model. It has the side effect of making Go more of a
Java competitor than a C++ competitor: interacting with the system (as in,
calling C) has lots of overhead and complexity. That said, having threads be
cheap makes many nice patterns feasible that would be prohibitively slow in
other languages. Go also provides good tools for communicating using message
passing, and strongly recommends its use. This has the effect of making
concurrency much like the rest of the language: simple, pragmatic, but full of
boilerplate and pitfalls.

Rust

Rust is also a reaction to C++, but has much stronger compile-time
abstractions (as opposed to Go, which has almost all run-time abstractions).
For concurrency, Rust experimented with many different forms: for a long time
it supported Go-style lightweight threads, but now it only has native threads
built in (though like all languages you can spawn additional OS processes, or
use async functions). The advantage of Rust over C++ in concurrency is that
Rust enforces proper memory access at compile time. This adds some complexity
to the language (though Rust gets great bang for the buck: the same
compile-time check that ensures proper memory use with threads also ensures
proper memory use within a thread), and can be hard to learn, but matches the
way that systems programmers generally already write code. This makes Rust a
true systems language: low runtime overhead, interacting with the system is
basically free, but more difficult to program in than higher-level languages.

Node

Node’s answer to concurrency issues is to just always be single-threaded, and
to use async functions for all concurrency. In fact, it doesn’t have blocking
functions for many IO operations (and even the ones it does have are rarely
used). This infamously leads to giant chains of callbacks, though these days
promises and async/await can help with this dramatically. It does split all
JavaScript functions into sync and async functions, something that has to be
kept in mind at all times while writing Node code. The plus side is that it
doesn’t make any promises it can’t fulfill, unlike other dynamic languages
(like Python and Ruby, which offer threads but have locks on running all
Python/Ruby code). Since there’s almost no blocking IO, it also means that
each Node process can handle quite a bit of IO, making it great for networking
applications and web servers. However, Node doesn’t have a great story for
handling computation-heavy code yet. You can spawn a different OS process, but
it’s still not an easy operation. At some point Node may introduce lightweight
processes, but it’s probably never going to offer shared-memory concurrency.

nginx

nginx is a great example of how to combine different concurrency models. It
spawns a thread for each CPU, and then within each thread uses async functions
to do any actual IO. This makes for a highly efficient system: it can handle
lots of connections, but unlike something like Node, if there’s some heavy
computation that needs to happen at some point, other threads will pick up the
slack while one thread is blocked. Node can sometimes work around the issue
with multiple processes, but multiple threads make this far more natural.

Conclusions

This is more of an overview than anything, but I hope that it helped you
understand what different types of concurrency are available, and what the
different tradeoffs are. You could write a whole book about this topic.

My own opinion has shifted over time to think that lightweight threads and
processes are over-hyped. They aren’t bad ideas, but they’re not the pure win
that so many portray them as.

I was just at Fluent this week, and I had an interesting thought, spurred by
several things, but it really crystallized when I saw this talk by Eric Meyer.

So, the (perhaps badly named) concept of Isomorphic JavaScript is usually sold
as a performance optimization for loading time in single-page applications,
which is one benefit it provides. However, it actually fixes the real problem
with single-page apps – they break the web. A single-page app that does not
render on the server (isn’t isomorphic) doesn’t just degrade when JavaScript
doesn’t work; it’s totally broken. Like, blank page. This is a problem on any
page, but practically it’s biggest on the open web (not behind a login).
Closed sites (and especially enterprise sites/apps) can usually get away with
doing various odd things, even though they probably shouldn’t. Things like web
spiders, users on crappy mobile connections, and users behind odd firewalls
usually matter more on the open web. (Accessibility is also easier when
rendering on the server, but it can be made to work with JavaScript-only
sites.)

The thing is, older jQuery-based progressively enhanced sites had a number of problems:

You either had to have double the rendering code, or have your page look and work dramatically differently without JavaScript.

You might have a page that was technically usable but in practice terrible without JS – datepickers are the most common thing I can think of. In a typical jQuery-style datepicker progressive-enhancement situation, there’s a text input with a particular format you need to use, which is much more painful than a datepicker.

As you move more logic into the client, maintenance and code organization become problems that traditional tools like jQuery plugins just can’t solve.

The first attempt at solving these issues was the various first-gen JavaScript app frameworks: Angular 1, Backbone, Ember 1, etc. These frameworks were developed with the closed web in mind – enterprise apps, or at least ones that needed a login. I’m not sure the creators of those frameworks envisioned things like blogs using them, and indeed, it has caused problems when they do. They were tightly coupled to the actual DOM, so though they could be made to prerender the page with enough work, it wasn’t easy. Various frameworks attempted to make rendering on the client and server equally easy, but it wasn’t really until React came out that the idea went mainstream. Now all of the next-gen frameworks (including Angular 2 and Ember 2) will be much easier to render isomorphically.

Which brings me to my point: Isomorphic javascript is just progressive enhancement done right. You always serve up a usable page, but you can do it without sacrificing all the benefits of single-page apps. Of all the ways to do progressive enhancement, it’s the most:

Accurate – the markup will be the same because it’s rendered by the same code

Maintainable – one codebase, one rendering path

Quick to render – we can use all the tricks of traditional HTML-rendering sites to get the page to render fast

Isomorphic javascript can actually do things that are usually infeasible to do in typical progressive enhancement as well – it can render the page with your open modal or datepicker in it on the server, and have it work exactly like when javascript is working. None of this comes for free – testing and hard work is still needed – but things become feasible that weren’t before.

They’re also fragile – if you move either your spec file or your implementation file,
you’ve got to update your requires. This is a good argument for using lots of small modules
that can be broken out – if a module lives in your node_modules folder then requiring it
is always easy:

var file1 = require('file1')

The problem is that when you’re writing an app, lots of the code can’t really
be separated out into tiny modules – it’s app-specific. There have been a
few suggestions
on how to address this problem, but epr
is my attempt at solving it in a nice, repeatable way.

EPR works by making symlinks in your node_modules folder. It gets the list of
symlinks to create from your package.json file, so for the above example you
could add an entry to your package.json file.

If you’ve been following tech lately, you’ve probably heard people talking about the competition between x86 chips (mainly from Intel) and ARM chips. Right now they’re used mostly for different things – phones and tablets have ARM chips; desktops, laptops, and servers have x86 chips – but Intel is trying to get into phones and ARM vendors want to get into servers. This promises to lead to some exciting competition, and we’re already reaping the power benefits of Intel working on this in desktops and laptops. However, whenever this comes up, people bring up that ARM is RISC and x86 is CISC, presenting RISC like it’s a pure advantage and x86 as if it must be crippled because it’s CISC. This doesn’t matter, and hasn’t for a long time now, but let me explain why.

RISC means Reduced Instruction Set Computing; it comes out of the 80s, and it describes a certain style of instruction set (ISA) for a computer. The instruction set is all the low-level commands the CPU supports, so it might have things like “load this value from memory” or “add these two numbers together”. The ISA doesn’t say how those commands have to be implemented, though. Despite the name, the one thing that really separates RISC from other types of instruction set is not the number of different instructions, but that most instructions do only one thing – they don’t have different addressing modes. On traditional architectures you’d have instructions that do the same thing but can work on different types of operands. For example, you might be able to add 2 registers together, or add memory and a register, or add memory to memory. This could become extremely complex, and arguably reached the height of its complexity in the VAX ISA. The VAX was very nice to write assembly code for, but the vast majority of those addressing modes aren’t needed when you use a language like C, where the compiler is responsible for making sure you load data when you need to.

The big argument that RISC proponents made was that you could cut out many of these addressing modes and focus on making your basic operations fast, resulting in a faster overall chip. Since most modes in something like the VAX were rarely used, they were usually microcoded and slow, so you had to know which modes were fast anyways, defeating a lot of the point of having so many complex modes. RISC proponents dubbed traditional ISAs CISC (Complex Instruction Set Computing); it’s not a term that anyone would use for their own work. RISC was very successful in the 80s – ARM started then, DEC (the makers of the VAX) made the Alpha, Sun made the SPARC, and even IBM got into the action with POWER. However, this was mostly in “big” chips (ARM being the big exception). The other story of the 80s was the growth of the micros – tiny chips cheap enough for individuals to buy started coming out in the 70s, and by the 80s there were lots of computers using them: think of IBM PCs (using x86), Commodore 64s (using the 6510, a variant of the 6502, which was used in the Apple II and NES as well), the original Apple Macintosh, and the Amiga (the Mac and Amiga both used the Motorola 68k family). All of these were using what we’d consider CISC chips – they had various addressing modes. Nothing crazy like the VAX, but the VAX was always the outlier in ISA complexity. All of these ISAs still exist, but most are only used in tiny embedded chips (other than x86). Of those computer ecosystems, the PC took over the world, and the Mac survived, but it’s still a small portion of the computer market (and uses x86 these days anyway, after using a RISC chip for a while).

So with that story set, why don’t RISC and CISC matter anymore? Well, there are two big reasons: out-of-order execution (OoO), and the fact that an ISA doesn’t specify how a chip is implemented. Out-of-order execution was the end result of a lot of things people were trying to do with RISC chips in the 80s – each instruction basically executes asynchronously, and the CPU only waits for the results of an instruction if they’re being used by something else. This makes the ISA matter a lot less, because it doesn’t really matter if you load data and use it in one instruction or two. As a matter of fact, since the late 90s Intel has been internally splitting its CISC instructions into RISC-like micro-ops, which shows how the whole RISC vs. CISC debate is pointless these days.

That doesn’t mean that the ISA doesn’t matter, but the devil is really in the details now. x86 is honestly a bit of a mess these days, and decoding it is more complex than decoding ARM instructions (or really any other extant ISA). ARM also just updated its ISA for 64 bits, and from what I’ve heard it sounds like they did a really good job, basically making a totally generic RISC ISA with no weird stuff that makes it hard to use. x86 was never even close to the complexity of something like the VAX, so it avoided a lot of the VAX’s problems. RISC chips are also not without strange things that hurt them down the line – they often exposed internal details of their early implementations, which they had to emulate in later, faster versions. So if you want to compare the x86 and ARM ISAs, that’s actually an important and interesting comparison to make, but the acronyms RISC and CISC don’t add anything.

I’ve been playing around with the WebAudio API for a bit and have come up with
a nice little demo program that shows the basic capabilities of the
OscillatorNode interface (plus some fun canvas programming). It’s not a
serious project, but it is fun. I’m calling it osc, and you can also check out
the source code.