The Go language has some weaknesses compared to other modern languages. It also however has some advantages that allow its users to do things that users of many other languages cannot easily do. In particular, to have a large number of green threads running on a single machine, making it easy to structure a server by having one such thread (goroutine) for each client connection, and to communicate between these goroutines with channels, making it easy to coordinate them and avoiding the need for mutexes and other forms of manual synchronization.

As an example, take the following program, a race relay race (it’s like a data race race). Imagine two pairs of n (in this case three) threads: (a1, a2, a3), and (b1, b2, b3). For each one, a message is sent from the first to the last and then back again (e.g. from a1 to a2 to a3 to a2 to a1). Whichever racer completes this first out of (a1..an) and (b1..bn) wins the race. Now, the average computer can have at most around 65k open tcp connections, so a server using one connection per client would require around this many goroutines. Let’s therefore set n to 65/2 = 32k; 32k green threads per racer.

Here’s how we make the racetrack. The function is recursive: in the case where numLeft is > 0, it makes another part of the track to the right, in a new goroutine, then creates another goroutine for handling message passing from its left to its right, then itself works on handling message passing from its right to left. This means each racer is actually using 2*numLeft goroutines, so we set the total number per track to 32/2=16k. If numLeft is less than one, we instead set to work passing the input from the left channel back into the left channel.

Now, you may wonder: why not just use one channel for both sending ints from the left channel right and those from the right left? It’s not good practice to have one goroutine writing and reading from the same channel, as if it attempts to write when the channel has data to be read then it will block (deadlock). We’d hence need four channels if we were only using one goroutine: left in, right in, left out, and right out. I found this alternative less subjectively pleasant than just having one goroutine for sending left and one for sending right

We create numRunners ints, each numbered with their start position (first to run is 1, last is numRunners-1), and send them down the channel. Then we receive them all, and stick them in an array which is then sent back through the appropriate channel. This allows us to examine the order in which they arrived.

That creates the racetracks, channels to them and for the race results to be returned on, and runs the races. Note the select statement: this is an extremely useful feature of Go that will receive from whichever of the target channels (in this case resultsA and resultsB) is ready first. Running this takes around 3 seconds, not bad for sending data across so many channels. What’s really impressive, however: quadruple that number to 64000, meaning a total of 256 thousand goroutines are created, and it still runs in around 3 seconds! Now what’s why Go is great for servers. I’m fairly certain that trying to do the same thing with 256 thousand kernel threads on this laptop would melt it.

Seemed to be waiting for foo(bar) to complete before running baz, preventing the program from working if foo is a function looping until it receives a message from baz. Not sure if this was a bug, or just me doing something wrong due to unfamiliarity with the language (probably the latter). Anyway, while it’s certainly possible to do in Erlang what Go can do, Go is five to ten times faster for cpu-intensive work. It’s also more statically typed, preventing errors that wouldn’t be caught at compile time in Erlang/Elixir.

*Edit: it should be noted that Erlang has something called Dialyser, which can do static checking of Erlang code. A redditor wrote an Erlang implementation of the code for comparison, available here, which performs quite well.

I considered an implementation in D, but while D has green threads in the form of fibres, but doesn’t seem to include a scheduler, so I’d have to write one myself. Also, while it has message passing between threads, it doesn’t seem to have message passing between fibres. It also has no equivalent of Go’s select statement, so the only way to poll multiple channels would be to wait on one with a very short timeout, then wait on the next, and so on in a loop, which is rather ugly.

*Edit: D does have a fibre scheduler as part of the the Vibe.d webserver, but this isn’t yet integrated into the standard library. Integration is planned such that eventually message passing between fibres will be possible in D as it is in Go.

What other languages have lightweight threads? Rust still has m:n threading, even if it’s not now the default, but I’m not sure if its message passing facilities work between green threads or just between OS threads. I attempted an implementation in Rust, but it fails with a generic ‘failed while receiving on a closed channel’ error not pointing to where in the code it occurred, and I ain’t got time for debugging that. Better to wait until the library and its error messages are more developed.

C# and F# don’t seem to natively support message passing between lightweight threads. Clojure has a library for it, but it seems to be an order of magnitude slower than Go. The JVM has Akka, but it’s incredibly verbose in Java, and to use it with Scala would first require learning Scala, which is for some reason a language that has never appealed to me.

Anything else? Interestingly, Haskell actually has lightweight threads and message passing, and they’re quite performant (see the benchmark linked in the previous paragraph). I attempted to write a Haskell implementation, and amazingly succeeded thanks to judicious use of unsafePerformIO, which is all the fun of unsafe sex with none of the risks (being able to write an `IO A -> A` function after having spent hours battering your head against the type system is pretty much the best thing ever). Yes, I’m a bad person. Anyway, it managed to match the Go implementation, handling even 360,000 threads and 70 runners with no problem. What’s more, if I compiled with O3 and had numRunners and numThreads hardcoded (as opposed to via argv), it would do all the calculation at compile time, inlining the result and always producing the same answer when run.

*Edit: turns out the unsafePerformIO was completely unnecessary. The same program without unsafePerformIO can be seen here, courtesy of a helpful redditor.

Okay, the Haskell implementation must have cheated. It works when setting numTreads to a billion, and there’s no way it’s actually creating a billion threads, no matter how lightweight they are.

Anyway, so Haskell can also operate with multitudes of lightweight threads; why still use Go? One reason: Haskell doesn’t have select, and it can’t even be implemented on standard channels as they offer no means of checking whether they contain data or not. It could be done with synchronous channels, which offer more functionality, but they failed with a build error when I attempted to cabal install them. There’s also Software Transactional Memory channels, but they operate in the STM monad rather than IO, so would lose the advantage of being able to stick print statements everywhere for debugging (well, you can if you really want to, you might just get things printed multiple times if the transaction is retried). Finally, there are MVars, which are like channels that only hold one item, and which allow checking for emptiness; these are what I used for checking who won the race.

Another reason to use Go: reasoning about both concurrency and lazy evaluation simultaneously is more difficult than reasoning about concurrency in a strict environment. Learning Go in general is also much easier: you can teach someone Go in a week, whereas Haskell takes a lot longer to learn. Finally: while it’s possible to write extremely fast code in Haskell, idiomatic code in Go is more performant than idiomatic Haskell.

In summary: Go is great for servers because it allows you to easily build efficient servers based on message passing between lightweight threads. Its advantage over Erlang/Elixir is that it’s much faster at calculation-heavy workloads and can do shared-memory concurrency. Its advantage over Haskell is that it’s much easier to learn, more idiomatically performant, and has a nifty select statement that as far as I’m aware Haskell hasn’t implemented. It’s also more ‘batteries-included’; no need to try and figure out which of the four+ different channel libraries is most appropriate.

Its advantage over Rust is that it’s stable, with a comprehensive standard library, while Rust’s standard library is still in development. It’s also easier to write, as there’s no need to fight the borrow checker or worry about lifetimes. While Rust’s advanced linear type system allows memory safety without a garbage collector, garbage collection isn’t usually a big problem for servers, so Rust’s type system represents a potentially unnecessary overhead. Although Rust does have easier-to-use garbage-collected allocation via managed pointers, its garbage collector is currently quite primitive compared to Go’s, so Go currently makes more sense to use if one desires a garbage-collected server language with lightweight message passing concurrency. Finally, the biggest advantage to me personally of Go over Rust: no macros, meaning no need to read other people’s macros. The only macros I’ve ever liked are lisp macros, and that’s largely because they can be expanded and examined dynamically in the REPL.

Go’s advantage over Java/Scala + Akka on the JVM? Concurrency in Go is much more concise than using Akka in Java. Go has the same advantages over Scala that it has over Haskell, i.e its extremely shallow learning curve, plus the extra advantage of significantly faster compilation. Finally, Akka uses an actor model, like Erlang, which is subtly different from the lightweight threads+channels model used by Go and Haskell and so is not a direct substitute.

The Go and Haskell implementations, along with the incomplete Rust and Elixir implementations, are available at this repo.

Any valid criticisms of this post (of which there are undoubtedly many) are welcome either here or at the relevant reddit discussion.

2 Responses to Why Go is Great (for servers)

You wrote: “There’s also Software Transactional Memory channels, but they operate in the STM monad rather than IO, so would lose the advantage of being able to stick print statements everywhere for debugging.”

STM blocks are typically short and are necessarily interleaved with normal IO–after all, STM is all about imperative effects–so no, you don’t lose the advantage of being able to stick print statements inside your non-transactional code, which is likely the majority of your code. (And, of course, unsafeIOtoSTM exists, if you really know what you’re doing.) Here’s an example: https://gist.github.com/anonymous/b80eb058bcc8fcbff4c0

It’s straightforward to write something similar to CML-style non-deterministic choice (“select”) using STM, as you noted in a reddit comment.

I meant it more as a joke than anything else. It’s also not hard to have a thread running that just continuously reads from a STM channel and prints the output, then pass that channel around and use it to print from inside STM in other threads.