This is an introduction to concurrent programming with examples
in Go. The text covers

concurrent threads of execution (goroutines),

basic synchronization techniques (channels and locks),

basic concurrency patterns in Go,

deadlock and data races,

parallel computation.

Before you start, you need to know how to write basic Go programs.
If you are already familiar with a language such as C/C++, Java, or Python,
A Tour of Go will give you all
the background you need.
You may also want to take a look at either
Go for C++ programmers or
Go for Java programmers.

1. Threads of execution

Go permits starting a new thread of execution,
a goroutine,
using the go statement.
It runs a function in a different, newly created, goroutine.
All goroutines in a single program share the same address space.

Goroutines are lightweight,
costing little more than the allocation of stack space.
The stacks start small and grow by allocating and freeing
heap storage as required.
Internally, goroutines are multiplexed onto multiple operating system threads.
If one goroutine blocks an OS thread, for example while waiting for input,
other goroutines assigned to that thread will migrate to other threads
so that they can continue running.
You do not have to worry about these details.

The following program will print "Hello from main goroutine".
It might also print "Hello from another goroutine",
depending on which of the two goroutines finishes first.

func main() {
	go fmt.Println("Hello from another goroutine")
	fmt.Println("Hello from main goroutine")
	// At this point the program execution stops and all
	// active goroutines are killed.
}

The next program will, most likely,
print both "Hello from main goroutine"
and "Hello from another goroutine".
They might be printed in any order.
Yet another possibility is that the
second goroutine is extremely slow and doesn’t print
its message before the program ends.

func main() {
	go fmt.Println("Hello from another goroutine")
	fmt.Println("Hello from main goroutine")
	time.Sleep(time.Second) // wait 1 sec for other goroutine to finish
}

The program will, most likely, print the following three lines,
in the given order and with a five-second break between each line.

$ go run publish1.go
Let’s hope the news will be published before I leave.
BREAKING NEWS: A goroutine starts a new thread of execution.
Ten seconds later: I’m leaving now.

In general, sleeping is not a reliable way to arrange for goroutines
to wait for each other. In the next section we’ll introduce one of Go’s
mechanisms for synchronization, channels, and then
we’ll demonstrate how to use a channel to make one goroutine wait for another.

2. Channels

[Image: a sushi conveyor belt]

A channel
is a Go language construct that provides a mechanism
for two goroutines to synchronize execution and communicate by
passing a value of a specified element type.
The <- operator specifies the channel direction,
send or receive. If no direction is given, the channel is bi-directional.

chan Sushi     // can be used to send and receive values of type Sushi
chan<- float64 // can only be used to send float64s
<-chan int     // can only be used to receive ints

To send a value on a channel,
use <- as a binary operator.
To receive a value on a channel, use it as a unary operator.

ic <- 3 // Send 3 on the channel.
work := <-wc // Receive a pointer to Work from the channel.

If the channel is unbuffered,
the sender blocks until the receiver has received the value.
If the channel has a buffer,
the sender blocks only until the value has been copied to the buffer;
if the buffer is full,
this means waiting until some receiver has retrieved a value.
Receivers block until there is data to receive.
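A sketch of the difference between the two cases; the function names are mine.

```go
package main

import "fmt"

// bufferedDemo shows that sends to a buffered channel do not block
// until the buffer is full.
func bufferedDemo() (int, int) {
	ch := make(chan int, 2) // buffer with room for two values
	ch <- 1                 // does not block: value copied to the buffer
	ch <- 2                 // does not block: buffer now full
	// A third send, ch <- 3, would block until a receive frees a slot.
	return <-ch, <-ch // channels are FIFO: values arrive in send order
}

// unbufferedDemo shows that a send on an unbuffered channel waits
// for a matching receive.
func unbufferedDemo() int {
	ch := make(chan int) // unbuffered
	go func() {
		ch <- 3 // blocks until main is ready to receive
	}()
	return <-ch
}

func main() {
	a, b := bufferedDemo()
	fmt.Println(a, b)             // prints 1 2
	fmt.Println(unbufferedDemo()) // prints 3
}
```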

Close

The close
function records that no more values
will be sent on a channel. After calling close,
and after any previously sent values have been received,
receive operations will return a zero value without blocking.
A multi-valued receive operation additionally returns a boolean
indicating whether the value was delivered by a send operation.
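For example, after a channel has been closed and drained, further receives return the element type's zero value; the names in this sketch are mine.

```go
package main

import "fmt"

// digits sends 0, 1, 2 on the returned channel and then closes it.
func digits() chan int {
	ch := make(chan int, 3)
	for i := 0; i < 3; i++ {
		ch <- i
	}
	close(ch) // no more values will be sent on ch
	return ch
}

func main() {
	ch := digits()
	for v := range ch { // receives until the channel is closed and drained
		fmt.Println(v)
	}
	v, ok := <-ch      // the channel is now closed and empty
	fmt.Println(v, ok) // prints "0 false": a zero value, not a real send
}
```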

4. Deadlock

The main program starts like before: it prints the first line and then
waits for five seconds. At this point the goroutine started by the
Publish function will print the breaking news and then exit,
leaving the main goroutine waiting.

The program will not be able to make any progress beyond this point.
This condition is known as a deadlock.

A deadlock is a situation in which threads are
waiting for each other and none of them is able to proceed.

Go has good support for deadlock detection at runtime.
In a situation where no goroutine is able to make progress,
a Go program will often provide a detailed error message.
Our broken program crashes with output that begins:

fatal error: all goroutines are asleep - deadlock!

5. Data races

A data race occurs when two threads access the same variable concurrently
and at least one of the accesses is a write.
Consider a program in which two goroutines, g1 and g2,
both increment a shared variable n that starts at 0.
The two goroutines participate in a race and there is no way to know
in which order the operations will take place.
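The racy program itself is not shown in this text; here is a minimal reconstruction (the helper names are mine). The interleaving traced next is one out of many possible outcomes.

```go
package main

import "fmt"

// race has two goroutines increment a shared variable n without
// synchronization. The final value depends on how the operations
// interleave.
func race() int {
	done := make(chan bool, 2)
	n := 0
	inc := func() { // run by both g1 and g2
		n++ // really three operations: read n, increment, write n
		done <- true
	}
	go inc() // g1
	go inc() // g2
	<-done
	<-done
	return n // 1 or 2, depending on the interleaving
}

func main() {
	fmt.Println(race())
}
```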

g1 reads the value 0 from n.

g2 reads the value 0 from n.

g1 increments its value from 0 to 1.

g1 writes 1 to n.

g2 increments its value from 0 to 1.

g2 writes 1 to n.

The program prints the value of n, which is now 1.

The name "data race" is somewhat misleading.
Not only is the ordering of operations undefined;
there are no guarantees whatsoever. Both compilers
and hardware frequently turn code upside-down and inside-out
to achieve better performance. If you look at a thread in mid-action,
you might see pretty much anything.

The only way to avoid data races is to synchronize access to
all mutable data that is shared between threads. There are several ways to
achieve this. In Go, you would normally use a channel or a lock.
(Lower-level mechanisms are available in the
sync and
sync/atomic packages,
but are not discussed in this text.)

The preferred way to handle concurrent data access in Go is to
use a channel to pass the actual data from one goroutine to the next.
The motto is: "Don't communicate by sharing memory; share memory by communicating."
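A minimal sketch of the idiom; the names are mine.

```go
package main

import "fmt"

// increment computes a value in one goroutine and hands it to the
// caller over a channel instead of sharing the variable itself.
func increment() int {
	ch := make(chan int)
	go func() {
		n := 0
		n++     // only this goroutine ever touches n
		ch <- n // pass the data — and the right to use it — to the receiver
	}()
	return <-ch
}

func main() {
	fmt.Println(increment()) // prints 1
}
```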

In this code the channel does double duty. It passes the data
from one goroutine to another and it acts as a point of synchronization:
the sending goroutine will wait for the other goroutine to receive the data
and the receiving goroutine will wait for the other goroutine to send the data.

The Go memory model –
the conditions under which reads of a variable in one goroutine
can be guaranteed to observe values produced by writes to the same variable
in a different goroutine –
is quite complicated,
but as long as you share all mutable data between goroutines
through channels you are safe from data races.

6. Mutual exclusion lock

Sometimes it’s more convenient to synchronize data access
by explicit locking instead of using channels.
The Go standard library offers a mutual exclusion lock,
sync.Mutex,
for this purpose.

For this type of locking to work, it’s crucial that all accesses
to the shared data, both reads and writes, are performed only
when a goroutine holds the lock. One mistake by a single goroutine
is enough to break the program and introduce a data race.

Because of this you should consider designing a custom data structure
with a clean API and make sure that all the synchronization
is done internally. In this example we build a safe and easy-to-use
concurrent data structure, AtomicInt, that stores a single integer.
Any number of goroutines can safely access this number through the
Add and Value methods.

7. Detecting data races

Races can sometimes be hard to detect.
This function has a data race and when I executed
the program it printed 55555.
Try it out, you may well get a different result.
(The sync.WaitGroup
is part of Go’s standard library;
it waits for a collection of goroutines to finish.)
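The function in question is not shown here; the following reconstruction is consistent with the explanation that follows, with the loop bounds and names as assumptions. Note that since Go 1.22 each loop iteration gets its own i, so the race on i itself occurs only with earlier Go versions; in all versions the print order is nondeterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// race starts five goroutines that all print the loop variable i.
// With Go versions before 1.22, i is shared by all six goroutines
// (the five new ones and main), which is the data race.
func race() {
	var wg sync.WaitGroup
	wg.Add(5)
	for i := 0; i < 5; i++ {
		go func() {
			fmt.Print(i) // unsynchronized read of i
			wg.Done()
		}()
	}
	wg.Wait() // wait for all five goroutines to finish
	fmt.Println()
}

func main() {
	race()
}
```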

A plausible explanation for the 55555 output
is that the goroutine that executes i++ managed to
do this five times before any of the other goroutines executed
their print statements.
The fact that the updated value of i was visible
to the other goroutines is purely coincidental.

A simple solution is to use a local variable and pass the number
as a parameter when starting the goroutine.

Go comes with a built-in data race detector, enabled by passing the
-race flag to the go tool (for example, go run -race or go test -race).
Run on the broken program, the tool found a data race consisting of a write to
a variable on line 20 in one goroutine,
followed by an unsynchronized read from the same variable
on line 22 in another goroutine.

Note that the race detector only finds data races that actually happen
during execution.

8. Select statement

The select statement
is the final tool in Go’s concurrency toolkit.
It chooses which of a set of possible communications will proceed.
If any of the communications can proceed, one of them is randomly
chosen and the corresponding statements are executed.
Otherwise, if there is no default case,
the statement blocks until one of the communications can complete.

Here is a toy example showing how the select statement can
be used to implement a random number generator.
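A sketch of such a generator; the name RandomBits is an assumption.

```go
package main

import "fmt"

// RandomBits returns a channel that produces an endless stream of
// pseudorandom 0s and 1s.
func RandomBits() <-chan int {
	ch := make(chan int)
	go func() {
		for {
			select {
			case ch <- 0: // when a receiver is ready, both sends can proceed,
			case ch <- 1: // so select picks one of them at random each time
			}
		}
	}()
	return ch
}

func main() {
	bits := RandomBits()
	for i := 0; i < 10; i++ {
		fmt.Print(<-bits)
	}
	fmt.Println()
}
```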

Somewhat more realistically, here is how a select statement
could be used to set a time limit on an operation.
The code will either print the news or the time-out message,
depending on which of the two receive operations can proceed first.

The function time.After
is part of Go’s standard library;
it waits for a specified time to elapse and then sends the current time
on the returned channel.

9. The mother of all concurrency examples

Take the time to study this example carefully. When you understand
it fully, you will have a thorough grasp of how concurrency works in Go.

The program demonstrates how a channel can be used for both sending and
receiving by any number of goroutines. It also shows how the select
statement can be used to choose one out of several communications.

$ go run matching.go
Cody sent a message to Bob.
Anna sent a message to Eva.
No one received Dave’s message.

10. Parallel computation

One application of concurrency is to divide a large computation
into work units that can be scheduled for simultaneous computation
on separate CPUs.

Distributing computations onto several CPUs is more of an art
than a science. Here are some rules of thumb.

Each work unit should take about 100μs to 1ms to compute.
If the units are too small, the administrative overhead of dividing
the problem and scheduling sub-problems might be too large.
If the units are too big, the whole computation may have to wait
for a single slow work item to finish. This slowdown can happen
for many reasons, such as scheduling, interrupts from other processes,
and unfortunate memory layout. (Note that the number of work units
is independent of the number of CPUs.)

Try to minimize the amount of data sharing.
Concurrent writes can be very costly, particularly so if goroutines
execute on separate CPUs. Sharing data for reading is often much less
of a problem.

Strive for good locality when accessing data.
If data can be kept in cache memory, data loading and storing
will be dramatically faster.
Once again, this is particularly important for writing.

The following example shows how to divide a costly computation and
distribute it on all available CPUs.
This is the code we want to optimize.
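The code in question is not included here. As a stand-in, assume a sequential loop that applies an expensive function to every element of a slice; work and sum are my names, not necessarily the article's example.

```go
package main

import "fmt"

// work stands in for an expensive per-item computation.
func work(x float64) float64 {
	return x * x
}

// sum applies work to every item sequentially, on a single goroutine.
func sum(data []float64) float64 {
	var res float64
	for _, x := range data {
		res += work(x)
	}
	return res
}

func main() {
	data := []float64{1, 2, 3}
	fmt.Println(sum(data)) // prints 14
}
```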

When the work units have been defined, it’s often best to
leave the scheduling to the runtime and the operating system.
However, in early versions of Go (before Go 1.5) you may need to call
runtime.GOMAXPROCS to tell the runtime how many goroutines may execute
code simultaneously; since Go 1.5 this defaults to the number of available CPUs.
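A parallel version of the same stand-in computation, splitting the input into one chunk per CPU; the chunking strategy and names are assumptions, not necessarily the article's.

```go
package main

import (
	"fmt"
	"runtime"
)

// work stands in for an expensive per-item computation.
func work(x float64) float64 {
	return x * x
}

// parallelSum splits the input into one chunk per CPU and sums the
// chunks in separate goroutines. Each goroutine writes only to its
// own private slice and partial result, so there is no data race.
func parallelSum(data []float64) float64 {
	n := runtime.NumCPU()
	parts := make(chan float64, n)
	chunk := (len(data) + n - 1) / n // ceiling division
	count := 0                       // number of goroutines started
	for i := 0; i < len(data); i += chunk {
		end := i + chunk
		if end > len(data) {
			end = len(data)
		}
		count++
		go func(part []float64) {
			var res float64
			for _, x := range part {
				res += work(x) // read-only access to a private chunk
			}
			parts <- res // hand the partial result back over a channel
		}(data[i:end])
	}
	var total float64
	for i := 0; i < count; i++ {
		total += <-parts // collect and combine the partial results
	}
	return total
}

func main() {
	data := make([]float64, 100)
	for i := range data {
		data[i] = float64(i + 1)
	}
	fmt.Println(parallelSum(data)) // prints 338350 (= 1² + 2² + … + 100²)
}
```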