Backend engineer, interested in Ruby, Go, microservices, building
resilient architectures and solving challenges at scale. I coach
at Rails Girls in Amsterdam,
maintain a list of small gems
and often contribute to Open Source.
This is where I write about software development, programming languages
and everything else that interests me.

Data structures in Go: Linked lists

18 Feb 2018

Data structures and algorithms are the bread and butter of computer science.
Although sometimes they appear scary to people, most of them have a simple
explanation. Also, when explained well with a problem algorithms can be
interesting and fun to learn and apply.

This post is aimed at people that are not comfortable with linked lists, or
folks that want to see and learn how to build one with Golang. We will see
how we can implement them with Go via a (somewhat) practical example, instead of
the plain theory and code examples.

But first, let’s touch on a bit of theory.

Linked lists

Linked lists are one of the simpler data structures out there. Wikipedia’s
article on linked lists states that:

In computer science, a linked list is a linear collection of data elements, in
which linear order is not given by their physical placement in memory.
Instead, each element points to the next. It is a data structure consisting of
a group of nodes which together represent a sequence. Under the simplest form,
each node is composed of data and a reference (in other words, a link) to the
next node in the sequence.

While all this might look like too much or confusing, let’s break it down. A
linear data structure is the one where it’s elements form a sequence of some
sort. Simple as that. Now, why the physical placement in memory doesn’t matter?
Well, when you have arrays the amount of memory the array takes is fixed, in the
sense that if you have an array of 5 items, the language will grab only 5 memory
addresses in the memory, one after another. Because these addresses create a
sequence, the array knows in what memory range its values will be stored and
thus the physical placement in-memory of these values create a sequence.

With linked lists, it’s a bit different. In the definition you will notice that
“each element points to the next”, using “data and a reference (in other words,
a link) to the next node”. This means that each node of the linked list stores
two things: a value and a reference to the next node in the list. Simple as
that.

Streams of data

Everything that humans perceive around them is some sort of information or
rather data that our senses and our minds know how to process and convert it
into a useful piece of information. Whether we look, or smell, or touch we
process data and find meaning from that data. Or, when we browse our social
media networks we always resort to a feed of data, ordered chronologically and
with no end in sight.

So, how can we use linked lists to model such a news feed? Let’s first quickly
take a glance at a simple Tweet for example:

So, what does this code do? First, the main function - it creates a pointer to
a Feed struct, two Post structs with some dummy content and it invokes twice
the Append function on the Feed, which results in it having a length of 2.
We inspect the two values that the Feed has by accessing the start of the
Feed (which is, in fact, a Post) and the next item after the start,
which is the second Post.

You can see, after we add the first Post to the Feed, the length is 1 and
the first Post has a body and a publishte (as a Unix timestamp), while
it’s next value is nil (because there’s no more Posts in the Feed).
Then, we add the second Post to the Feed and when we inspect the two Posts
we see that the first one has the same content as before, but with a pointer to
the nextPost in the list. The second also has a body and a publishDate
but with no pointer to the next Post in the list. Also, the length of the
Feed increases as we add more Posts to it.

Let’s go back to the Append function now and deconstruct it so we understand
better how to work with linked lists. First, the function creates a pointer to a
Post value, with the body argument as body of the Post and the
publishDate is set to the Unix timestamp representation of the current time.

Then, we check if the length of the Feed is 0 - this means that it has no
Posts and the first one that is added should be set as the starting Post,
conveniently named start.

But, if the length of the Feed is more than 0, then our algorithm takes a
different turn. It will start with the start of the Feed and it will walk
through all the Posts until it finds one that doesn’t have a pointer to a
next one. Then, it will attach the new Post as the next value on the last
Post of the list.

Optimising Append

Imagine we have a user that scrolls through their Feed, like on any other
social network. Since posts are chronologically ordered, based on the
publishDate, the Feed will grow more and more as the user scrolls through it
and more Posts are attached to the Feed. Given the approach, we took in the
Append function, as the Feed gets longer and longer the Append function
will become more and more expensive. Why? Because we have to traverse the whole
Feed to add a Post at the end of it. If you have heard about Big-O
notation, this algorithm has an O(n) time complexity, which means that it
will always traverse the whole Feed before it adds a Post to it.

As you can imagine, this can be quite inefficient, especially if the Feed
grows to be quite long. How can we improve the Append function and decrease
the asymptotic
complexity
of it?

See, since our Feed data structure is just a list of Posts, to traverse it
we have to know the beginning of the list (called start) that’s a pointer of
type Post. Because in our example Append always adds a Post to the end of
the Feed, we could drastically improve the performance of the algorithm if
Feed knows not just of its starting element, but also about it’s ending
element. Of course, there’s always a tradeoff with optimisations, and the
tradeoff here is that the data structure will consume a bit more memory (for the
new attribute on the Feed structure).

Extending our Feed data structure is quite easy:

typeFeedstruct{lengthintstart*Postend*Post}

But our Append algorithm will have to be tweaked to work with the new
structure of a Feed. This is the version of Append using the end attribute
of Post:

If you look now at the algorithm, it does two things: if the Feed is empty it
will set the new Post as the beginning and the end of the Feed, otherwise it
will set the new Post as the end item and it will attach it to the previous
Post in the list. On top of how simple it is, this will algorithm now has a
big-O complexity of O(1), also known as “constant time”. That means that
Append will perform the same regardless of the length of the Feed structure.

Pretty simple, right? But let’s imagine that the Feed is actually a list of
Posts on our profile. Hence they are ours, we should be able to delete them. I
mean, what kind of a social network doesn’t allow it’s users to (at least)
delete their posts?

Removing a Post

As we established in the previous section, we want the users of our Feeds to
be able to delete their posts. So, how can we model that? If our Feeds were an
array, we would just remove the item and be done with it, right?

Well, this is actually where linked lists shine. When arrays sizes change, the
runtime has to capture a new memory block to store the items of the array in.
Linked lists due to their design, each item having a pointer to the next node in
the list, can be scattered throughout the memory space meaning adding/removing
nodes from the list is cheap from a space perspective. When one wants to remove
a node from a linked list only the neighbours of the removed node need to be
linked, and that is. Garbage collected languages (like Go) make this even easier
since we don’t have to worry about releasing the allocated memory - the GC will
kick in and remove all unused objects.

To make our lives a bit easier for this example, let’s put a constraint that
each of the Posts on a Feedwill have a unique publishDate.This means a
publisher can create one Post per second on their Feed. Taking that into
effect, this is how we can easily remove a Post from a Feed:

func(f*Feed)Remove(publishDateint64){iff.length==0{panic(errors.New("Feed is empty"))}varpreviousPost*PostcurrentPost:=f.startforcurrentPost.publishDate!=publishDate{ifcurrentPost.next==nil{panic(errors.New("No such Post found."))}previousPost=currentPostcurrentPost=currentPost.next}previousPost.next=currentPost.nextf.length--}

The Remove function will take a publishDate of a Post as an argument by
which it will detect what Post needs to be deleted (or unlinked). The function
is rather small. If it detects that the start item of the Feed is to be
removed it will just reassign the start of the Feed with the second Post
in the Feed. Otherwise, it jumps through each of the Posts in the Feed
until it runs into a Post that has a matching publishDate to the one passed
as the function argument. When it finds one, it will just connect the previous
and the next Post in the Feed with each other, effectively dropping the
middle (matching) one from the Feed.

There’s one edge case that we need to make sure that we cover in our Remove
function - what if the Feed doesn’t have a Post with the specified
publishDate? To keep it simple, the function checks for the absence of the
nextPost in the Feed before it jumps to it. If the next is nil the
function panics, telling us that it couldn’t find a Post with such
publishDate.

Inserting a Post

Now that we got appending and removing out of the way, let’s take a look at (a
bit of) a hypothetical case. Imagine that the source that produces the Posts
sends them to our application in a non-chronological order. This means that the
Posts need to be put in the correct order in the Feed, based on the
publishDate. This is how that would look like:

In essence, this is a very similar algorithm to the one in the Remove
function, because although both of them do a very different thing (adding v.s.
removing a Post in the Feed), they are both based on a search algorithm.
That means that both of the functions actually traverse the whole Feed,
searching for a Post that matches the publishDate with the one received in
the argument of the function. The only difference is that Insert will actually
put the new Post in the place where the dates match, while Remove will
remove the Post from the Feed.

Additionally, this means that both of these functions carry the same time
complexity, which is O(n). This means that in a worst-case scenario, the
functions will have to traverse the whole Feed to get to the item where the
new post needs to be inserted (or the removed).

What if we used arrays?

If you are asking that yourself, let me say right up front - you have a point.
True, we could store all of the Posts in an array (or a slice in Go), easily
push items onto it and also even have random access with an O(1) complexity.

Due to the nature of arrays, whose values have to be stored in memory one right
after another, reading is really fast and cheap. Once you have something stored
in an array, retrieving it as easy as accessing it by its 0-based index. When it
comes to inserting an item, whether in the middle or at the end, then arrays
become less efficient compared to lists. That is because if the array doesn’t
have more memory reserved for the new item(s), it will have to reserve it and
use it. But, if the next memory addresses are not free, it will have to “move”
to a new memory address where there would be space for all of the items in it
(new and old).

Looking at all of the examples and discussion we had so far, we can create a
table with time complexity for each of the algorithms we created, and compare
them with the same algorithms for arrays:

As you can see, when you are faced with a certain problem, picking the correct
data structure can really make or break the products you create. For
ever-growing Feeds, where insertion of Posts is paramount, linked lists do a
much better job because insertions are really cheap. But, if we had a different
problem on our hands that requires frequent deletions or lots of
retrieval/searching, then we would have to pick the correct data structure for
the problem we are dealing with.

You can see the whole implementation of the Feed and play with it
here. Also, Go has it’s own linked list
implementation, with some nice functions already built in. You can see it’s
documentation here.

EDIT 20.02.2018 10:00 CET: Previous verion of the article wrongly stated that
the big-O complexity of the delete function is O(1). Changed to O(n) after
pointed out by Bart de Goede.

EDIT 22.02.2018 23:00 CET: Previous verion of the article wrongly stated that
the big-O complexity of searching in an array is O(1) instead of O(N). This
was a mixup of Access and Search on my side. This is fixed by adding
separate rows in the table for Search and Access and their respective time
complexities.

EDIT 25.02.2018 17:00 CET: Previous version of the article had a buggy
implementation of the Insert function, which was pointed out by Albert Shirima
in the comments.

If you have ever tried writing a daemon for MacOS you have met with launchd. For
those that don’t have the experience, think of it as a framework for starting,
stopping and managing daemons, applications, processes, and scripts. If you have
any *nix experience the word daemon should not be too alien to you.

Recently, while writing a small Golang program for setting reminders I came
across a small confusion that I guess most newcomers to Golang will have - how
to organise a package in a way that will enable it to cleanly contain two or
more binaries.