The recommendation is to favor returning concrete types instead of interfaces
whenever possible.

Stack over heap

Unlike some popular languages, Go permits using the stack extensively, and the
runtime doesn’t enforce a small fixed stack size: goroutine stacks grow on
demand. Keeping short-lived objects on the stack reduces the allocation of
small objects on the heap.

Edit: As Roger Peppe noted, the compiler may silently optimize trivial
statements like b := make([]byte, 16) into b := [16]byte{}, so a wholesale
replacement is not recommended unless a visible improvement can be measured.

Sometimes, it can be as simple as:

f := &foo{...}
// vs
f := foo{...}

When in doubt, profile as explained at the end of this post.

Containment

In some cases, a function needs a temporary object or a slice that has a
definite lifetime but an unknown size. Or it could be that escape analysis
determines that the object needs to be on the heap.

See further reading below for more information about escape analysis.

In these cases, it can be useful to embed a cache for the object inside a
larger, longer-lived object, especially when the function is a hot spot.

This optimization is great for functions called often but not concurrently:
preallocate an array of a common size and fall back to heap allocation only
when necessary. A great example in the standard library is
bytes.Buffer.bootstrap.

For functions that are called concurrently,
sync.Pool can be leveraged to attain
similar heap reduction improvements.

Improvement in practice

Here’s a real-world example of these optimizations.

To make an I/O operation on the SPI bus on a Raspberry Pi, the sysfs SPI code in
periph.io had to allocate temporary objects when converting
a Tx() call into the underlying expected format. This matters because I/O can
happen at a fairly high rate, so I decided to take a deeper look.

Preallocate an array of 4 spiIOCTransfer, contained inside SPI, instead of
allocating one on the heap.

In txInternal(), use an item from this array instead of trying to allocate
on the stack, which the compiler promoted to the heap anyway; this effectively
works around the escape analysis issue in the Ioctl() call.

Get the most recent version of pprof.
While a snapshot of the tool is available in the toolchain as go tool
pprof, there’s no reason not to use the latest version:

go get -u github.com/google/pprof

Write a micro benchmark covering the code you want to optimize, like this
one.
Beware of what the microbenchmark actually tests: Go’s optimizing compiler can
optimize away code paths that are too trivial, and optimizations differ
across CPU architectures (ARM vs x86).

Build the test executable; this is what enables source annotation later: go test -c

Run your newly compiled executable. Using a short benchtime accelerates the
process:
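For example (the binary and benchmark names here are hypothetical; go test -c names the binary after the package, and compiled test binaries prefix the usual flags with test.):

```shell
./sysfs.test -test.bench=BenchmarkSPI -test.benchtime=10ms \
    -test.memprofile=mem.pprof
pprof sysfs.test mem.pprof
```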

You got the culprit! Don’t be fooled by its innocuous look: it’s a heap
allocation! The cause is the compiler’s escape analysis of the Ioctl() call at
line 223; the compiler doesn’t know what the function will do with the unsafe
pointer.

Now find a way to make the code not allocate, then iterate until you have
zapped all the allocations worth fixing.

One word of caution

Before optimizing a loop, make sure it’s actually a hot code path. This can be
asserted via profiling,
provided by the runtime/pprof package.
Also beware of microbenchmarks, which can occasionally lie and lead you
down a rabbit hole, optimizing unimportant things.

If you care about performance on a Raspberry Pi or on ARM in general, run the
benchmarks there. The Go compiler optimizations and CPU performance
characteristics are wildly different.

Still, putting objects on the stack is generally a safe bet.

More reading

I hope you enjoyed this post! Here’s further reading by excellent people: