Go has two tools that are invaluable in performance tuning: a profiler and a benchmarking tool. The profiler helps find the trouble spots, and benchmarks show the results of an optimization. See How to Write Benchmarks in Go by Dave Cheney and Profiling Go Programs by Russ Cox for introductions to these tools. Below are several specific techniques I found with benchmarks and the profiler. Source code for the benchmarks is on GitHub.

Every allocation of memory has several potential costs. The Go runtime must ensure that the memory is initialized to the zero value. The garbage collector must track the references to the value and eventually clean it up. Additional memory usage also makes a CPU cache hit less likely.

This simple example fills a slice of up to 1024 bytes with 1s.

func BenchmarkNewBuffers(b *testing.B) {
	for i := 0; i < b.N; i++ {
		n := rand.Intn(1024)
		buf := make([]byte, n)
		// Do something with buffer
		for j := 0; j < n; j++ {
			buf[j] = 1
		}
	}
}

func BenchmarkReuseBuffers(b *testing.B) {
	sharedBuf := make([]byte, 1024)
	for i := 0; i < b.N; i++ {
		n := rand.Intn(1024)
		buf := sharedBuf[0:n]
		// Do something with buffer
		for j := 0; j < n; j++ {
			buf[j] = 1
		}
	}
}

Allocating a new buffer each iteration is substantially slower. Obviously, the more work done on the buffer, the less relative impact eliminating the allocation would have. Surprisingly, both versions show 0 allocs/op. How can that be? Let's rerun the test with the -gcflags=-m option to ask Go for the details.

PostgreSQL allows the transmission of data in binary or text format. The binary format is far faster than the text format, because the only processing typically needed is converting from network byte order. The binary format should also be more efficient for the PostgreSQL server, and it may be a more compact transmission format. However, we will isolate our benchmarks to the parsing of int32 and time.Time values.

Parsing an int32 from text takes over 18x longer than simply reading it in binary. Parsing a time.Time takes over 84x longer. The absolute numbers are small, but they add up. In general, binary protocols are vastly faster than text protocols.

When reading or writing a binary stream, binary.Read with an io.Reader or binary.Write with an io.Writer is very convenient. But working directly with a []byte via binary.BigEndian.Uint* or binary.BigEndian.Put* is more efficient.

Let me close with another warning to measure before committing optimizations. One use case I wanted to optimize was a web API that served JSON produced directly in PostgreSQL. The normal way to do this is to read the JSON into a string, then write that string to the HTTP io.Writer. But wouldn't it be so much faster to copy directly from the PostgreSQL io.Reader to the HTTP io.Writer? It seems obvious that it should be, but that intuition is wrong. Benchmarks revealed it was actually slower in the vast majority of cases, and only marginally faster in the best cases.