GCC, which is still the base compiler for OpenBSD on all architectures except i386 and amd64, had all this before: see tree-emutls.c for the compiler pass and emutls.c for the runtime support. LLVM added compatible support a couple of years ago.

So this all works fine for anything compiled with Clang or GCC. But Go has its own toolchain. I don't know how Go uses TLS. What support, if any, does Go need from the system linker or kernel for TLS?

I wondered whether commit 9417c02 has anything to do with this. That commit removes some code that previously allocated memory for TLS data on OpenBSD. It was made because OpenBSD 6.0 added support for PT_TLS segments in ld.so (see tib.c). But that just means that the necessary memory will be allocated by the dynamic linker: I'm not sure that it will work if the binary is statically linked.

I tried building the Go toolchain from that commit's parent and testing with it. However, static linking then fails at link time due to multiple definitions of pthread_create: the wrapper in runtime/cgo/gcc_openbsd_amd64.c clashes with the definition in the system pthread library.

So I guess static linking of Go binaries has never worked on OpenBSD.


@cac04 Thanks for looking into this. The Go runtime basically uses a single TLS variable, which holds a pointer to the currently running goroutine. When using external linking, which happens by default when using cgo code, the Go linker produces an object file that is passed to the system linker. So we need to arrange for the Go linker to generate the code and relocations that the system linker expects to see.

For amd64 this process starts in cmd/compile/internal/amd64/ssa.go. Look for OpAMD64LoweredGetG, which is the compiler-internal pseudo-instruction that fetches the current goroutine pointer. From what you say above it sounds like that code needs to change to generate a call to an emutls function. But if the emutls function can change any registers, which I assume it can, then simply changing amd64/ssa.go will not be enough; we need to teach the compiler in general about emutls, and teach it to generate function calls for GetG.
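To make concrete why an emulated-TLS access is a function call rather than a single load, here is a minimal sketch of the idea in C. It is not the libgcc or LLVM runtime: the names `emu_tls_get_address` and `emu_tls_value_in_new_thread` are made up for illustration, and it handles one `int` variable only. It assumes a POSIX threads implementation is available.

```c
#include <pthread.h>
#include <stdlib.h>

/* Illustrative only: emulates one thread-local int with initial value 42. */
static pthread_key_t g_key;
static pthread_once_t g_once = PTHREAD_ONCE_INIT;
static const int g_initial = 42;   /* the variable's initial image */

static void make_key(void) {
    /* free() releases a thread's private copy when that thread exits. */
    if (pthread_key_create(&g_key, free) != 0)
        abort();
}

/* Returns the calling thread's private copy, allocating it on first use.
   Every access to the variable goes through a call like this one. */
int *emu_tls_get_address(void) {
    pthread_once(&g_once, make_key);
    int *p = pthread_getspecific(g_key);
    if (p == NULL) {
        p = malloc(sizeof *p);
        if (p == NULL)
            abort();
        *p = g_initial;
        if (pthread_setspecific(g_key, p) != 0)
            abort();
    }
    return p;
}

static void *read_in_thread(void *out) {
    *(int *)out = *emu_tls_get_address();
    return NULL;
}

/* Hypothetical helper: the value a freshly created thread observes. */
int emu_tls_value_in_new_thread(void) {
    pthread_t t;
    int seen = -1;
    if (pthread_create(&t, NULL, read_in_thread, &seen) != 0)
        abort();
    pthread_join(t, NULL);
    return seen;
}
```

Because the lookup is an ordinary C function, it follows the normal calling convention and may clobber caller-saved registers, which is the concern raised above for GetG.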


Clang also uses Emulated TLS on Android. But Go, including cgo, works fine on Android, doesn't it?

Go, including cgo, seems to work fine on OpenBSD as long as I don't pass -static to the external linker.

So clearly even if Android and OpenBSD don't support thread-local storage properly, whatever 'properly' means, they support it enough for Go to work without using the Emulated TLS functions.

Thanks for the pointer to OpAMD64LoweredGetG: I think I see how this works now. Apologies for teaching grandmother to suck eggs in what follows but this is all new to me so I'm writing out my reasoning in full.

The TLS blocks for the executable itself and all the modules loaded at startup are located just below the address the thread pointer points to. This allows compilers to emit code which directly accesses this memory.

Second, the Thread Control Block (TCB) contains (at some OS-dependent location) a pointer to a dynamic thread vector that contains pointers to TLS blocks allocated for dynamically-loaded modules (which in this context means modules loaded after program startup, e.g. via dlopen(3), or lazily loaded modules).

There are four access models for TLS variables, which can be split into two types: 'Dynamic' and 'Exec'. The 'Dynamic' models allow access to TLS blocks reached via the dynamic thread vector, while the 'Exec' models only allow access to TLS blocks allocated at program startup. The 'Local Exec' model further restricts access to only those TLS variables defined in the executable itself.
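For reference, GCC and Clang let you request each of the four models per variable with the `tls_model` attribute. The sketch below only shows the spelling; the variable and function names are invented for illustration, and when the code ends up in an executable the linker is generally free to relax the 'Dynamic' models to cheaper ones.

```c
/* One thread-local variable per access model; names are illustrative only. */
static __thread int v_global_dynamic __attribute__((tls_model("global-dynamic")));
static __thread int v_local_dynamic  __attribute__((tls_model("local-dynamic")));
static __thread int v_initial_exec   __attribute__((tls_model("initial-exec")));
static __thread int v_local_exec     __attribute__((tls_model("local-exec")));

/* Store one value in each variable of the calling thread. */
void set_tls_models(int a, int b, int c, int d) {
    v_global_dynamic = a;
    v_local_dynamic = b;
    v_initial_exec = c;
    v_local_exec = d;
}

/* Read all four variables back for the calling thread. */
int sum_tls_models(void) {
    return v_global_dynamic + v_local_dynamic + v_initial_exec + v_local_exec;
}
```

The source is the same for all four; only the code sequence and relocations the compiler emits for each access differ.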

The advantage of the 'Local Exec' model is that we know the variables must be in the first TLS block, which is at a fixed (although architecture dependent) offset from the thread pointer. Since the program linker knows the variable's offset within the TLS block, we know the variable's offset from the thread pointer at link-time.
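The link-time arithmetic can be sketched as follows. This assumes the layout described in the quoted text, where the executable's TLS block ends at the address the thread pointer points to (as on x86); other architectures place the block above the thread pointer, so the sign and any fixed bias differ. The function names are hypothetical.

```c
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static ptrdiff_t round_up(ptrdiff_t n, ptrdiff_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Offset from the thread pointer of a variable located var_offset bytes
   into the executable's TLS block, assuming the (aligned) block sits
   immediately below the thread pointer. The result is negative. */
ptrdiff_t tp_offset_below(ptrdiff_t block_size, ptrdiff_t block_align,
                          ptrdiff_t var_offset) {
    return var_offset - round_up(block_size, block_align);
}
```

For example, a 12-byte block with 8-byte alignment is rounded up to 16 bytes, so a variable 4 bytes into the block lives at thread pointer minus 12. All inputs are known to the program linker, which is why no run-time help is needed.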

So, if we use the 'Local Exec' model then we only need support for TLS relocations in the program linker. The dynamic linker doesn't need to do anything.

If this is how Go uses TLS, then Go requires only the following support for TLS:

An offset from the thread-local storage base is written off(reg)(TLS*1). Semantically it is off(reg), but the (TLS*1) annotation marks this as indexing from the loaded TLS base. This emits a relocation so that if the linker needs to adjust the offset, it can.

So we do indeed end up with just a R_x_TPOFF32 relocation, which can be handled by the program linker without needing any support in the dynamic linker.
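A rough C analogue of this usage, for anyone who wants to inspect the relocation with the system toolchain: a single thread-local pointer forced to the 'Local Exec' model. The names `struct g`, `current_g`, `getg` and `setg` are stand-ins chosen for this sketch, not the Go runtime's actual definitions. On x86-64 ELF targets, GCC and Clang typically compile such an access to a single %fs-relative load whose displacement is filled in through an R_X86_64_TPOFF32 relocation.

```c
#include <stddef.h>

/* Stand-in for the runtime's goroutine descriptor. */
struct g {
    int id;
};

/* One thread-local pointer, zero-initialized, accessed via Local Exec. */
static __thread struct g *current_g __attribute__((tls_model("local-exec")));

/* Fetch the calling thread's current pointer. */
struct g *getg(void) {
    return current_g;
}

/* Set the calling thread's current pointer. */
void setg(struct g *gp) {
    current_g = gp;
}
```

Compiling this to an object file and listing its relocations should show whether the toolchain in use produces the thread-pointer-relative form discussed above.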

This is all good because OpenBSD uses the GNU binutils ld as the program linker, which can handle R_x_TPOFF32 relocations, but OpenBSD's dynamic linker does not support any TLS relocations at all.

The only other thing we require is that the static TLS blocks are allocated at program startup.

If I understand tib.c and, specifically, line 174 of library.c in OpenBSD's ld.so correctly, OpenBSD does not provide a dynamic thread vector. But it does allocate static TLS blocks on program start-up.

So OpenBSD does not support TLS 'properly' because:

1. The dynamic linker does not support TLS relocations.

2. There is no support for a dynamic thread vector of TLS blocks.

3. Static TLS blocks are allocated by ld.so only, not when a statically-linked non-PIE program starts.

Edited 2017-11-02: Removed incorrect claim that OpenBSD's pthread_create was lacking support for TLS block allocation. That's wrong: OpenBSD 6.0 added TLS support in both ld.so and pthread_create... which makes sense, as you need to copy the master copy of the TLS blocks whenever a new thread is spawned.

It's the third point that means Go works fine when dynamically linked but not when statically linked.

Before OpenBSD 6.0, ld.so did not support TLS at all. So Go used to provide a wrapper for pthread_create in runtime/cgo/gcc_openbsd_amd64.c that allocated memory for a TLS block. That wrapper was removed in commit 9417c02.

The wrapper doesn't help with statically linking anyway, as you just get a symbol clash with the pthread_create provided by the system.
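Whoever allocates a thread's static TLS block (ld.so, libc startup code, or a pthread_create wrapper) also has to initialize it from the master copy. A minimal sketch of that step, assuming the master copy is described the way a PT_TLS program header describes it (an initialized image followed by zero-filled space); `struct tls_template` and `init_tls_block` are names invented here, not OpenBSD's.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the TLS master copy. */
struct tls_template {
    const void *image;  /* initialized data (.tdata contents) */
    size_t filesz;      /* number of bytes in the initialized image */
    size_t memsz;       /* total block size; the tail is .tbss */
};

/* Initialize a freshly allocated block of t->memsz bytes for a new thread:
   copy the initialized image, then zero the remainder. */
void init_tls_block(void *block, const struct tls_template *t) {
    memcpy(block, t->image, t->filesz);
    memset((char *)block + t->filesz, 0, t->memsz - t->filesz);
}
```

A program that only has zero-initialized thread-local variables has an empty image, so only the zero-fill half matters for it.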

Finally, Android's linker doesn't seem to support TLS relocations either, which is presumably why Clang enables Emulated TLS for Android too. There isn't even any support for the link-time R_x_TPOFF32 relocation used for 'Local Exec'. I guess this is why the offset is hard-coded in runtime/cgo/gcc_android_amd64.c.


I apologize for spamming this issue with long, rambling comments. I thought it might be interesting for others to follow my chain of reasoning.

I believe I have now got to the bottom of this issue. I can replicate the problem without involving Go at all (i.e. with just C and assembly source files).

Edited 2017-11-03: Actually, I hadn't got to the bottom of it. Edited to remove an incorrect conclusion.

Here are the details:

As described in the previous comment, OpenBSD does support 'Local Exec' model TLS when ld.so is invoked. So for dynamically linked executables, everything is fine. ld.so takes care of allocating and initializing the TLS block.

This function contains code for setting up a TLS block, see lines 104 and 152, but it is wrapped in:

#ifndef PIC

When libc is built as a shared library, it is built with -DPIC so this code is skipped. But when it is built as a static library, PIC is not defined so this code is included.

So, when we are dynamically linking, this code is not included and ld.so takes care of setting up TLS; but when we are statically linking, this code is included and it takes care of setting up TLS. It would appear that TLS should work whichever way we link.
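The effect of the guard can be illustrated with a toy function. This is not the libc source; `static_tls_setup_compiled_in` is a hypothetical name, and the only thing it shows is which branch survives preprocessing depending on whether the build passes -DPIC.

```c
/* Hypothetical stand-in for the startup routine discussed above. */
int static_tls_setup_compiled_in(void) {
#ifndef PIC
    /* Static library build: PIC is not defined, so the startup code
       that sets up the TLS block is part of the binary. */
    return 1;
#else
    /* Shared library build (-DPIC): the block is skipped and ld.so is
       expected to set up TLS instead. */
    return 0;
#endif
}
```

Note that PIC here is a macro supplied by the build system, which is distinct from the compiler's own predefined `__PIC__`.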

However, TLS does not work with a statically-linked non-PIE executable:


It might not take any work to support -buildmode=pie on OpenBSD, since presumably it works the same as on any ELF system. It would be worth tweaking BuildModeInit in cmd/go/internal/work/build.go and (*BuildMode).Set in cmd/link/internal/ld/config.go and (*tester).supportBuildMode in cmd/dist/test.go to permit "pie" on OpenBSD, and see what happens.


Meanwhile, I've realized that I was wrong again. It is true that -static -nopie breaks TLS on OpenBSD but not for the reason that I thought. I was misled by OpenBSD's fancy anti-ROP mechanism in libc: the .a file that I inspected was not the static libc but rather an archive of shared objects. If I inspect the correct libc.a I can see that it does initialize the static TLS block and this code is called whenever we link statically, regardless of whether or not we build with -nopie.

Since I can recreate this problem without Go, I'll post to an OpenBSD mailing list and see if someone more knowledgeable than me can help.


Success at last! I received some help on the OpenBSD tech@ mailing list. In short: the current release of OpenBSD does not initialize TLS blocks when the executable is both statically-linked and not position-independent. It is a limitation of OpenBSD 6.2.

However, I was given a patch to fix this and it appears to work. With the patch, .tbss sections work correctly in statically-linked non-PIE binaries. (There's a bit more work to do to get .tdata sections working but I don't think Go needs them.)
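For anyone wanting to check their own system, something along these lines should exercise both cases. It is a sketch, not the test case from the mailing list: the function names are invented, and the idea is to call them from a small main() built with the -static -nopie flags mentioned earlier in this thread.

```c
/* Zero-initialized thread-local: the toolchain places it in .tbss. */
static __thread volatile int in_tbss;

/* Thread-local with a nonzero initializer: placed in .tdata. */
static __thread volatile int in_tdata = 1234;

/* Returns 1 if the .tbss variable starts at zero and holds a written value.
   The variable is reset afterwards so the check can be repeated. */
int tbss_works(void) {
    if (in_tbss != 0)
        return 0;
    in_tbss = 99;
    int ok = (in_tbss == 99);
    in_tbss = 0;
    return ok;
}

/* Returns 1 if the .tdata variable has its initial value in this thread. */
int tdata_works(void) {
    return in_tdata == 1234;
}
```

On a system where the static TLS block is never set up, the first access may crash rather than return 0, so a segfault is also a meaningful result.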