Store FLS in the Coroutine object instead of in a vector kept with the thread.
This technically means we no longer call destructors on outstanding
values when their keys are deleted, but that was already the case with
multiple threads. The upside is that no one really deletes TLS keys
unless the program is terminating anyway.
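
For illustration only, a minimal sketch of the layout described above,
assuming a process-wide key registry and a per-coroutine slot vector.
All names here (FlsKey, FlsDtor, fls_alloc, fls_free, and the set/get
members) are hypothetical and not taken from the actual code, and the
sketch ignores locking around the registry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; not the project's actual API.
using FlsKey = std::size_t;
using FlsDtor = void (*)(void*);

// Process-wide key registry: one destructor slot per allocated key.
inline std::vector<FlsDtor>& fls_dtors() {
    static std::vector<FlsDtor> dtors;
    return dtors;
}

inline FlsKey fls_alloc(FlsDtor dtor) {
    fls_dtors().push_back(dtor);
    return fls_dtors().size() - 1;
}

// Deleting a key only forgets its destructor. Values still held by
// live coroutines are not visited, so no destructor runs for them.
inline void fls_free(FlsKey key) {
    assert(key < fls_dtors().size());
    fls_dtors()[key] = nullptr;
}

class Coroutine {
public:
    // Values are destroyed with the coroutine that owns them, but only
    // for keys that still have a registered destructor.
    ~Coroutine() {
        for (FlsKey k = 0; k < slots_.size(); ++k) {
            FlsDtor dtor = k < fls_dtors().size() ? fls_dtors()[k] : nullptr;
            if (slots_[k] != nullptr && dtor != nullptr) dtor(slots_[k]);
        }
    }

    void set(FlsKey key, void* value) {
        if (key >= slots_.size()) slots_.resize(key + 1, nullptr);
        slots_[key] = value;
    }

    void* get(FlsKey key) const {
        return key < slots_.size() ? slots_[key] : nullptr;
    }

private:
    std::vector<void*> slots_;  // FLS values, indexed by key
};
```

In this sketch fls_free() has no list of coroutines to walk, which is
the trade-off noted above: values stored under a deleted key are simply
dropped without their destructor being run.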