TLK runs only on CPU0 so we have to switch CPUs before we
issue any request to the secure world. There are instances
when the requests are sent from workqueues, which need some
extra code before we can run on CPU0. Previously, we used
this code only for resizing VPR regions, but requests for
TAs can also benefit from this approach.

Encapsulate the logic in a common function, send_smc(), and
remove the tlk_generic_smc() and tlk_extended_smc()
functions. For non-PF_NO_SETAFFINITY scenarios, check that
we can switch the CPU mask to run on CPU0. If the CPU switch
fails for some reason, schedule work on CPU0 instead. This
takes care of the previous corner cases where the CPU switch
failed and we continued on the same CPU.
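
The decision flow described above can be sketched as a small
userspace model. This is only an illustration, not the driver
code: struct smc_env, try_pin_to_cpu0() and send_smc_model() are
hypothetical names, and the model assumes that callers with
PF_NO_SETAFFINITY set (such as workqueue workers) always take
the work-on-CPU0 path.

```c
#include <stdbool.h>

/* Hypothetical model of the two ways a request can reach CPU0. */
enum smc_path {
	SMC_PATH_DIRECT,        /* caller switched itself to CPU0 */
	SMC_PATH_WORK_ON_CPU0,  /* request handed to a work item on CPU0 */
};

/* Hypothetical stand-in for the caller's state. */
struct smc_env {
	bool no_setaffinity;  /* models PF_NO_SETAFFINITY on the task */
	bool pin_succeeds;    /* models whether the CPU mask switch works */
	int current_cpu;      /* CPU the caller is running on */
};

/* Try to move the caller to CPU0; return false if that is not possible. */
static bool try_pin_to_cpu0(struct smc_env *env)
{
	if (env->no_setaffinity || !env->pin_succeeds)
		return false;
	env->current_cpu = 0;
	return true;
}

/*
 * Model of the common entry point: issue the request directly when the
 * caller could be switched to CPU0, otherwise fall back to work that runs
 * on CPU0. Either way the request is never issued on another CPU.
 */
static enum smc_path send_smc_model(struct smc_env *env, int *issued_on_cpu)
{
	if (try_pin_to_cpu0(env)) {
		*issued_on_cpu = env->current_cpu;
		return SMC_PATH_DIRECT;
	}

	/* Fallback: the caller stays where it is, the work runs on CPU0. */
	*issued_on_cpu = 0;
	return SMC_PATH_WORK_ON_CPU0;
}
```

In this model a failed CPU switch no longer leads to a request
being issued from the caller's current CPU, which is the corner
case this change addresses.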