It depends on the programming language you are using. If you are using
a language like CUDA, the parallelism is already expressed explicitly, so
compilation needs no rocket science, just basic good code generation.
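
As a rough illustration of what "the parallelism is expressed well" means, here is a minimal SAXPY sketch in CUDA. The kernel name `saxpy` and the host helper `run_saxpy` are made up for this example, and error checking on the CUDA runtime calls is left out for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread handles one element. The programmer states the parallelism,
// so the compiler does not have to discover it from a serial loop.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical host helper: copies inputs to the device, launches the
// kernel, and returns the updated y. CUDA error codes are ignored here.
std::vector<float> run_saxpy(float a, const std::vector<float>& x,
                             std::vector<float> y) {
    assert(x.size() == y.size());
    int n = static_cast<int>(x.size());
    if (n == 0) return y;  // a launch with zero blocks is invalid

    size_t bytes = n * sizeof(float);
    float* dx = nullptr;
    float* dy = nullptr;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n
    saxpy<<<blocks, threads>>>(n, a, dx, dy);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(y.data(), dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    return y;
}
```

There is no loop over elements in the kernel for the compiler to analyze; what remains is ordinary per-thread code generation.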

Reducing operation counts is important on GPUs as well, so most of the
same high-level optimizations are required.

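The biggest difference is that GPUs have little or no cache, so you need
to be especially careful not to create too many extra live ranges: values
that do not fit in registers spill to memory, and there is little cache to
absorb that cost. In that sense, the trade-off point between recomputing a
value and keeping it live is different from that on regular processors.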
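
To make the computation vs lifetime trade-off concrete, here is a small CUDA sketch with two kernels that compute the same result. Everything in it (`weight`, `keep_live`, `recompute`, `run_variant`, the constant `K`) is hypothetical and chosen only for illustration. A real compiler may well turn one form into the other, and which one is faster depends on the hardware and on how many registers the rest of the kernel needs.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int K = 8;  // number of derived values per thread (illustrative)

// Hypothetical cheap derived value; stands in for anything quick to recompute.
__device__ float weight(float x, int k) {
    return 1.0f / (1.0f + x * static_cast<float>(k));
}

// Variant A: compute every weight once and keep all K live across the loop.
// Fewer operations, but K extra values occupy registers (or spill).
__global__ void keep_live(int n, int m, const float* in, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = in[i];
    float w[K];
    for (int k = 0; k < K; ++k) w[k] = weight(x, k);
    float acc = 0.0f;
    for (int j = 0; j < m; ++j) {
        float v = in[(i + j) % n];
        for (int k = 0; k < K; ++k) acc += w[k] * v;
    }
    out[i] = acc;
}

// Variant B: recompute each weight at its use.
// More operations, but only x has to stay live across the loop.
__global__ void recompute(int n, int m, const float* in, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = in[i];
    float acc = 0.0f;
    for (int j = 0; j < m; ++j) {
        float v = in[(i + j) % n];
        for (int k = 0; k < K; ++k) acc += weight(x, k) * v;
    }
    out[i] = acc;
}

using Kernel = void (*)(int, int, const float*, float*);

// Hypothetical host helper that runs either variant; error checks omitted.
std::vector<float> run_variant(Kernel kernel, int m,
                               const std::vector<float>& in) {
    int n = static_cast<int>(in.size());
    std::vector<float> out(n, 0.0f);
    if (n == 0) return out;  // a launch with zero blocks is invalid

    size_t bytes = n * sizeof(float);
    float* din = nullptr;
    float* dout = nullptr;
    cudaMalloc(&din, bytes);
    cudaMalloc(&dout, bytes);
    cudaMemcpy(din, in.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    kernel<<<blocks, threads>>>(n, m, din, dout);

    cudaMemcpy(out.data(), dout, bytes, cudaMemcpyDeviceToHost);
    cudaFree(din);
    cudaFree(dout);
    return out;
}
```

On a processor with a large cache, keeping the weights around is usually the safe choice because a spill is cheap. With little or no cache, the balance can tip toward recomputing more often than you would expect.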