riscv: add support for blocking SMP in each stage

Each stage performs some basic initialization (stack, HLS, etc.) and then calls smp_pause to enter the single-threaded state. The main work of each stage is executed in the single-threaded state, and the multi-threaded state is restored by calling smp_resume when booting the next stage.
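
The flow described above can be modelled on a host machine. The following is only a sketch under stated assumptions, not the code added by this change: harts are simulated with POSIX threads, and every sketch_* name, the hart count, and the choice of hart 0 as the working hart are hypothetical. The real smp_pause and smp_resume may take different arguments and use a different wake-up mechanism.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SKETCH_HART_COUNT   4 /* hypothetical number of harts */
#define SKETCH_WORKING_HART 0 /* hypothetical hart that runs the main work */

static atomic_int sketch_parked;             /* harts waiting inside pause */
static atomic_int sketch_released;           /* set to 1 by resume */
static atomic_int sketch_main_work_runs;     /* how often the main work ran */
static atomic_int sketch_parked_during_work; /* parked harts seen by main work */
static atomic_int sketch_next_stage_entries; /* harts that reached the next stage */

/* Model of smp_pause: returns 1 on the working hart once all other harts
 * are parked; other harts block here until resume and then return 0. */
static int sketch_smp_pause(int hartid)
{
	if (hartid == SKETCH_WORKING_HART) {
		while (atomic_load(&sketch_parked) < SKETCH_HART_COUNT - 1)
			sched_yield();
		return 1;
	}
	atomic_fetch_add(&sketch_parked, 1);
	while (!atomic_load(&sketch_released))
		sched_yield();
	atomic_fetch_sub(&sketch_parked, 1);
	return 0;
}

/* Model of smp_resume: release every parked hart. */
static void sketch_smp_resume(void)
{
	atomic_store(&sketch_released, 1);
}

/* Stand-in for the entry point of the next stage. */
static void sketch_next_stage(void)
{
	atomic_fetch_add(&sketch_next_stage_entries, 1);
}

/* Per-hart stage entry: basic init, pause, main work, resume. */
static void *sketch_stage_entry(void *arg)
{
	int hartid = *(int *)arg;

	/* Per-hart basic initialization (stack, HLS, etc.) would happen here. */
	if (sketch_smp_pause(hartid)) {
		/* Single-threaded state: only the working hart reaches this. */
		atomic_fetch_add(&sketch_main_work_runs, 1);
		atomic_store(&sketch_parked_during_work,
			     atomic_load(&sketch_parked));
		sketch_smp_resume();
	}
	/* Multi-threaded state restored: every hart enters the next stage. */
	sketch_next_stage();
	return NULL;
}

/* Runs one simulated stage; returns 1 if the main work ran exactly once
 * while all other harts were parked and every hart reached the next stage. */
int sketch_run_stage(void)
{
	pthread_t threads[SKETCH_HART_COUNT];
	int ids[SKETCH_HART_COUNT];

	atomic_store(&sketch_parked, 0);
	atomic_store(&sketch_released, 0);
	atomic_store(&sketch_main_work_runs, 0);
	atomic_store(&sketch_parked_during_work, 0);
	atomic_store(&sketch_next_stage_entries, 0);

	for (int i = 0; i < SKETCH_HART_COUNT; i++) {
		ids[i] = i;
		if (pthread_create(&threads[i], NULL, sketch_stage_entry, &ids[i]))
			abort();
	}
	for (int i = 0; i < SKETCH_HART_COUNT; i++)
		pthread_join(threads[i], NULL);

	return atomic_load(&sketch_main_work_runs) == 1 &&
	       atomic_load(&sketch_parked_during_work) == SKETCH_HART_COUNT - 1 &&
	       atomic_load(&sketch_next_stage_entries) == SKETCH_HART_COUNT;
}
```

Calling sketch_run_stage() from a host test program exercises one simulated stage. In this model the working hart waits until all other harts are parked before it starts the main work, which is one possible way to guarantee the single-threaded state.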