The vast majority of the time spent simulating Shor's algorithm goes
to the discrete Fourier transform step. In the discrete Fourier
transform we iterate from 0 to q, and for each value in that range we
iterate over the entire register and perform some mathematical
operations. It is straightforward to divide this work among multiple
processing elements. Each processing element simply iterates from 0
to q, and for each value in the range iterates over some prescribed
subrange of the register.
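
The division of work described above can be sketched as follows. This
is a minimal illustration, not the simulator's actual code: the names
partial_dft and parallel_dft are hypothetical, the register is assumed
to be a list of q complex amplitudes, and the transform is assumed to
use the usual convention exp(2*pi*i*a*c/q) / sqrt(q). Because the
transform is linear, each worker's partial result can be summed with
the others to give the full transform.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def partial_dft(register, q, start, stop):
    """Contribution of register[start:stop] to every output amplitude."""
    out = [0j] * q
    for c in range(q):                  # every worker covers the full range 0..q-1
        acc = 0j
        for a in range(start, stop):    # but only its own subrange of the register
            acc += register[a] * cmath.exp(2j * cmath.pi * a * c / q)
        out[c] = acc / q ** 0.5
    return out


def parallel_dft(register, workers=4):
    """Split the register into contiguous subranges, one per worker."""
    q = len(register)
    bounds = [(w * q // workers, (w + 1) * q // workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: partial_dft(register, q, *b), bounds))
    # The transform is linear, so the partial results simply add up.
    return [sum(p[c] for p in parts) for c in range(q)]
```

Threads are used here only to keep the sketch short; in CPython they
will not speed up this arithmetic, and a real implementation would
more likely place each subrange on a separate process or node.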

In general, the simulation of Shor's algorithm seems a good candidate
for parallelization. The simulation can roughly be divided into three
phases: preprocessing, simulation of the quantum register, and
postprocessing. During the simulation of the quantum register, all the
work is done in the form of applying the same operation to an entire
array, where each array location represents one of the base states of
the quantum register. This agrees with our conception of how a
quantum register would function: in a quantum computer, we are not
free to perform an operation on only certain portions of the
superposed state of the register; we must perform the operation on
all portions.
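
As a small illustration of this array view, the sketch below stores
one complex amplitude per base state and applies a single operation to
every entry. The helper names uniform_superposition and apply_to_all
are hypothetical and chosen only for this example; the operation
applied in the test case (a sign flip on odd base states) is likewise
just a stand-in.

```python
import math


def uniform_superposition(n_states):
    """Register with equal amplitude on each of n_states base states."""
    amp = 1 / math.sqrt(n_states)
    return [complex(amp, 0.0)] * n_states


def apply_to_all(register, op):
    """Apply op(index, amplitude) to every base state; none is skipped."""
    return [op(i, amp) for i, amp in enumerate(register)]
```

Since the same op is applied independently at each index, the array
can be cut into subranges and handed to separate processing elements
without changing the result.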