RSA Algorithm

Hi All,
I am trying to implement 1024-bit RSA on an application-specific processor. Since the processor has 32-bit register arrays, only a 32-bit hex input is handled per cycle. So far, this processor can only do 32-bit encryption, which is of no practical use. I need some techniques that let me do 1024-bit encryption on this processor.
The idea is that the processor has 32 x 32-bit registers, and the RSA block supports only two 32-bit inputs. Some technique has to be applied to perform the Montgomery reduction and the exponentiation of the data using those two input registers. I have read many articles and papers on this (for example on carry-save adders) but did not understand them. The final result should come out of a 32-bit output register.
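To make the register layout concrete: a 1024-bit operand fills exactly the 32 registers if each register holds one base-2^32 "digit" (word). The product of two such numbers then breaks down into 32 x 32-bit word products, each of which fits in a 64-bit (two-word) result, which is the kind of operation a two-input 32-bit block can be built around. This is a sketch of the standard multi-word representation, not a description of any specific RSA block:

$$x = \sum_{i=0}^{31} x_i \, 2^{32i}, \qquad 0 \le x_i < 2^{32}$$

$$x \cdot y = \sum_{i=0}^{31} \sum_{j=0}^{31} x_i \, y_j \, 2^{32(i+j)}, \qquad x_i \, y_j \le (2^{32}-1)^2 < 2^{64}$$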

You need a big number library ("arbitrary precision" is sometimes used too). They're common enough that you may find one already available for your processor (or generic C source that you can adapt). If not, they aren't that difficult to write yourself. You should be able to find plenty of resources on the internet, but the basic idea is just to perform the computations digit by digit, as you would in grade school, only using base-2^N digits (e.g. 32-bit words on your processor).
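As a rough illustration of the "grade school, digit by digit" idea, here is a minimal C sketch assuming 32-bit digits stored least significant first and a compiler that provides a 64-bit `uint64_t` for the intermediate products. The function names `bn_add` and `bn_mul` are hypothetical, not from any particular library. The inner step of `bn_mul` (one 32x32 -> 64-bit multiply plus two additions) is the operation a two-input 32-bit hardware block would be repeating.

```c
#include <stdint.h>

#define LIMBS 32  /* 32 digits x 32 bits = 1024 bits (hypothetical constant) */

/* r = a + b over n 32-bit digits, least significant digit first.
   Returns the final carry (0 or 1). r may alias a or b. */
uint32_t bn_add(uint32_t *r, const uint32_t *a, const uint32_t *b, int n) {
    uint64_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint64_t s = (uint64_t)a[i] + b[i] + carry;  /* at most 33 bits */
        r[i] = (uint32_t)s;                           /* low 32 bits stay */
        carry = s >> 32;                              /* carry into next digit */
    }
    return (uint32_t)carry;
}

/* r[0..2n-1] = a[0..n-1] * b[0..n-1], schoolbook method.
   r must not alias a or b. */
void bn_mul(uint32_t *r, const uint32_t *a, const uint32_t *b, int n) {
    for (int i = 0; i < 2 * n; i++) r[i] = 0;
    for (int i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (int j = 0; j < n; j++) {
            /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so this never overflows */
            uint64_t t = (uint64_t)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        r[i + n] = (uint32_t)carry;  /* this digit has not been written yet */
    }
}
```

For a 1024-bit operand you would call these with `n = LIMBS`; note that the full product of two 1024-bit numbers needs 64 words, which is why modular reduction (ordinary or Montgomery) has to follow every multiplication in RSA.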

Montgomery reduction/multiplication/exponentiation is just an optimization you can apply when performing many multiply-modulo-n operations. You'll still need the ordinary multi-word (e.g. 1024-bit) add/subtract/multiply/divide to use it, so get those working first, then learn about Montgomery once you have basic big number support.
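Once the basic word-level routines work, Montgomery multiplication can be built from the same 32x32 -> 64-bit multiply-accumulate step. Below is a minimal C sketch of the word-serial "CIOS" (coarsely integrated operand scanning) variant plus left-to-right square-and-multiply exponentiation. It assumes an odd modulus `m > 1`, a base smaller than `m`, at most 32 words (1024 bits), and digits stored least significant first. All names (`mont_mul`, `mont_modexp`, `mp_geq`, etc.) are hypothetical. To stay self-contained it computes R^2 mod m by repeated doubling instead of division; that is slow but only happens once per modulus. It is not constant-time, so treat it as a learning aid, not a hardened RSA implementation.

```c
#include <stdint.h>
#include <string.h>

#define MAXL 32  /* up to 32 x 32-bit words = 1024 bits (hypothetical limit) */

/* 1 if a >= b over n words, else 0 */
static int mp_geq(const uint32_t *a, const uint32_t *b, int n) {
    for (int i = n - 1; i >= 0; i--)
        if (a[i] != b[i]) return a[i] > b[i];
    return 1;
}

/* a -= b over n words (result taken mod 2^(32n)) */
static void mp_sub(uint32_t *a, const uint32_t *b, int n) {
    uint32_t borrow = 0;
    for (int i = 0; i < n; i++) {
        uint64_t d = (uint64_t)a[i] - b[i] - borrow;
        a[i] = (uint32_t)d;
        borrow = (uint32_t)(d >> 63);  /* top bit set means it went negative */
    }
}

/* -m0^{-1} mod 2^32 for odd m0, by Newton iteration (3 -> 6 -> 12 -> 24 -> 48 bits) */
uint32_t mont_n0inv(uint32_t m0) {
    uint32_t x = m0;  /* odd m0 satisfies m0*m0 = 1 mod 8 */
    for (int i = 0; i < 4; i++) x *= 2 - m0 * x;
    return (uint32_t)0 - x;
}

/* r = a * b * R^{-1} mod m, R = 2^(32n); requires a, b < m. r may alias a or b. */
void mont_mul(uint32_t *r, const uint32_t *a, const uint32_t *b,
              const uint32_t *m, uint32_t n0inv, int n) {
    uint32_t t[MAXL + 2] = {0};
    for (int i = 0; i < n; i++) {
        uint64_t s, c = 0;
        for (int j = 0; j < n; j++) {          /* t += a * b[i] */
            s = (uint64_t)a[j] * b[i] + t[j] + c;
            t[j] = (uint32_t)s;
            c = s >> 32;
        }
        s = (uint64_t)t[n] + c;
        t[n] = (uint32_t)s;
        t[n + 1] = (uint32_t)(s >> 32);
        uint32_t u = t[0] * n0inv;             /* makes low word of t + u*m zero */
        s = (uint64_t)u * m[0] + t[0];
        c = s >> 32;
        for (int j = 1; j < n; j++) {          /* t = (t + u*m) / 2^32 */
            s = (uint64_t)u * m[j] + t[j] + c;
            t[j - 1] = (uint32_t)s;
            c = s >> 32;
        }
        s = (uint64_t)t[n] + c;
        t[n - 1] = (uint32_t)s;
        t[n] = t[n + 1] + (uint32_t)(s >> 32);
    }
    if (t[n] || mp_geq(t, m, n)) mp_sub(t, m, n);  /* t < 2m, one subtraction suffices */
    memcpy(r, t, (size_t)n * sizeof(uint32_t));
}

/* rr = R^2 mod m by doubling 1 modulo m, 64n times (division-free, slow) */
void mont_rr(uint32_t *rr, const uint32_t *m, int n) {
    memset(rr, 0, (size_t)n * sizeof(uint32_t));
    rr[0] = 1;
    for (int k = 0; k < 64 * n; k++) {
        uint32_t top = rr[n - 1] >> 31;        /* bit shifted out of the top word */
        for (int i = n - 1; i > 0; i--) rr[i] = (rr[i] << 1) | (rr[i - 1] >> 31);
        rr[0] <<= 1;
        if (top || mp_geq(rr, m, n)) mp_sub(rr, m, n);
    }
}

/* r = base^exp mod m; exp has ne words, m has n words, base < m */
void mont_modexp(uint32_t *r, const uint32_t *base, const uint32_t *exp, int ne,
                 const uint32_t *m, int n) {
    uint32_t n0inv = mont_n0inv(m[0]);
    uint32_t rr[MAXL], one[MAXL] = {0}, x[MAXL], acc[MAXL];
    one[0] = 1;
    mont_rr(rr, m, n);
    mont_mul(x, base, rr, m, n0inv, n);        /* x = base * R mod m */
    mont_mul(acc, one, rr, m, n0inv, n);       /* acc = R mod m, i.e. 1 in Montgomery form */
    for (int i = 32 * ne - 1; i >= 0; i--) {   /* left-to-right square-and-multiply */
        mont_mul(acc, acc, acc, m, n0inv, n);
        if ((exp[i / 32] >> (i % 32)) & 1) mont_mul(acc, acc, x, m, n0inv, n);
    }
    mont_mul(r, acc, one, m, n0inv, n);        /* convert back out of Montgomery form */
}
```

With the toy key m = 3233, e = 17, a message of 65 encrypts to 2790 (`mont_modexp(r, base, exp, 1, m, 1)` with one-word arrays); a real 1024-bit key would use `n = 32`. In hardware, the inner `j` loops are where a 32-bit multiply-accumulate unit would be reused cycle after cycle.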