I'm hoping you guys can give me some feedback on a project I'm working on.

Here is the worst case: I'm trying to find or build a 128-channel, high-speed (~1 MHz) digital (5 V 1's and 0's) pattern generator, and I'm exploring possible solutions. The basic requirement is that I need to output 128 square waves with 50/50 duty cycle whose phases can be adjusted by small amounts. In the worst case, the phase should be adjustable in steps of 1/128 of the period (1/1 MHz/128 ≈ 7.8 ns). I believe this means the clock rate of the system needs to extend to 128 MHz.
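To double-check the timing numbers, here is a minimal Python sketch of the arithmetic. The function name `phase_resolution` and its parameters are made up for illustration; the only assumption is that one phase step equals one tick of the system clock.

```python
def phase_resolution(signal_hz, steps):
    """Return (phase step in seconds, clock rate in Hz needed to resolve it)."""
    period_s = 1.0 / signal_hz      # one full period of the square wave
    step_s = period_s / steps       # smallest phase adjustment
    clock_hz = signal_hz * steps    # one clock tick per phase step
    return step_s, clock_hz


step_s, clock_hz = phase_resolution(1e6, 128)
print(f"{step_s * 1e9:.2f} ns per step, {clock_hz / 1e6:.0f} MHz clock")
```

For a 1 MHz signal and 128 steps this works out to roughly 7.8 ns per step and a 128 MHz clock.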

So I see two basic challenges: 1) how do I get 128-channel (i.e. 128-bit) parallel output, and 2) how do I get the 1 MHz signal rate with adjustable phase?

This seems beyond the capabilities of the standard Arduino, but I'm trying to understand whether a serial-to-parallel approach with a microcontroller would work. For example, if I took the 3 hardware timers on the Arduino and connected shift registers to them, then each one could control ~43 channels at a maximum (ideal, but not feasible) frequency of 16 MHz/43 = 372 kHz. If I wanted to make the phase adjustable by 1/128 of the period, then I think this would lower the maximum signal frequency to 372 kHz/128 = 2.9 kHz. So is this the right way to think about this, or am I missing something?

Also, what about other hardware solutions? The Maple Leaf clocks at 72 MHz and has 4 hardware timers, so that would mean 128 channels/4 = 32 channels per timer and 72 MHz/32 = 2.25 MHz, reduced to 2.25 MHz/128 = 17.6 kHz with adjustable phase. Or what about the BeagleBone, which tops out at 1 GHz?
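The estimates for the two boards follow the same formula, so they can be checked with a short Python sketch. The helper `shift_register_limit` is hypothetical, and it bakes in the same idealized assumption as the estimates: every timer shifts out one bit per CPU clock with no software overhead, which real hardware will not reach.

```python
def shift_register_limit(cpu_hz, channels, timers, phase_steps):
    """Idealized upper bound for the serial-to-parallel approach."""
    channels_per_timer = -(-channels // timers)   # ceiling division
    frame_hz = cpu_hz / channels_per_timer        # full updates of all outputs per second
    signal_hz = frame_hz / phase_steps            # one update needed per phase step
    return channels_per_timer, frame_hz, signal_hz


# (clock, timers) taken from the figures quoted in the post
boards = {"Arduino": (16e6, 3), "Maple": (72e6, 4)}
results = {}
for name, (cpu_hz, timers) in boards.items():
    results[name] = shift_register_limit(cpu_hz, 128, timers, 128)
    per_timer, frame_hz, signal_hz = results[name]
    print(f"{name}: {per_timer} channels/timer, "
          f"{frame_hz / 1e3:.0f} kHz update rate, "
          f"{signal_hz / 1e3:.1f} kHz signal")
```

Both results land well below the 1 MHz target even before any real-world overhead is counted.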

I had a similar question recently about generating signals from a high-speed MCU. I assumed that since the Arduino can generate 4 MHz square waves at a 16 MHz clock, any MCU could generate signals at about 4 clock cycles per output period. It turns out not to be true at all: 14 MHz tops on a 700 MHz Raspberry Pi.

128 channels? I am thinking FPGA too in this case, though I don't know much about them other than that the better ones are fast and have a lot of pins. Maybe you can have the MCU program the pattern into RAM organized as 128 bits wide, and have the FPGA fetch it 128 bits at a time and bang it out over the 128 output lines. You could probably run that a lot faster than 1 MHz and very accurately. There is a lot of 10 ns SRAM available, and lots of FPGAs can run at 100 MHz or better. Have fun soldering a high-end BGA FPGA.
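As a rough illustration of the 128-bit-wide pattern idea, here is a Python sketch of what the MCU side might compute before loading the RAM. Everything in it is an assumption made for the example: the name `build_pattern`, one word per phase step, and bit `ch` of each word driving output line `ch`.

```python
def build_pattern(phase_steps_per_channel, steps=128):
    """Build one period of output words; bit ch of each word is channel ch's level."""
    words = []
    for t in range(steps):
        word = 0
        for ch, phase in enumerate(phase_steps_per_channel):
            # 50/50 square wave: high for the first half of its (shifted) period
            if (t - phase) % steps < steps // 2:
                word |= 1 << ch
        words.append(word)
    return words


# Example: delay channel n by n phase steps
phases = list(range(128))
table = build_pattern(phases)
print(len(table), "words,", len(table) * 128 // 8, "bytes of pattern RAM")
```

One period with 128 phase steps is only 128 words of 128 bits, i.e. 2 KiB, and the hardware would just loop over that table. At the 128 MHz step rate from the original post, each word would have to be fetched in about 7.8 ns.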