In this paper, we propose a high-throughput implementation method for a single-chip 4096-point complex FFT. To increase transform speed, a parallel FFT architecture is used: the chip contains eight basic processing modules that operate simultaneously and independently of one another. The proposed architecture can compute a 4096-point complex forward or inverse FFT in real time at sampling frequencies of up to 320 MHz, and can be applied widely in high-speed signal processing.
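
To illustrate how eight independent modules could share one 4096-point transform, the following is a minimal software sketch, not the chip's actual datapath. It assumes a decimation-in-time split in which each of eight lanes computes a 512-point FFT on every eighth input sample, followed by a twiddle-factor combination stage; the abstract does not state how the work is partitioned, so this split and the names `fft_radix2`, `parallel_fft`, `parallel_ifft`, and `frame_time_seconds` are hypothetical. The inverse transform reuses the forward path through conjugation, and the frame time assumes one complex sample arrives per sampling period.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def parallel_fft(x, lanes=8):
    """Forward FFT split across `lanes` independent sub-transforms (assumed scheme).

    Lane r processes samples x[r], x[r + lanes], ... with no dependence on the
    other lanes; the lane outputs are then merged with twiddle factors.
    """
    n = len(x)
    if n % lanes != 0:
        raise ValueError("length must be a multiple of the lane count")
    m = n // lanes  # points per lane, e.g. 4096 / 8 = 512
    # Each entry could be computed by a separate processing module.
    sub = [fft_radix2(x[r::lanes]) for r in range(lanes)]
    out = [0j] * n
    for k in range(n):
        acc = 0j
        for r in range(lanes):
            # Sub-transform outputs are periodic with period m.
            acc += cmath.exp(-2j * cmath.pi * r * k / n) * sub[r][k % m]
        out[k] = acc
    return out


def parallel_ifft(X, lanes=8):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / N."""
    n = len(X)
    y = parallel_fft([v.conjugate() for v in X], lanes)
    return [v.conjugate() / n for v in y]


def frame_time_seconds(points=4096, sample_rate_hz=320e6):
    """Time to receive one frame, assuming one complex sample per sampling period."""
    return points / sample_rate_hz
```

Under these assumptions, a 4096-point frame at 320 MHz spans 4096 / 320e6 s, or 12.8 µs, which is the budget within which a real-time design would need to complete each transform. Sharing one forward path between the forward and inverse transforms is a common way to support both directions without duplicating the butterfly logic; whether the chip does so is not stated here.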