This paper presents a novel architecture for matrix
inversion by generalizing the QR decomposition-based recursive
least squares (RLS) algorithm. The use of Squared Givens
rotations and a folded systolic array makes this architecture very
suitable for FPGA implementation. The input is a 4 × 4 matrix of
complex, floating-point values. The matrix inversion design can
achieve a throughput of 0.13M updates per second on a
state-of-the-art Xilinx Virtex-4 FPGA running at 115 MHz. Due
to the modular partitioning and interfacing between multiple
Boundary and Internal processing units, this architecture is easily
extensible to other matrix sizes.
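
As a software reference for the computation described above, the following is a minimal sketch of matrix inversion through QR decomposition with Givens rotations, applied to a complex matrix. It is illustrative only and makes several assumptions: it uses standard Givens rotations (with square roots) rather than the Squared Givens rotations of the proposed architecture, it runs sequentially rather than on a systolic array, and the function name `givens_qr_inverse` and the sample matrix are hypothetical.

```python
import math


def givens_qr_inverse(a):
    """Invert a square complex matrix via Givens-rotation QR (illustrative sketch).

    Standard Givens rotations are used here, NOT the square-root-free
    Squared Givens rotations of the hardware design. A singular input
    leads to a ZeroDivisionError during back substitution.
    """
    n = len(a)
    # r starts as a copy of the input and is rotated into upper-triangular form.
    r = [[complex(x) for x in row] for row in a]
    # qh accumulates the rotations, so that qh * a == r (qh is Q^H).
    qh = [[complex(1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

    for j in range(n):                 # column being cleared
        for i in range(j + 1, n):      # row whose entry (i, j) is zeroed
            p, q = r[j][j], r[i][j]
            norm = math.hypot(abs(p), abs(q))
            if norm == 0.0:
                continue               # nothing to rotate
            c, s = p / norm, q / norm
            # Apply the same unitary 2x2 rotation to rows j and i of both matrices.
            for m in (r, qh):
                row_j, row_i = m[j], m[i]
                m[j] = [c.conjugate() * x + s.conjugate() * y
                        for x, y in zip(row_j, row_i)]
                m[i] = [-s * x + c * y for x, y in zip(row_j, row_i)]

    # Solve r * inv = qh column by column with back substitution,
    # which yields inv = R^{-1} Q^H = A^{-1}.
    inv = [[0j] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            acc = qh[i][col] - sum(r[i][k] * inv[k][col] for k in range(i + 1, n))
            inv[i][col] = acc / r[i][i]
    return inv


if __name__ == "__main__":
    # Hypothetical 4 x 4 complex test matrix (strictly diagonally dominant).
    a = [[4 + 1j, 1, 0, 2j],
         [1, 3 - 1j, 1j, 0],
         [0, 1, 5, 1 + 1j],
         [2, 0, 1, 6 + 2j]]
    for row in givens_qr_inverse(a):
        print(row)
```

Each inner-loop rotation zeroes one sub-diagonal entry using a pivot on the diagonal. In a triangular systolic array of the kind named in the abstract, boundary cells typically generate the rotation parameters and internal cells apply them along the row.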