Fast Hadamard transform (FHT) belongs to the family of discrete orthogonal transforms and is used widely in image and signal processing applications. In this paper, a parameterizable and scalable architecture for FHT with time and area complexities of O(2(W+1)) and O(2N2), respectively, has been proposed, where W and N are the word and vector lengths. A novel algorithmic transformation for the FHT based on sparse matrix factorization and distributed arithmetic (DA) principles has been presented. The architecture has been parallelized and pipelined in order to achieve high throughput rates. Efficient and optimized field-programmable gate array implementation of the proposed architecture that yield excellent performance metrics has been analyzed in detail. Additionally, a functional level power analysis and modeling methodology has been proposed to characterize the various power and energy metrics of the cores in terms of system parameters and design variables. The mathematical models that have been derived provide quick presilicon estimate of power and energy measures, allowing intelligent tradeoffs when incorporating the developed cores as subblocks in hardware-based image and video processing systems