Until now, the performance improvement of computing machines was a mostly a result of shrinking transistor geometries and increasing clock speeds. With the advent of signal processing applications that have stringent performance requirements from processing hardware, the field of configurable computing has received a lot of attention. Efforts are being made to improve computation bandwidth by architectural innovations. Among these, the wormhole runtime reconfigurable architecture introduces the concept of stream processing. It enables dynamic reconfiguration of hardware with little overheads and is very much suited for data-path based computations with deep computational pipelines. Stallion, second in the generation of Wormhole runtime reconfigurable processors, demonstrates the efficacy of wormhole runtime reconfiguration. The work presented here deals with the VLSI implementation of Stallion and discusses the full-custom physical design flow adopted for Stallion. Also, the tools and techniques to customize this flow are detailed. The Stallion design methodology offers a possible solution that can be pursued for executing similar efforts in future.