OK, re GPU blocks/processors, the following can all be assumed to be in there:

Command Processor and Thread Scheduler (not necessarily the same block)

Trisetup and rasterizer (R800 dropped that and delegated the workload to SPs)

Global Data Share (traditionally not very large; here it's likely housed in one of the numerous embedded memory pools, at a much larger size)

A bunch of caches (vertex, texture) which could be really tiny or not so much (again, memory pools ahoy)

DMA engines

Ring buses

Tessellator (likely still sitting in fixed-function silicon)

BTW, a quick google for ARM9 die area got me to this article discussing a Qualcomm broadband/app processor (yes, stand-alone ARM9 dies are a bit hard to track down these days). The ARM9 in it appears to come out at ~0.8mm² @ 40nm (the original part is 90nm, so I've applied a squares rule, i.e. scaled the area by the square of the node ratio).
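For anyone who wants to check the squares-rule arithmetic, here is a minimal sketch in Python. It assumes an ideal shrink, where area scales with the square of the feature-size ratio; real processes rarely shrink that cleanly. The function name `scale_area` is made up for illustration, and the ~4.05mm² figure at 90nm is simply back-computed from the ~0.8mm² estimate, not a number taken from the article.

```python
def scale_area(area_mm2, from_nm, to_nm):
    """Ideal 'squares rule' shrink: area scales with (to_nm / from_nm) squared.

    This is an optimistic assumption; analog blocks, I/O and wiring
    usually do not shrink as well as logic does.
    """
    return area_mm2 * (to_nm / from_nm) ** 2


# Back-compute the 90nm area implied by the ~0.8mm² @ 40nm estimate.
# This is derived from the estimate, not quoted from the article.
implied_90nm = scale_area(0.8, from_nm=40, to_nm=90)
print(f"implied area @ 90nm: {implied_90nm:.2f} mm^2")

# Going the other way reproduces the 40nm estimate.
print(f"scaled back to 40nm: {scale_area(implied_90nm, 90, 40):.2f} mm^2")
```

The scale factor from 90nm to 40nm is (40/90)², roughly 0.2, so the estimate is about a fifth of the original area.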