I'm trying to find out whether GPU demand paging is possible on today's hardware. Say I want to keep the backing data in host DRAM or in another GPU's VRAM: accesses to local VRAM (by virtual address) are intercepted, and the remote data is supplied on demand. I'd like this to work from any language, e.g. OpenCL.
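To pin down the semantics I mean, here is a toy software model in Python. It is only an illustration, not how any GPU or driver works. `DemandPagedMemory`, `remote_pages` and the 16-byte page size are hypothetical names and values I made up. Local memory starts empty, every read is translated by virtual address, and a miss copies the whole page from the remote store before the read completes:

```python
PAGE_SIZE = 16  # toy page size, chosen only for the example


class DemandPagedMemory:
    """Toy model: local memory starts empty and pages are pulled in on first touch."""

    def __init__(self, remote_pages, page_size=PAGE_SIZE):
        # remote_pages stands in for host DRAM or another GPU's VRAM (hypothetical).
        self.remote_pages = remote_pages
        self.page_size = page_size
        self.local_pages = {}  # page number -> resident local copy ("VRAM")
        self.fault_count = 0

    def _handle_fault(self, page):
        # Miss: copy the whole page from the remote store into local memory.
        self.local_pages[page] = bytearray(self.remote_pages[page])
        self.fault_count += 1

    def read(self, va):
        # Every access is translated by virtual address, like an MMU lookup.
        page, offset = divmod(va, self.page_size)
        if page not in self.local_pages:
            self._handle_fault(page)
        return self.local_pages[page][offset]


if __name__ == "__main__":
    remote = [bytes([ord("A") + p]) * PAGE_SIZE for p in range(4)]
    mem = DemandPagedMemory(remote)
    values = [chr(mem.read(va)) for va in (0, 5, 2 * PAGE_SIZE + 3)]
    print(values, "faults:", mem.fault_count)
```

In software this is easy because `read()` is called explicitly. What I can't find is a hardware hook that plays the role of `read()` for ordinary loads issued by a GPU kernel.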

The IOMMU seems to intercept only DMA between an I/O device and host DRAM, and the GART translates only addresses inside the graphics aperture, as a PA-to-PA mapping. So is there any workaround?