Wednesday, December 02, 2009

All I Want For XMas Is Parallel Command Dispatch

This is an open letter to Santa, and by Santa I mean the people at ATI and nVidia who are coming up with the next generation of GPUs, which will be even more amazing than the current GPUs, which really aren't too shabby.

I was reading computer graphics research papers yesterday - we're in a real period of creativity with new algorithms. The GPU has reached a level of flexibility that makes amazing algorithms possible. Real-time ambient occlusion...real-time radiosity...unthinkable just a few years ago!

But I've noticed a theme with all of these new algorithms: many of them render to texture as a preparation step. It's a simple idea: rendering to a texture builds a 2-d spatial index that we can use to reduce the overall cost of an algorithm. You see this with reflections and environment mapping, shadow mapping, deferred rendering...shadow maps are a staple food of next-gen lighting algorithms.

All aspects of the GPU have been getting faster...except for control protocols. That is, when it comes time to change what the GPU is doing, our improvements in performance come from the tepid improvements in single-core throughput, not the rapid improvement in CPU core count or GPU shader count. I have 8 cores in my machine, but I can't put out batches 8x as fast.

So here it goes.

Dear Santa,

I have tried to be very good this year. I have done all of the dishes, and for the most part I have avoided provoking the dog into chewing on the cat's head. I have also eaten a lot of green vegetables.

Could I please have a GPU and driver that can render separate command streams to separate frame buffers from separate CPU cores? With that, I could prepare 8x the number of shadow maps, environment maps, and other input textures for my algorithms.
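To make the wish a little more concrete, here is a minimal sketch of what it might look like from the application side. Everything in it is hypothetical: `CommandStream`, `record_shadow_map`, and `record_all` are made-up names for illustration, not part of any real driver or graphics API, and the "commands" are just strings. The only point is the shape of the thing: each CPU thread fills its own command stream for its own frame buffer, with no shared state and no locks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a command stream bound to one render target.
struct CommandStream {
    std::size_t target_id = 0;          // which frame buffer (e.g. one shadow map) this stream draws into
    std::vector<std::string> commands;  // a real driver would store binary packets, not strings
};

// Record every command for one render target.
// Touches only its own stream, so it needs no locking.
void record_shadow_map(CommandStream& stream, std::size_t batch_count) {
    stream.commands.push_back("bind_framebuffer " + std::to_string(stream.target_id));
    for (std::size_t i = 0; i < batch_count; ++i) {
        stream.commands.push_back("draw_batch " + std::to_string(i));
    }
}

// Record one stream per render target, each on its own CPU thread.
std::vector<CommandStream> record_all(std::size_t target_count, std::size_t batch_count) {
    std::vector<CommandStream> streams(target_count);
    for (std::size_t t = 0; t < target_count; ++t) {
        streams[t].target_id = t;
    }

    std::vector<std::thread> workers;
    workers.reserve(target_count);
    for (CommandStream& stream : streams) {
        workers.emplace_back(record_shadow_map, std::ref(stream), batch_count);
    }
    for (std::thread& worker : workers) {
        worker.join();  // all recording is finished before anything is handed off
    }

    // Sanity check: every stream holds its bind plus one entry per batch.
    for (const CommandStream& stream : streams) {
        assert(stream.commands.size() == batch_count + 1);
    }
    return streams;
}
```

Calling `record_all(8, n)` would record eight shadow maps' worth of batches on eight threads. The part Santa has to supply is a driver that can accept those eight streams without funneling them back through one core.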

Thank you!

PS, if that is asking too much, a small airplane would also suffice. :-)