Posted by Soulskill on Friday August 12, 2011 @03:20PM
from the building-a-better-virtual-rocket-launcher dept.

Vigile writes "John Carmack sat down for an interview during Quakecon 2011 to talk about the future of technology for gaming. He shared his thoughts on the GPU hardware race (hardware doesn't matter but drivers are really important), integrated graphics solutions on Sandy Bridge and Llano (with a future of shared address spaces they may outperform discrete GPUs) and of course some thoughts on 'infinite detail' engines (uninspired content viewed at the molecular level is still uninspired content). Carmack does mention a new-found interest in ray tracing, and how it will 'eventually win' the battle for rendering in the long run."

It should be noted that John Carmack believes that Ray Casting, not Ray Tracing, will win out in the long term.

Unfortunately, many people outside of the graphics field confuse the two. Ray Casting is a subset of Ray Tracing in which only a single sample is taken per pixel, or in other words, in which a single ray is cast per pixel into the scene and a single intersection is taken with the geometry data set (or in the case of a translucent surface, the ray may be propagated in the same direction a finite number of times until an opaque surface is found). No recursive bouncing of rays is done. Lighting is handled through another means, such as with traditional forward shading against a dataset of light sources, or using a separate deferred shading pass to un-hinge the combinatorial explosion of overhead caused by scaling up the number of lights in a scene.
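To make the distinction concrete, here is a minimal sketch of a ray caster in Python: one primary ray per pixel, nearest intersection only, no secondary rays. The sphere-only scene, the pinhole camera at the origin and the names `cast_ray` and `render` are assumptions made up for illustration, not anything from the interview.

```python
import math

def cast_ray(origin, direction, spheres):
    """Return the index of the nearest sphere hit by the ray, or None.

    `direction` must be unit length. Each sphere is (center, radius).
    Only the nearer root is tested, so a camera inside a sphere sees nothing.
    """
    nearest_t, nearest_idx = math.inf, None
    for idx, (center, radius) in enumerate(spheres):
        oc = tuple(o - c for o, c in zip(origin, center))
        b = sum(d * v for d, v in zip(direction, oc))
        c = sum(v * v for v in oc) - radius * radius
        disc = b * b - c
        if disc < 0:
            continue  # the ray misses this sphere
        t = -b - math.sqrt(disc)
        if 0 < t < nearest_t:
            nearest_t, nearest_idx = t, idx
    return nearest_idx

def render(width, height, spheres):
    """Cast exactly one primary ray per pixel; no recursive bounces."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            # Map the pixel centre onto an image plane at z = -1.
            px = (x + 0.5) / width * 2 - 1
            py = 1 - (y + 0.5) / height * 2
            length = math.sqrt(px * px + py * py + 1)
            direction = (px / length, py / length, -1 / length)
            row.append(cast_ray((0, 0, 0), direction, spheres))
        rows.append(row)
    return rows
```

The result is only a visibility buffer (which surface each pixel sees). Shading would be done separately, which is where the forward or deferred lighting mentioned above comes in.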

John Carmack has been quoted as saying that full-blown ray-tracing just isn't feasible for real-time graphics due to the poor memory access patterns involved, as casting multiple rays per pixel, with multiple recursive steps, ends up touching a lot of memory in your geometry data set, which just thrashes the cache on CPU and modern/future GPU hardware alike.

When people talk about real-time ray-tracing, they are almost invariably referring to real-time ray-casting.

I admit, I made my post before watching the video. Now that I've watched it, I think my position still holds. He talked a lot about using ray-tracing in content preproduction and preprocessing tools during development, and he talked about building ray-tracing engines in the past to experiment with the performance characteristics. This is the same research he did where he came to the conclusion that ray-casting will be better than ray-tracing. Where his position has changed is in that he used to think it wou

I'd hope he is talking about ray tracing - I'm basically doing ray tracing inside of polygons in shaders today, and I'm sure he has, too. I really don't think ray casting has enough advantages over polygons that would make it worth it, and it has significant disadvantages that would need to be worked around (no reflections, shadows, etc). Back in the 1980s/1990s, programmers used the painter's algorithm instead of zbuffer because it was significantly slower to use zbuffer up to a certain number of polygons, and

At 13:30 he seems to be talking about path-tracing. That accumulation step that he describes with an arbitrary cut-off to make the frame deadline and the random jitter makes it quite certain. The random jitter in time reusing previous pixel results sounds like a very bizarre and trippy form of "motion" blur - it would look similar but blurring would be proportional to path length casting from that pixel. I might have to hack that up and try it...
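Leaving the actual path tracing aside, the accumulation step could be sketched roughly like this in Python. It is only a guess at the general shape: `sample_fn` stands in for whatever traces one jittered path, and the fixed sample `budget` stands in for the frame deadline. Both names are made up here.

```python
import random

def accumulate_pixel(sample_fn, budget, rng):
    """Average up to `budget` jittered samples of a single pixel.

    A running mean is kept, so the loop can be cut off after any sample
    and `mean` is still a usable (just noisier) estimate.
    """
    mean = 0.0
    for n in range(1, budget + 1):
        # Random sub-pixel offset in [-0.5, 0.5) on both axes.
        jitter = (rng.random() - 0.5, rng.random() - 0.5)
        value = sample_fn(jitter)
        mean += (value - mean) / n  # incremental average
    return mean
```

A real-time version would presumably check a clock instead of counting samples, and might carry `mean` over from the previous frame, which is where the smearing effect would come from.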

It's possible to make a low-res realtime version of photon mapping. See https://www.youtube.com/watch?v=GckOkpeJ3BY for an example. It isn't as good as proper photon mapping, but it *does* give indirect lighting.

Raytracing is extremely straight-forward and parallel. The only thing you have to do to make it feasible for use in games is either A. throw more power at it, B. cheat where you can, C. combination of A&B.

The video you posted is not real-time frame rates, it's interactive frame rates. It takes a few seconds to fully recompute the scene once you stop moving the camera. And note how there's only a single car model. Imagine scaling up the amount of geometry to a full world. With ray-tracing, as you scale up the complexity of the geometry, you end up scaling up the required computational complexity as well due to radiosity computations. Full real-time ray-tracing on huge worlds is a pipe-dream. What you will be able to do with ray-casting or rasterization with deferred shading composition to simulate things like reflections or radiosity will always be more than what you can do with ray-tracing, and so games developers will always choose the former.

I think part of the problem is that you get a bunch of CS types who learned how to make a ray tracer in a class (because it is pretty easy and pretty cool) and also learn it is "O(log n)" but don't really understand what that means or what it applies to.

Yes, in theory, a ray tracing engine scales logarithmically with number of polygons. That means that past a point you can do more and more complex geometry without a lot of additional cost. However that forgets a big problem in the real world: Memory access. Memory access isn't free and it turns out to be a big issue for complex scenes in a ray tracer. You have to understand that algorithm speeds can't be taken out of context of a real system. You have to account for system overhead. Sometimes it can mean a theoretically less optimal algorithm is better.

Then there's the problem of shadowing/shading you pointed out. In a pure ray tracer, everything has that unnatural shiny/bright look. This is because you trace rays from the screen back to the light source. Works fine for direct illumination but the real world has lots of indirect illumination that gives the richness of shadows we see. For that you need something else like radiosity or photon mapping, and that has different costs.

Finally there's the big issue of resolution that ray tracing types like to ignore. Ray tracing doesn't scale well with resolution. It scales O(n) in terms of pixels, and of course the pixel count grows quadratically, since you increase both horizontal and vertical resolution when you get higher PPI. Then if you want anti-aliasing, you have to do multiple rays per pixel. This is why when you see ray tracing demos they love to have all kinds of smooth spheres, but yet run at a low resolution. They can handle the polygons, but ask them to do 1920x1080 with 4xAA and they are fucked.
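To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It counts primary rays only and assumes 4xAA means four rays per pixel; the 60 fps target is an assumption, not a figure from the post.

```python
def primary_rays_per_frame(width, height, samples_per_pixel=1):
    """Primary rays needed for one frame, ignoring all secondary rays."""
    return width * height * samples_per_pixel

def rays_per_second(width, height, samples_per_pixel, fps):
    """Primary rays needed per second at a given frame rate."""
    return primary_rays_per_frame(width, height, samples_per_pixel) * fps

print(primary_rays_per_frame(1920, 1080, 4))   # 8294400
print(rays_per_second(1920, 1080, 4, 60))      # 497664000
```

So 1920x1080 with four samples per pixel is about 8.3 million primary rays per frame, or roughly half a billion per second at 60 fps, before any shadow or reflection rays are counted.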

Now none of this is to say that ray tracing will be something we never want to use. But it has some real issues that people seem to like to gloss over, issues that are the reason it isn't being used for realtime engines.

Yeah, not to mention that with full ray-tracing, as you add more lights to a scene, it increases the overall complexity per pixel per ray bounce linearly as well. That's why doing deferred shading is so nice because it unbuckles the lighting from the rasterization or ray-casting/tracing step and lets you scale up the number of lights independently with a fixed amount of overhead and linear cost per light for the entire scene, instead of per pixel or per ray or per polygon.
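A minimal sketch of that decoupling in Python, under heavy simplification: the g-buffer is a flat list of texels that some earlier geometry pass (rasterization or ray-casting) is assumed to have filled, lights are directional only, and shading is plain Lambert. `GBufferTexel` and `lighting_pass` are names invented for this example.

```python
from dataclasses import dataclass

@dataclass
class GBufferTexel:
    normal: tuple   # unit surface normal seen at this pixel
    albedo: float   # single-channel material coefficient

def lighting_pass(gbuffer, lights):
    """Apply every light to every g-buffer texel.

    `lights` is a list of (unit direction toward the light, intensity).
    The scene geometry is never touched here: the cost depends on the
    number of lights and pixels, not on the number of polygons.
    """
    out = [0.0] * len(gbuffer)
    for direction, intensity in lights:
        for i, texel in enumerate(gbuffer):
            n_dot_l = sum(n * l for n, l in zip(texel.normal, direction))
            out[i] += texel.albedo * intensity * max(0.0, n_dot_l)
    return out
```

Adding a light means one more sweep over the g-buffer, regardless of how complex the geometry that produced it was.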

Yeah, not to mention that with full ray-tracing, as you add more lights to a scene, it increases the overall complexity per pixel per ray bounce linearly as well. That's why doing deferred shading is so nice because it unbuckles the lighting from the rasterization or ray-casting/tracing step and lets you scale up the number of lights independently with a fixed amount of overhead and linear cost per light for the entire scene, instead of per pixel or per ray or per polygon.

I highlighted the key thing that you overlooked. They are both linear to the number of lights.

With deferred shading, lighting is done in a second pass (or multiple passes for different types of lights) directly against the g-buffer. The g-buffer contains a surface normal, depth value, and lighting & material coefficients for each pixel and is populated by rasterizing or ray-casting/tracing your scene geometry into it. With rasterization, each of your polygonal meshes is rendered into the g-buffer once without regard for lighting. During the lighting stage, each light is applied to t

It gets even worse when you realize you have to do subsurface scattering to get realistic looks for a lot of surfaces (like, oh, skin). Then you no longer can terminate a photon when you reach most surfaces, but then have to further reflect and refract photons from that point.

It does make for nice looking materials, though... without it, you get those iconic hard and shiny surfaces in ray traced images, like the famous metal balls.

Then there's the problem of shadowing/shading you pointed out. In a pure ray tracer, everything has that unnatural shiny/bright look. This is because you trace rays from the screen back to the light source. Works fine for direct illumination but the real world has lots of indirect illumination that gives the richness of shadows we see. For that you need something else like radiosity or photon mapping, and that has different costs.

You basically have it right. You trace a ray for each pixel, starting at the monitor, and it bounces back to the light source. To deal properly with indirect illumination and thus get good shadows and so on, you'd have to do it the other way: you'd have to trace rays from the light source, bounce them off materials, and see where they end up. That, of course, gets real CPU intensive since you can be tracing rays that never intersect with the display.

I won't go into what constitutes realtime vs interactive, as I can make even the fastest game engine out there 'interactive' as long as I run it on low enough hardware with all the features enabled and running it across multiple HD screens.

But the converse is also exactly my point - raytracing is something that scales very well, you just throw more computational power at it.

I did also mention 'cheats'; in that a game engine doesn't necessarily rely on any one single technology to begin with. We've now got

I also forgot to mention that conventional rasterization and ray-casting are also just as parallel in nature as ray-tracing. In fact, more so because they have much better memory access patterns, as I mentioned in a previous post. Memory-access is the biggest limiting factor in building scalable, parallel systems. If you don't have good memory-access patterns, you might as well be doing sequential work, because it's getting serialized by the hardware memory controller anyway.

The data access problem of backward rendering is unsolvable... it will always access data without regard to object coherency. For primary and shadow rays forward renderers will always be able to be more efficient when efficient occlusion culling is possible and subsampling isn't needed.

The video you linked has a realtime preview... in 1/60th second it probably doesn't get that much further than the raycasting solution (primary rays).

The only thing that makes it "realtime" is that it has a relatively high redraw rate (for a raytracer). That's why there is a lot of fuzz when the camera pans around. It might render only a few thousand rays between redraws, which does give fast feedback but also slows down the rendering process overall. Most raytracers will churn 100k rays before updating the preview.

This is analogous to progressive jpeg decoding, where you start with a very chunky low-res preview and gradually work your way up to the f

If only.... In the real world that would be an exclusive or, so pick one of the two. On small uncomplicated scenes it is straight-forward to make the tracing of each ray happen in parallel. As you add more geometry though it begins to behave differently. You will need a giant cache of rays to handle all of the bounces otherwise all of the recomputation will kill your performance. This was well known in the graphics community prior to GPUs and the same sc

Real time in its common technical meaning is mostly nonsensical when talking about rendering, it would mean you could guarantee a maximum delivery time... real time rendering is most often used as synonymous with interactive rendering (with interactive being a fuzzy concept, but let's say >10 fps in honor of Carmack's famous turtle in Quake).

Unfortunately, many non-programmers such as yourself don't understand algorithmic complexity and as such fail to realize that O(P log N) will eventually beat O(PN) once N is large enough, even though the constants in the first are much larger than the constants in the second.
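Where "large enough" falls depends entirely on those constants. A small Python sketch, assuming per-pixel costs of `c_log * log2(N)` for the tracer and `c_linear * N` for the rasterizer; the constants and the function name are made up for illustration and are not measurements of any real renderer.

```python
def crossover_n(c_log, c_linear, max_exponent=64):
    """Smallest power-of-two N at which c_log * log2(N) < c_linear * N.

    Only powers of two are checked, so log2(N) is exactly the exponent k.
    Returns None if no crossover is found up to 2**max_exponent.
    """
    for k in range(1, max_exponent + 1):
        n = 2 ** k
        if c_log * k < c_linear * n:
            return n
    return None

print(crossover_n(1000, 1))     # 16384
print(crossover_n(10**30, 1))   # None
```

With a 1000x worse constant the logarithmic algorithm is ahead by N = 16384, but with a large enough constant the crossover never arrives for any N this loop checks.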

Carmack knows that raytracing will eventually be superior in performance to rasterization because it is inevitable.

The thing is that when the critical N is reached, O(P log N) isn't just going to be slightly better, it's going to be en

I can't tell if you're trolling or you just wrote up a reply without fully reading my post. I never said rasterization is better. I alluded to the fact that ray-casting will win out over rasterization, and in the very near future. What I said was the ray-casting will win out over ray-tracing. The algorithmic complexity of ray-casting + deferred-shading is better than recursive ray-tracing.

And if you include lighting into the equation, with N = L lights + S geometric surfaces, he's advocating for O(c * 2^backtraces * N) over O([c1 * S] + [c2 * L]) that you would get with ray-casting + deferred shading&lighting, where it could be proven that c2 < c1 < c, seeing as how you also have to cast multiple rays per recursive back-trace to get a decent approximation of lighting in your scene, where you only need to perform a single sample per pixel with what I'm advocating for.

Actually, now that I think of it, that should be O(c * 2^backtraces * log N) vs O([c1 * log S] + [c2 * L]) given that you would normally use spatial data structure such as an octree with logarithmic look-up times.
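Those two expressions can be plugged into a toy cost model in Python to see how they behave. The constants are placeholders and the function names are invented for this sketch. Nothing here is measured, it just evaluates the formulas as written.

```python
import math

def recursive_ray_tracing_cost(c, backtraces, n):
    """c * 2^backtraces * log2(N), with N = lights + surfaces."""
    return c * 2 ** backtraces * math.log2(n)

def ray_cast_deferred_cost(c1, surfaces, c2, lights):
    """c1 * log2(S) for the ray-cast pass plus c2 * L for the lighting pass."""
    return c1 * math.log2(surfaces) + c2 * lights

# Example: 1008 surfaces + 16 lights = 1024 items, three recursive bounces.
print(recursive_ray_tracing_cost(1, 3, 1024))   # 80.0
print(ray_cast_deferred_cost(1, 1024, 1, 16))   # 26.0
```

Each extra bounce doubles the first figure, while the second grows only by c2 for each added light.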

Ahh, okay, gotcha, now what you originally said makes sense. There's more than one way to scale it. Generally, on a given project you have fixed budgets for pixels and polygons, but over the years as you target each subsequent generation you get to scale up both categories--however, scaling up the number of pixels beyond 1920x1080 or 1920x1200 doesn't really make as much sense as say continuing to scale up the complexity of your geometry, so I guess my brain assumed that's what you'd want to scale up

Everything you wrote is mathematically accurate, yet the actually interesting thing is exactly how big N has to get. The mathematics of big-O notation tells you nothing about that, yet that is the crux of the matter. For example, if N has to be bigger than 10^123478234897298, the apparent better asymptotic complexity has no impact on the real world.
The point is that if I tell you that one algorithm is O(1) and the other is O(2^n), you haven't actually learned anything useful about which algorithm you shou

Everything you wrote is mathematically accurate, yet the actually interesting thing is exactly how big N has to get.

You know that this very question has been researched, right? I am amazed that you are intent on discussing this issue without having actually done any research in this matter.

What I'm discussing is what can and cannot be concluded from big-O asymptotic complexity. You were drawing a mathematically correct conclusion that "for big enough N, raytracing is better." You then made an incorrect further conclusion that "eventually, raytracing is better." Big-O notation never guarantees that you'll ever be able to solve an input so big that the complexity estimate becomes accurate as to which algorithm is better. You then chose to heed my advice and present data on what actually matte

My conclusion was not incorrect as you claimed, and in fact I gave a citation to prove otherwise, and after reading that proof you decided my conclusion was incorrect on "general theoretical grounds" even though the paper proves it "in actual practice."

The mistake is clearly yours in thinking that the metrics involved haven't been well researched. You go on about the theoretical.. I'm sure that will impress some other guy that doesn't actually do shit.

That is a misunderstanding - the objection is not that the conclusion is wrong, it's that your argument for it was wrong. Your original argument is not helped by coming up with a different argument that shows the same thing. I told you to use practical evidence instead of incorrectly applying theory, and then when you did that, you think that that shows that what I said was wrong.

John Carmack has been quoted as saying that full-blown ray-tracing just isn't feasible for real-time graphics due to the poor memory access patterns involved, as casting multiple rays per pixel, with multiple recursive steps, ends up touching a lot of memory in your geometry data set, which just thrashes the cache on CPU and modern/future GPU hardware alike.

It's not good for a vector processor, but it's still pretty good for a many-core processor.

You still run into the same problems regardless of whether you're using a vector processor with no branch-prediction and no cache, a bunch of in-order cores with cache coherency, or full-blown out-of-order cores with cache coherency. You end up pulling in a combinatorial explosion (i.e. exponential number) of cache lines or memory accesses per recursive ray tied to the complexity of your scene.

People like to talk about how you can just throw more cores and distributed computing architectur

You still run into the same problems regardless of whether you're using a vector processor with no branch-prediction and no cache, a bunch of in-order cores with cache coherency, or full-blown out-of-order cores with cache coherency. You end up pulling in a combinatorial explosion (i.e. exponential number) of cache lines or memory accesses per recursive ray tied to the complexity of your scene.

GPUs have no cache. (Fermi has one, but it doesn't really work, so we might as well not count it)

Anyway, the point is that of course ray casting is better suited to that hardware, but a lot of raytracing applications, like in medical or semiconductor imaging, already benefit from GPUs greatly. And those things actually run in real (or interactive) time.

What do you think memory access coalescing is, and why do you think it needs to be on aligned boundaries? Exploiting spatial locality of reference is still a cache, even if it only has a single line. Given the huge impact of non-aligned access within a warp it seems silly to pretend that a GPU is not a cached architecture.

Yes, you do lose out on reflectivity, so you have to handle reflections through other means. If you use deferred shading, you can handle it in a separate reflective lighting pass in combination with something like cube-mapping. Even though you do a second pass to handle it, it'll still end up being faster than had you tried to implement fully recursive ray-tracing.
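For what it's worth, the lookup in such a reflective pass is cheap. A minimal Python sketch, assuming unit-length normals and a cube map addressed by its dominant axis. The face labels and function names are invented for this example and are not tied to any particular API.

```python
def reflect(incident, normal):
    """Mirror the incident direction about a unit normal: R = I - 2 (N.I) N."""
    d = sum(i * n for i, n in zip(incident, normal))
    return tuple(i - 2 * d * n for i, n in zip(incident, normal))

def cube_face(direction):
    """Pick the cube-map face the direction points at: the axis with the
    largest absolute component, plus its sign (e.g. '+x', '-y')."""
    axis = max(range(3), key=lambda k: abs(direction[k]))
    sign = '+' if direction[axis] >= 0 else '-'
    return sign + 'xyz'[axis]

# View ray looking straight down -z at a surface facing +z bounces back to +z.
r = reflect((0, 0, -1), (0, 0, 1))
print(r, cube_face(r))   # (0, 0, 1) +z
```

That is one dot product and one texture fetch per pixel, instead of tracing a secondary ray back into the scene. The catch is that the cube map only approximates what a true reflection ray would hit.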

Ray-casting would still have the benefit in providing better scalability than polygonal rasterization as there is a fixed cost per pixel, whereas

They rely too much on drivers, they should just attack the hardware and make their own renderer rather than relying on crappy OpenGL and DirectX... But of course, they don't really want to invest in R&D. Game development is more of a "do something dirty quickly and then throw it away" kind of thing.

That would be a tremendous step backwards. You can get away with doing that if you're programming for a console, in fact that's how it used to be done. The problem is that as soon as you've got any variation at all in the hardware you very quickly start to have to code for every individual unit that you're going to support.

Need multiple resolutions? Well, you're going to have to make sure you code for them rather than handing them off to a 3rd party library. Unit has extra RAM? Well, you're going to have t

I spent a very small amount of time playing around with hamlib for the GBA and that's how things were done on it. If you wanted to draw something on screen, there'd be a particular set of registers to write to, same for most other things that you might want to do. All in all it was a pretty nice set up.

The problem is that as soon as you've got any variation at all in the hardware you very quickly start to have to code for every individual unit that you're going to support.

If your software is of good quality, it is generic and easily retargetable.

Also, if you just consider GPUs, the interfaces to program them (CUDA in particular) have been there for quite a few generations and probably will stay for a long time still. And all NVIDIA (and probably also AMD) drivers use the same code, with relatively few low-l

That is the bad old days of computer graphics and sound - if a given game wasn't written for your particular hardware, too bad. It's hard to write to the hardware when there is a proliferation of distinct graphics cards out in the world and many more are added every year. On top of that, the way to talk to a given graphics card is often secret. There's a reason that people use OpenGL and DirectX.

But a game written for OpenGL or Direct3D in 2001 still runs on modern hardware. A game written to write directly to 2001 hardware does not.

This is irrelevant, since the industry does not try to make money out of old games. What they want is to make money at launch, then milk the cash cow for some time with a couple of people working on DLCs, then move on.

Also notice how your old Playstation games don't work without a Playstation (short of using an emulator). This is not a serious issue. Games are not meant

Cell was seriously over hyped as platform for generic computing - as was ps3, which is seriously short on memory. dreamcast would make a fine example of using shitty general sdk(the wince based one) vs. doing native, because on it

now, there's a difference between game engine developers and content-jockeys. content jockeys just create content on autodesk tools, totally dependent on old school artistic plot writing and art creation - thus, their games very rarely manage to amaze people on the virtual world si

They make more than 1 graphics card each. Btw Intel sells more graphics cards than anyone else. You've probably got one integrated on your computer without knowing about it. There are also many flavors of each graphics card that the big companies come out with.

If you listen to the interview, you'll hear John Carmack saying that built-in cards might out-perform dedicated cards for some things in future if they grant better memory access by virtue of using system memory directly. How do you know that the hardware interface between graphics cards never changes? That doesn't sound right to me, but if you have an inside source, feel free to enlighten us.

Let me make a guess. You don't program do you?
1. AAA games already cost a lot to make. You want to spend $200 for a game?
2. Hardware is changing fast. If you write for the hardware what hardware do you write for? Which card? All of them? What about the cards that come out while you are spending the three years developing the game?

Now you may be confusing drivers with game engines, but even then you would be wrong, just not insane.

I write software tools for high-performance computing that work on a variety of hardware, including all variations of x86, POWER, PowerPC and Cell, ARM, GPUs, multi-core, clusters... and other more confidential architectures (many-core, VLIW, FPGA, ASICs...)

Supporting a lot of different hardware is not an insurmountable problem if you have a good design (and a good test farm).

Now you may be confusing drivers with game engines but even then you would be wrong jus

I mean, the main criticism for id games has been that they are less games and more tech demos. Practically every game engine id ever sold has been used for much more interesting games by other people, but id still gets license fees.

If they don't invest in R&D, why are they doing their own engine design at all, and more importantly, just what are they investing in?

And even if you're right about OpenGL and DirectX being "crappy", which I highly doubt, the fact is that they are at least somewhat por

As I said in another subthread, it doesn't matter if you have choice since there are only two manufacturers anyway (with Nvidia being better supported by games in general). Games usually can't even run on Intel GMA even on the lowest settings.

And if you need to replace your graphics card every two years to run new games, it's no different than buying a Sound Blaster 16 or whatever other fancy hardware a game at the time required.

Except that, unlike what you are talking about, I don't need to always keep around the Sound Blaster 16 to play that old game. Written to a portable API means I can use that same game on an old Sound Blaster or my new integrated sound card. This is what you seem to be missing.

This is irrelevant, game developers don't care about whether you can play old games on new hardware. That's not their business model. Again, I've already addressed this in another subthread. Go read it for more details.

Sorry, it's not irrelevant despite what you continue to claim. Your idea leads to more code needing to be written, far more testing is needed and programs are far more likely to break and have issues than what is written now. Basically your idea is fucking stupid on pretty much all counts.

The idea is to ditch that model of "write once then throw away" and replace it with "write a good engine and reuse it throughout all your projects". That's what the big game development companies are moving towards. The ones that can't do their own or can't license that of a third party will have to fall back to casual games that don't require much processing power.

DirectX is only 'cross platform' if you count different video cards and the xbox360 as different platforms. OpenGL 'is' cross platform, you can run it on your pc, it's used on Macs, it's used in linux, it's used in most cellphones, it's used in most consoles (sans xbox and xbox360).

DirectX is only 'cross platform' if you count different video cards and the xbox360 as different platforms.

I'm pretty sure that an Xbox 360 and a PC are different platforms. "Cross-platform" only means "runs on different platforms"; it does not imply some minimum number of platforms. If I write something that works on both OS X and Windows it is "cross platform" even if it won't run on Linux or a cellphone.

I didn't say "cross-platform", you did. I said "somewhat portable across hardware," which is true -- you can develop a game for nVidia, ATI, and yes, even Intel, and if you stick to the API, you don't need code specific to any of those cards.

I'm going to agree, at least partially. I think we do rely too much on drivers to overcome the problem of a very diverse video card system.

This bugs me.
I see no compelling reason for the video manufacturers to postpone a more modern standard for video cards. Video cards have simple compatibility at a ~20 year old standard. It is really depressing to see that when I install an OS, the video is set to some horrible resolution. When I go to download the driver it's a mess, because the site is nearly unu

That's probably the most interesting thing he said all day. Throw an ARM core on the GPU, provide some sort of standard API for using it, and you eliminate all those pesky latency issues. Modern GPUs have enough RAM that one could potentially push the entire renderer onto the GPU, with the CPU and main memory only being responsible for game logic.

Of course, he also seems to be implying that this might go the other way, with integrated graphics winning out...

anyhow, graphics AND game logic(game physics) are very closely connected. when they're not, you end up with stuff like nwn-engine derived shit(where the game is like a game from 386 era, but with plastered on super fancy graphics, while the game still just runs basically in a 2d field, or when a game is full of let's say grass that's shaded, but while that grass has no meaning to gameplay or even ai's field of view).

Too late. My current video card has many times more RAM than my first computer. It also has two GPUs, which can perform certain tasks ridiculously faster than my CPU despite "only" operating at a few hundred megahertz. Tons of stuff can be trivially offloaded to the point where the CPU is just coordinating things, and we hardly even need such a low-latency connection, at least for non-game-related tasks.

ARM chips are so small, cool, cheap, such a small power draw that they're insignificant next to everything

That was done 20 years ago with TIGA graphics boards. There was a simple BSP tree demo (flysim) which downloaded the scenery database and renderer into the card's memory. All the host PC did was to convert keyboard events into camera motion commands.

The Euclideon infinite detail technology was exposed in a recent interview to be efficient sparse voxel octree ray-casting with contours (instead of axis-aligned cubes), which was previously investigated by nVidia researchers and others.

It's still not quite competitive enough with conventional rasterization techniques if you also try to do decent lighting with it along with all of the other stuff that goes into developing a game. Maybe in coup

Why after all this time do people still hang on to his every word? It's just some dude, not Jesus.

By all means, feel free to do something more noteworthy and become the new authority on the subject at hand. Until then, I think I'll continue feeding my Savior Reincarnate shrine its daily snackrifices.

He's really smart, hard-working, and open with his opinions. If something blows, he doesn't sugar coat it out of fear for his relationship with the company whose product he's bagging on.

He's also fiercely empirical. More so than a lot of his competitors, he doesn't buy into his own bullshit. If he has an idea, he makes an experiment to test it, and if it doesn't cut the mustard, he adjusts his view on the subject. He reverses his opinions over ti

Judging from the Rage trailers it is safe to say that JC has set the bar higher once again. It still looks like Quake with cars so I'm not much interested in actually playing it, but interestingly you can see some intrusions from the Bethesda culture in the form of ambient banter and quests. It seems like a no brainer that John has already turned to the truly open world problem so Bethesda can realistically use his next engine for a proper walk-around.

At the time, it probably was faster, which is good because Quake III ran on some really old hardware. Unfortunately on the CPUs of today it's actually slower, at least with the 20k or so operations I benched with.

It's always amazing how there is a faster way of doing calculations by using a few simple instructions rather than one complex instruction. Swapping registers using XOR instructions, using SUB to clear registers, and using integer arithmetic to manipulate floating point data were the ones I heard about.
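Two of those tricks, sketched in Python for illustration only (there is no speed benefit in Python). The second one assumes the Quake III routine being discussed is the well-known fast inverse square root with the constant 0x5f3759df. The function names are made up here.

```python
import struct

def xor_swap(a, b):
    """Swap two integers with three XORs and no temporary."""
    a ^= b
    b ^= a
    a ^= b
    return a, b

def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for positive x using integer arithmetic
    on the bits of a 32-bit float, plus one Newton-Raphson step."""
    i = struct.unpack('<I', struct.pack('<f', x))[0]   # float bits as uint32
    i = 0x5f3759df - (i >> 1)                          # integer-domain first guess
    y = struct.unpack('<f', struct.pack('<I', i))[0]   # bits back to float
    return y * (1.5 - 0.5 * x * y * y)                 # one refinement step

print(xor_swap(3, 5))        # (5, 3)
print(fast_inv_sqrt(4.0))    # close to 0.5
```

The result is only an approximation, good to a fraction of a percent after the single refinement step.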