Quasi-random, more or less unbiased blog about real-time photorealistic GPU rendering

Tuesday, October 22, 2013

Le Brigade nouveau est arrivé!

Time for an update on Brigade 3 and what we've been working on: until now, we have mostly shown scenes with limited materials, i.e. either perfectly diffuse or perfectly specular surfaces. The reason we didn't show any glossy (blurry) reflections so far is that these generate a lot of extra noise and fireflies (overbright pixels), and the glossy material from Brigade 2 was far from perfect. Over the past months, we have reworked Brigade's material system and replaced it with the one from OctaneRender, which contains an extraordinarily fast-converging and high-quality glossy material. The sky system was also replaced with a custom physical sky where sky and sun color vary with the sun position. And there's a bunch of brand new custom post effects, tone mapping filters and real camera effects like fisheye lens distortion (without the need for image warping).
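One of the simplest tone mapping filters of the kind mentioned above is Reinhard's global operator, which compresses unbounded HDR values into display range. The sketch below is a generic illustration of the technique, not Brigade's actual implementation:

```python
# Minimal global Reinhard tone-mapping operator: maps unbounded HDR
# luminance into [0, 1) via L / (1 + L). Illustrative only, not
# Brigade's actual post-processing code.

def reinhard(l_hdr):
    """Compress an HDR luminance value into display range [0, 1)."""
    return l_hdr / (1.0 + l_hdr)

# Bright pixels are compressed hard, dark pixels are nearly untouched:
print(reinhard(0.1))    # ~0.0909
print(reinhard(100.0))  # ~0.9901
```

The appeal of this operator for a real-time renderer is that it is a single division per pixel and never clips, so fireflies get squashed instead of blowing out to white.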

We've had a lot of trouble finding a good way to present the face melting awesomeness that is Brigade 3 in video form, and we've tried both YouTube and Vimeo at different upload resolutions and sample counts (samples per pixel). Suffice it to say that both sites have ultra shitty video compression, turning all our videos into a blocky mess (although Vimeo is still much better than YT). We also decided to go nuts on glossy materials and Fresnel on every surface in this scene, which makes everything look a lot more realistic (in particular Fresnel, which causes surfaces to look more or less reflective depending on the viewing angle), but the downside of this extra realism is a lot of extra noise.
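The Fresnel effect described here is commonly approximated in renderers with Schlick's formula; a minimal sketch (the `f0` normal-incidence reflectance value is illustrative, and this is not Brigade's code):

```python
def schlick_fresnel(cos_theta, f0):
    """Schlick's approximation: reflectance rises from the
    normal-incidence value f0 toward 1.0 as the viewing angle
    approaches grazing incidence."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

# A dielectric like glass (f0 ~ 0.04) is barely reflective head-on,
# but nearly mirror-like at grazing angles:
print(schlick_fresnel(1.0, 0.04))  # head-on: 0.04
print(schlick_fresnel(0.0, 0.04))  # grazing: 1.0
```

This view dependence is exactly why applying Fresnel to every surface adds noise: the reflected contribution varies strongly with the incident angle, so more samples are needed per pixel to converge.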

So feast your eyes on the first videos of Brigade 3 (1280x720 render resolution):

The scene in the video is the very reason why I started this blog five years ago and is depicted in one of my very first blog posts from 2008 (see http://raytracey.blogspot.co.nz/2008/08/ruby-demo.html). The scene was created by Big Lazy Robot to be used in a real-time tech demo for ATI's Radeon HD 4870 GPU. Back then, the scene used baked lightmaps rendered with V-Ray for the diffuse lighting and an approximate real-time ray tracing technique for all reflective surfaces like cars and building windows. Today, more than five years later, we can render the same scene noise free using brute force path tracing on the GPU in less than half a second and we can navigate through the entire scene at 30 fps with a bit of noise (mostly apparent in shadowy areas). When I started this blog my dream was to be able to render that specific scene fully in real-time in photoreal quality and I'm really glad I've come very close to that goal.

UPDATE: Screenshot bonanza! No less than 32 screenshots, each of them rendered for 0.5 - 1 second. The problem with Brigade 3 is that it's so much fun mucking around with the lighting, the time of day, depth of field and field of view with lens distortion. Moreover, everything looks so photoreal that it's extremely hard to stop playing and taking screenshots. It feels like you're holding a camcorder.

We plan to show more videos of Brigade 3 soon, so stay tuned...

Update: I've uploaded the direct feed version of the second video to MEGA (a New Zealand based cloud storage service, completely anonymous, fast, no registration required and free, just excellent :). You can grab the file here: brigade3_purely_random_osumness (it's 2.40 GB)

Update 2: The direct feed version of the first video can be downloaded here: brigade3_launch_vid_HD.avi (2.90 GB). This video has a higher sample count per pixel per frame (and thus less noise and a lower framerate).

Alex: I would say GPUs need hardware to improve ray coherency, because thread divergence (where some threads in a warp take much longer than the others, effectively stalling the whole warp, which sinks the efficiency) is a huge problem (maybe some ray sorting is possible in HW), to become more MIMD-like, and to speed up ray traversal (intersections with the acceleration structure). Ray/triangle intersections (or ray/voxel if you want to do volumetric smoke) could be accelerated as well. GPUs are already very fast at shading computations, provided that ray coherency is preserved. Of course, HW that can help build the acceleration structure on the GPU itself would be cool too. The Maxwell GPU should have a minimum of 4 ARM CPU cores on the same die as the GPU, which might alleviate some of these problems. Especially animated scenes should benefit from these architectural novelties. But it will also be possible to do much more funky stuff than just pure path tracing. I don't know much about the R9 290X, but based on pure compute throughput numbers I'm pretty sure it should be a lot faster than the Titan for path tracing.
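The ray sorting alluded to above is already done in software by some renderers: grouping rays by direction makes neighbouring threads traverse similar parts of the acceleration structure. A toy host-side sketch of the idea, sorting rays into direction octants (illustrative only, not Brigade's or any vendor's implementation):

```python
def direction_octant(d):
    """Classify a ray direction (dx, dy, dz) into one of 8 octants
    by the sign of each component."""
    return (d[0] < 0) | ((d[1] < 0) << 1) | ((d[2] < 0) << 2)

def sort_rays_for_coherence(rays):
    """Sort rays so those heading the same way are adjacent; on a GPU
    this tends to make threads in a warp touch the same BVH nodes."""
    return sorted(rays, key=direction_octant)

rays = [(1, -1, 1), (1, 1, 1), (-1, -1, -1), (1, 1, -1)]
print(sort_rays_for_coherence(rays))
```

Real implementations use finer keys than the octant (e.g. direction plus origin hashed into a Morton code), but the principle is the same: pay a sort up front to reduce divergence during traversal.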

OMG yes! Octane material system: hell yes! You did a Vimeo upload, which I eagerly requested: thank you sooooo much! I am very very pleased with what I am seeing here. It is so great to see that your team is so passionate about working on the engine. Since OTOY is in a partnership with Nvidia: do these guys know about Brigade? If not: go tell them! If they do: go ahead and ask them if they could really develop some hardware for that. I mean, one would need to be blind to not see that this is the future of gaming.

Have you tried running each pixel for a fixed number of iterations, and if it's not finished, adding it to a queue? Then, in a second pass, run through the queued pixels again, but in a group. This should help warp code divergence (but not data divergence, obviously).
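The scheme proposed here is essentially stream compaction (sometimes called path regeneration): give every pixel a fixed budget, queue the stragglers, then process the queue densely so a warp isn't held hostage by a few long paths. A toy sequential sketch with a hypothetical `trace_step` callback, just to make the control flow concrete:

```python
def render_two_pass(pixels, trace_step, budget):
    """Pass 1: every pixel gets `budget` iterations; unfinished pixels
    go into a queue. Pass 2: process only the queued pixels, packed
    contiguously, so on a GPU they would fill warps densely instead of
    idling next to already-finished threads."""
    queue = []
    for p in pixels:
        if not trace_step(p, budget):   # True => this pixel's path finished
            queue.append(p)
    for p in queue:
        while not trace_step(p, budget):
            pass
    return queue  # returned only for illustration

# Fake tracer: each pixel's value is how many budget-sized chunks it needs.
remaining = {0: 1, 1: 3, 2: 1, 3: 2}
def fake_step(p, budget):
    remaining[p] -= 1
    return remaining[p] <= 0

queued = render_two_pass([0, 1, 2, 3], fake_step, budget=8)
print(queued)  # [1, 3] -- only these pixels needed the second pass
```

As the commenter notes, this fixes control-flow divergence (threads running different iteration counts) but not data divergence (threads fetching scattered BVH nodes or textures).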

whouuuuuuuuuuuuuuuuuuu!!!! That looks..... Turning the camera around one of those cars reminded me very much of some car ad TV spot. Apart from the rather low quality assets, like the houses and the street, you can clearly see the huge power Brigade 3 has to offer. Nice! Can't wait to see more of this awesome stuff.

colocolo: thanks, the scene is actually the highest quality scene we have at the moment. All the cars, even the ones in the background, consist of hundreds of thousands of triangles, which is overkill, but luckily it doesn't affect the performance too much. We want to bring our own assets to the scene.

Anonymous: of course, we're continuously improving the efficiency; there's something very useful coming up for night scenes.

anonymous/Onq: I don't know enough about AMD's new architecture, looks nice on paper, but so did the first gen Kepler and that turned out to be a real disappointment for path tracing. So until we've got the hardware in our hands, we can't say much about it.

Anonymous: re Intel's Knights Corner/Ferry/Xeon Phi: it's about half as fast as the GeForce Titan in path tracing and about 4 times as expensive, so not very interesting.

One concern of mine is multi-light rendering. If I put tens of lights acting on a single object, that might increase the noise by an order of magnitude. I hope you have some solution for this one; handling many lights is also part of path tracing's promise.
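For context, one standard path tracing answer to many lights is to sample a single light per shading point with probability proportional to its emitted power, then divide by that probability, which keeps the estimator unbiased. A minimal sketch of the idea (not necessarily what Brigade does):

```python
import random

def pick_light(light_powers, rng=random.random):
    """Choose one light index with probability proportional to its
    power; return (index, pdf) so the estimator can divide by the pdf
    and stay unbiased."""
    total = sum(light_powers)
    u = rng() * total
    acc = 0.0
    for i, p in enumerate(light_powers):
        acc += p
        if u <= acc:
            return i, p / total
    return len(light_powers) - 1, light_powers[-1] / total

idx, pdf = pick_light([100.0, 1.0, 1.0])
print(idx, pdf)  # the bright light gets picked ~98% of the time
```

This way, the per-bounce cost stays constant no matter how many lights are in the scene; the trade-off is extra variance for dim lights, which smarter schemes reduce by also weighting for distance and orientation.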

Anyway, everyone who heartily roots for much better 3D graphics and is sane (unlike others) should be supporting the following idea of how exactly to build the most powerful video game console ever, right now, you bet:

colocolo: a forest seen from above should be doable, but having the camera inside a densely packed forest will be the real challenge, because most of the light will be indirect. I just need to find a good forest scene to test.

Sam, did you already test performance on AMD cards? Any big difference like in OpenCL LuxRender (over 2 times better than Nvidia)? Brigade is so exciting! Very interesting to see high-poly quality scenes, forests and dynamic water. By the way, did you know about Fluid v.3? Real-time open-source fluid simulations (on good hardware).

Sam, half a second for one image. Does that mean it has to become 10x faster (50 ms, 20 frames)? Does it behave that linearly? Noise is already low... but the screenshots look a lot better. ;) For me a playable threshold isn't reached yet (shadow regions), but anyway, awesome stuff! If one day it looks like the screenshots, man... artists will make a hell of a Matrix with that. :)

Sam, could you do me a favour and find out what happened to the Cinema 2.0 engine shown in the Ruby demo? Would love to find out more about its rasterisation/raytrace engine design. Even better, as it's so old and obviously nothing's being done with it, open source it to the community (or for a small fee, say $100) for indie devs to play with. Also, I've been building assets for a futuristic pod racer style game for over 8 months (and building my own OpenGL engine). If I knocked up a complete level with a few racer designs, I would love to see you run it through Brigade 3. Cheers, James

Hi Sam, you're getting close for sure. Porting the Octane material system to Brigade is great, but what are we porting to Octane? I.e. what does Octane still offer us that Brigade doesn't? (Except a gazillion plugins, of course...) Will Octane see these speeds? Will the two apps converge? Any thoughts on this? Seekerfinder

@anonymous ha, I think for desktop gaming/Oculus it's a longer way. But until then there are still some next gen games that will look awesome, and if you have the budget you can play them in 4K if Oculus releases a new version in 2016 (they are indeed already planning for that, so says Iribe). Nevertheless, Brigade is something completely different, I think. When it goes to market, it will be with a big explosion. Graphics cards will have 32 GB memory, memristors... hybrid memory cube... 500 GB Blu-rays... The detail and quality fidelity will be humongous, I think. No need to go to the cinema anymore...

Skif: yes, it runs more than fine on AMD OpenCL. Clay needs a break, he's tired of all the dancing and so are we.

Anonymous: no idea, but I'm pretty sure we haven't exhausted everything algorithmically speaking yet. Brigade is getting a few percent faster almost every day, and not seldom there's an even much larger jump in performance. That's the cool thing about GPU programming: it's such brand new and uncharted territory that a small tweak can cause an enormous speed boost. I think there's still a huge amount of untapped potential even in the current gen of GPUs.

colocolo: yes, we're currently only a factor of 10x away from game quality noise-free images in real-time. That means that if we don't do any further algorithmic optimizations, GPUs will have the power to run this at high image quality in 720p in 5 years. But if you take into account that there will be substantial algorithmic and hardware improvements, I think it will be closer to 1.5-2 years from now (for 1080p/30fps).
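The 5-year half of that estimate is consistent with assuming GPU throughput doubles roughly every 18 months, since 2^(60/18) ≈ 10. A quick check of the arithmetic (the doubling period is an assumption, not a law):

```python
# Assumed doubling period for GPU throughput, in months.
DOUBLING_MONTHS = 18

def speedup_after(months):
    """Projected speedup after `months`, under the doubling assumption."""
    return 2 ** (months / DOUBLING_MONTHS)

print(speedup_after(60))  # ~10.1x after 5 years
```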

Fady: I don't know; it would make sense as a cloud-only engine initially.

James: I don't know what happened to the Cinema 2.0 demo, but actually I don't care, since we can do all the lighting and animation in that scene in real-time now (it was prebaked in the original demo). That's what matters to me.

Seekerfinder: Octane has tons of production grade features which offline 3d artists can't live without, but which Brigade doesn't need for its purpose. It's that relative simplicity that makes Brigade faster than Octane.

Anonymous: non von Neumann is the key

Mark: no demo yet unfortunately

Andre: I would love to test a forest/mountain scene, but I need to find a good one.

Anonymous/axyz design/cg river: those models look good, thanks. Next time you should render the promo video with Octane instead of Keyshot, you'll get an instantaneous render with HDRI :)

Anonymous: it is indeed fucking awesome. It's nuts if you think that these images used to take hours just a few years back. I started doing ray tracing in 2008, and I remember it took me about 3 hours to render a glossy Android model on the CPU. Today I can do that same render at higher quality in less than a second with Brigade or Octane on two Titan GPUs. That's more than 10,000 times faster in just 5 years. It's absolutely mind-boggling if you realize this.

10,000 times faster. Funny, yesterday I saw an interview with an Intel guy speaking about the new Xeon Phi. He worked on a supercomputer in 1997 with 10,000 processors that could execute 1 TFLOP. Then he showed the Xeon Phi processor with 50 cores that can also do 1 TFLOP, and laughed. Then I thought, man, a supercomputer in 1997 was still too slow for some pure ray tracing graphics. Fortunately it's arriving at all. I thought we would never have these graphics on PCs. Thanks Sam and the other guys at OTOY for telling me the Matrix is possible. :)

No problem colocolo :) IMO, the Xeon Phi card feels like it's too little too late. The latest GPUs from both Nvidia and AMD are already much faster at path tracing and the gap is only going to get bigger.

Waiting for links =3 Also, it would be interesting to see a prerendered demo with Brigade, to see what exactly we can expect in the future from your passionate creativity! Anyway, big thanks Sam for your enthusiasm and very significant progress! Glory!

Skif, thanks a lot, and the video download link is up btw. Re: prerendered animation, Brigade isn't really made for offline rendering; it would be easy to add such functionality, but it would defeat the purpose of real-time path tracing. To give you an idea of the quality we can have with Brigade with instantly noise-free images, check out this CG animation:

https://vimeo.com/37517970

Right now, Brigade can do the exact same thing as what you see in the animation sans the smoke and motion blur. That animation actually inspired me to try the NYC scene you see in this post (and which was featured in my first blogpost in 2008) in Brigade and I must say that the results greatly exceeded my expectations.

The shrinkage problem isn't really all that serious. Intel at least has been working with theoretical models to shrink all the way down to single-digit nanometer feature size using modifications of current tech. And there are also entirely new ways to do it that are in R&D phase; entire departures from photo-lithography.

We also have 3D chips to look forward to. Successful implementation will be more significant than the boost Moore's Law gives us. We could be scaling much more than just 2x every 18 months.

It's a problem worth thinking about, because the problem IS real, but that doesn't mean solutions don't or won't exist. Computers are not hitting a brick wall. We have many more performance increases to look forward to for the foreseeable future.

High-performance computing is very important to a lot of people. The scientific and engineering communities depend on it, and that's the prime reason a lot of supercomputers even exist. That's an unfathomably big industry. Everyone from professors and grad students working with theoretical models, to big agencies like NASA and pharmaceuticals need to run compute-heavy models. Then you've got agencies that just deal with a lot of computation in general, like the NSA and private firms.

High-performance computing is in higher demand now than ever. We're not gonna see progress suddenly stop, because too much of the world depends on it progressing as quickly as it can.

@Anthony Eadicicco yeah, I know, I was just kidding... but I have also heard that some nations build supercomputers only for the prestige of their country... yet I don't know what that means for the average utilization of those supercomputers...

Anonymous: that filtering technique is too slow unfortunately. It's often more advantageous to spend more time on rendering extra samples instead of filtering pixels with not enough samples. And even though the noise (standard deviation) only halves with every quadrupling of the sample count, the perceptual difference in noise between 4 and 16 spp is much larger than the perceptual difference between 16 and 64 spp.
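Concretely: Monte Carlo noise (the standard deviation of a pixel's estimate) falls as 1/sqrt(N), so quadrupling the samples halves the noise. A quick numerical check of that behaviour, using a uniform random value as a stand-in for one pixel sample:

```python
import random
import statistics

def estimate(n_samples, rng):
    """Average of n uniform samples -- a stand-in for one pixel's
    Monte Carlo estimate at n spp."""
    return sum(rng.random() for _ in range(n_samples)) / n_samples

rng = random.Random(42)
noise = {}
for spp in (4, 16, 64):
    # Std deviation across many independent pixel estimates = "noise".
    noise[spp] = statistics.stdev(estimate(spp, rng) for _ in range(2000))
    print(spp, round(noise[spp], 4))  # roughly halves with each 4x in samples
```

The absolute noise reduction per doubling shrinks as spp grows, which matches the observation that 4 to 16 spp looks like a much bigger improvement than 16 to 64 spp.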

colocolo, Anthony: I'm not too worried about hitting a wall in the performance increase of 3D chips, but I do believe that some of the stages in the ray tracing pipeline can be massively accelerated by dedicated hardware. Now is the perfect time for Nvidia and AMD to look into that, because the next gen consoles will stagnate the advancements in game graphics for another 5-6 years until the cloud takes over completely (I'm fairly sure there will never be a PS5 or Xbox One.5).

Thanks Sam for the screens and vids, but I'm sure, obviously, this scene can't show even 10% of Brigade's potential. Everything you've shown before is static; is Brigade ready for dynamic environments like moving trees, fireballs, water etc., and multiple lights? And in general, is this more a question of raw compute power or of Brigade's flexibility? PS: what are you preparing for us next? =p Are the white dots on the 6th screen DOF or a rendering error?

Infinite-Realities' scanned models combined with Brigade would be completely amazing. If you haven't checked them out, you should. They're some of the most realistic real-time 3D models I've ever seen. There's a downloadable demo called HydraDeck-Humans that you can check out if you've got an Oculus Rift. Inside the Rift they're so realistic that it's almost creepy that they're not moving. It feels like they're real dead people. I think Brigade combined with those models, realistically animated, would be the holy grail of gaming. Here's a quick video of the demo.

Skif, re moving trees: it's possible with instancing, but all the trees would move in lockstep unless you can find a way to offset/randomize the swaying animation for each tree. Regarding multiple lights, Brigade 3 has a specific optimization to deal efficiently with hundreds of lights of varying size and orientation. The next demo will have something to do with the Lamborghini model. The white pixels you see in screen 6 are fireflies; they're more common in out-of-focus areas.
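A common way to break that lockstep is to hash each instance's ID into a per-tree phase offset for the shared wind oscillation. The sway model below is purely illustrative, not Brigade's animation code:

```python
import math

def tree_sway(instance_id, t):
    """Sway angle for one instanced tree at time t: a shared wind
    oscillation, de-synchronised by a phase derived from the instance
    ID via a multiplicative hash (Knuth's constant)."""
    phase = (instance_id * 2654435761 % 1024) / 1024.0 * 2.0 * math.pi
    return 0.1 * math.sin(t + phase)

# Neighbouring trees no longer move in lockstep:
print(tree_sway(0, 1.0), tree_sway(1, 1.0), tree_sway(2, 1.0))
```

Because the phase depends only on the instance ID, each tree's motion stays deterministic frame to frame while the forest as a whole looks unsynchronised.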

Michael, yep, I've seen those, they look great, looking forward to seeing them animated.

Hey Sam, Kingbadger3d here. Just wanted to share a few thoughts about the new AMD GCN cards. You've probably heard of this new TrueAudio (or something like that, can't quite remember the name). Lots of people have mistakenly thought this was a sound card replacement. IT'S NOT. The new cards include a VERY powerful programmable DSP on die. The DSP is being used for this new level of sound processing, but after looking into it further, being fully programmable it can be used for anything you like if you write your own code. I'm looking at ways this can be used for even faster BVH builds etc. Thinking even a real-time low latency noise reduction algorithm is more than feasible. You and your boys should look into this. Let me know what you think, Bruda. Cheers, J

I LOL'ed when the robot started boogieing, reminded me of the Citroen C4 commercial with the dancing Transformer (https://www.youtube.com/watch?v=bRArw9l3hFw). Hard to believe this can be done in real-time.


About Me

Passionate about real-time path tracing and photoreal rendering with GPU ray tracing. I'm currently leading the scientific visualisation team at the EPFL Blue Brain Project in Geneva. Before that: co-founder and project lead at MI New Zealand, project lead at the University of Auckland NZ, technical project manager on OctaneRender (from pre-v1.0 beta to v2.0), instigator and driving force behind the Brigade real-time path tracing in games project, leading its creative and technical R&D vision (Feb 2012 - Oct 2013), photoreal 3D graphics developer and consultant, and medical imaging/neuroradiology researcher. My tutorial series on GPU accelerated path tracing (with source code) can be found on GitHub.
For questions, email me at sam.lapere@live.be