Timothy Lottes, the creator of FXAA, dishes on PS4's potential

PS4
Working on the assumption that the Eurogamer article is mostly correct, with the possible exception of exact clocks, amount of memory, and number of enabled cores (all of which could easily change to adapt to yields).

While the last console generation is around 16x behind current high-end single-chip GPUs in performance, that gap was the result of much easier process scaling, and it opened up before the industry hit the power wall. Things might be much different this round: as scaling slows down, a fast console might be able to keep up for much longer. If Sony decided to bump up the PS4 GPU, that was a great move and will help the platform live for a long time. If the PS4 is around 2 Tflop/s, that is roughly half of what a single-GPU high-end PC has right now, and probably a lot better than what most PC users have. If desktops move to 4K displays, that requires 4x the performance of 1080p (4x the pixels), so if consoles maintain a 1080p target, perf/pixel might still remain good for consoles even as the PC continues to scale.
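To put rough numbers on the perf/pixel argument, here is a back-of-the-envelope sketch in Python. The ~2 Tflop/s console figure and the "roughly half of a high-end PC" ratio come from the paragraph above; the 60 Hz frame rate and the helper names are assumptions for illustration, and peak Tflop/s says nothing about real-world efficiency.

```python
# Back-of-the-envelope perf/pixel comparison (peak numbers only).
# Assumptions: ~2 Tflop/s console, ~4 Tflop/s high-end PC, 60 frames per second.


def pixel_count(width: int, height: int) -> int:
    return width * height


def flops_per_pixel(tflops: float, width: int, height: int, fps: int = 60) -> float:
    """Peak floating-point operations available per pixel per frame."""
    return tflops * 1e12 / (pixel_count(width, height) * fps)


if __name__ == "__main__":
    ratio = pixel_count(3840, 2160) / pixel_count(1920, 1080)
    print(f"4K / 1080p pixel ratio: {ratio}")  # 4.0

    console_1080p = flops_per_pixel(2.0, 1920, 1080)
    pc_4k = flops_per_pixel(4.0, 3840, 2160)
    print(f"console at 1080p: {console_1080p:.0f} FLOPs/pixel/frame")
    print(f"PC at 4K:         {pc_4k:.0f} FLOPs/pixel/frame")
    print(f"console advantage per pixel: {console_1080p / pc_4k:.1f}x")
```

Under these assumptions, twice the raw throughput spread over four times the pixels leaves the 4K PC with half the per-pixel budget of a 1080p console.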

The real reason to get excited about the PS4 is what Sony as a company does with the OS and system libraries as a platform, and what this enables 1st party studios to do when they make PS4-only games. If the PS4 has a real-time OS with libGCM-style low-level access to the GPU, then PS4 1st party games will be years ahead of the PC, simply because it opens up what is possible on the GPU. Note this won't happen right away at launch, but once developers tool up for the platform, this will be the case. As a PC guy who knows hardware to the metal, I spend most of my days in frustration, knowing damn well what I could do with the hardware but cannot do, because Microsoft and the IHVs won't provide low-level GPU access in PC APIs. One simple example: draw calls on PC easily have 10x to 100x the overhead of draw calls on a console with a libGCM-style API.

Assuming a 7970M-class GPU in the PS4: AMD has already released the hardware ISA docs to the public, so it is relatively easy to know what developers might have access to on a PS4. Let's start with the basics known from PC. AMD's existing profiling tools support true async timer queries (where the timer results are written to a buffer on the GPU, then read asynchronously on the CPU). This enables the consistent profiling game developers require when optimizing code. AMD also provides tools for developers to view the GPU assembly output for compiled shaders, another must for console development. Now let's dive into what isn't provided on PC but can be found in AMD's GCN ISA docs:

Dual Asynchronous Compute Engines (ACE) :: Specifically "parallel operation with graphics and fast switching between task submissions" and "support of OCL 1.2 device partitioning". It sounds like, at a minimum, a developer can statically partition the device so that graphics and compute can run in parallel. On a PC, static partitioning would be horrible because of all the different GPU configurations to support, but on a dedicated console it is all you need. This opens up a much easier way to hide small compute jobs in a sea of GPU-filling graphics work like post-processing or shading. The way I do this on PC now is to abuse vertex shaders for full-screen passes: the first triangle is full screen and the rest are degenerate, and an uber-shader for the vertex shading looks at gl_VertexID and branches into "compute" work, being careful to space out the jobs by the SIMD width to avoid stalling the first triangle or loading up only one SIMD unit on the machine. Like I said, complicated. In any case, this dual ACE system likely makes it practical to port a large amount of the Killzone SPU jobs over to the GPU even if they don't completely fill the GPU (which would be a problem on the PC without complex uber-kernels in something like CUDA).
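A minimal Python model of that vertex-shader trick may help. The full-screen triangle positions use the common bit-twiddling on the vertex index (vertices 0..2 span clip space from -1 to 3, covering the viewport). The rule for padding the first wavefront and mapping later vertex IDs to (job, lane) pairs is only one plausible reading of "space out the jobs by the SIMD width", not the author's actual shader; `JOB_SPACING` and `classify_vertex` are hypothetical names. In a real shader, padding and job vertices would also emit a degenerate (for example all-zero) position so they rasterize nothing.

```python
from typing import Optional, Tuple

JOB_SPACING = 64  # assumption: one GCN wavefront (64 lanes) per compute job


def fullscreen_triangle_position(vertex_id: int) -> Tuple[float, float]:
    """Clip-space XY for vertices 0..2 of a triangle that covers the viewport."""
    x = float((vertex_id << 1) & 2) * 2.0 - 1.0
    y = float(vertex_id & 2) * 2.0 - 1.0
    return x, y


def classify_vertex(vertex_id: int) -> Tuple[str, Optional[int], Optional[int]]:
    """What a hypothetical uber vertex shader does for a gl_VertexID-style index.

    Returns ("triangle", None, None), ("pad", None, None) or ("job", job, lane).
    """
    if vertex_id < 3:
        return ("triangle", None, None)
    if vertex_id < JOB_SPACING:
        # Degenerate filler: keeps compute work out of the first wavefront
        # so it does not hold up the full-screen triangle.
        return ("pad", None, None)
    offset = vertex_id - JOB_SPACING
    return ("job", offset // JOB_SPACING, offset % JOB_SPACING)


if __name__ == "__main__":
    for vid in (0, 1, 2):
        print(vid, fullscreen_triangle_position(vid))
    print(classify_vertex(10), classify_vertex(64), classify_vertex(133))
```

With this layout, a single draw of `JOB_SPACING * (jobs + 1)` vertices carries the full-screen pass plus `jobs` wavefront-sized compute jobs.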

Dual High Performance DMA Engines :: Developers would get access to do async CPU->GPU or GPU->CPU memory transfers without stalling the graphics pipeline, and specifically the ability to control semaphores in the push buffer(s) to ensure no stalls and low-latency scheduling. This is something the PC APIs get horribly wrong, as all memory copies are implicit, without really giving control to the developer. This translates to much better resource streaming on a console.
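As a rough CPU-side analogy only (Python threads, not the GCN DMA engines or any console API), the sketch below shows the pattern described above: the copy runs asynchronously and signals a semaphore, and the consumer waits on that semaphore only at the point where it actually needs the data. `dma_copy` and `stream_resource` are hypothetical names.

```python
import threading


def dma_copy(src: bytes, dst: bytearray, done: threading.Semaphore) -> None:
    """Stand-in for a DMA engine: perform the copy, then signal completion."""
    dst[: len(src)] = src
    done.release()


def stream_resource(data: bytes) -> bytes:
    dst = bytearray(len(data))
    done = threading.Semaphore(0)  # starts unsignaled
    copier = threading.Thread(target=dma_copy, args=(data, dst, done))
    copier.start()
    # Independent work (e.g. drawing with already-resident resources) would
    # run here without waiting for the copy.
    done.acquire()  # block only where the streamed data is first consumed
    copier.join()
    return bytes(dst)


if __name__ == "__main__":
    print(stream_resource(b"mip level 0"))
```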

Support for up to 6 Audio Streams :: HDMI carries audio, so the GPU actually outputs audio, but no PC driver gives you access. The GPU shader is in fact the ideal tool for audio processing, but on the PC you need to deal with the GPU->CPU latency wall (which can be worked around with pinned memory), and to add insult to injury, the PC driver then simply copies that data back to the GPU for output, adding more latency. In theory, on something like a PS4 one could just mix audio on the GPU directly into the buffer being sent out over HDMI.

Global Data Store :: AMD has no way of exposing this in DX, and in OpenGL they only expose it in the ultra-limited form of counters which can only increment or decrement by one. The chip has 64KB of this memory, with effectively the same access as shared memory (atomics and everything) and lower latency than global atomics. This GDS unit can be used for all sorts of things, like workgroup-to-workgroup communication, global locks, or doing an append or consume to an array of arrays where each thread can choose a different array, etc. To-the-metal access to GDS removes the overhead associated with managing huge data sets on the GPU. It is much easier to build GPU-based hierarchical occlusion culling and scene management with access to these kinds of low-level features.
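The "append to an array of arrays" use of GDS boils down to one atomic counter per destination array: a thread atomically bumps the counter of the array it picked and writes into the slot it got back. The Python sketch below models that, with a lock standing in for the GDS atomic; the class and method names are hypothetical. For scale, 64KB of GDS would hold 16384 32-bit counters.

```python
import threading
from typing import Any, List


class AppendBuckets:
    """Append into one of several arrays using a per-array atomic counter."""

    def __init__(self, num_buckets: int, capacity: int) -> None:
        self.counters = [0] * num_buckets  # would live in GDS on GCN
        self.storage: List[List[Any]] = [[None] * capacity for _ in range(num_buckets)]
        self._atomic = threading.Lock()  # stands in for a GDS atomic add

    def _atomic_add(self, bucket: int, amount: int) -> int:
        """Return the old counter value, like an atomic fetch-and-add."""
        with self._atomic:
            old = self.counters[bucket]
            self.counters[bucket] = old + amount
            return old

    def append(self, bucket: int, item: Any) -> int:
        slot = self._atomic_add(bucket, 1)
        if slot >= len(self.storage[bucket]):
            raise IndexError("bucket is full")
        self.storage[bucket][slot] = item
        return slot

    def contents(self, bucket: int) -> List[Any]:
        return self.storage[bucket][: self.counters[bucket]]


if __name__ == "__main__":
    buckets = AppendBuckets(num_buckets=4, capacity=64)
    # Each "thread" picks its own destination array.
    workers = [threading.Thread(target=buckets.append, args=(i % 4, i)) for i in range(100)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print([len(buckets.contents(k)) for k in range(4)])
```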

Re-used GPU State :: On a console with low level hardware access (like the PS3) one can pre-build and re-use command buffer chunks. On a modern GPU, one could even write or modify pre-built command buffer chunks from a shader. This removes the cost associated with drawing, pushing up the number of unique objects which can be drawn with different materials.
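A toy Python model of command buffer reuse is below. Chunks are encoded once, the per-frame buffer only references them, and a chunk can be patched in place without re-encoding its draws (the paragraph notes a shader could do this kind of write on a modern GPU). The opcodes and function names are made up for illustration and do not correspond to libGCM or GCN packet formats.

```python
from typing import List, Tuple

Command = Tuple[str, int]  # (opcode, argument); opcode names are hypothetical


def record_chunk(material_id: int, draw_count: int) -> List[Command]:
    """Encode a material's draws once; the chunk is reused every frame."""
    chunk: List[Command] = [("SET_MATERIAL", material_id)]
    chunk += [("DRAW", index) for index in range(draw_count)]
    return chunk


def patch_material(chunk: List[Command], material_id: int) -> None:
    """Rewrite part of a pre-built chunk in place."""
    if chunk[0][0] != "SET_MATERIAL":
        raise ValueError("chunk does not start with a material binding")
    chunk[0] = ("SET_MATERIAL", material_id)


def build_frame(num_chunks: int) -> List[Command]:
    """The per-frame buffer only references existing chunks, one entry each."""
    return [("CALL_CHUNK", index) for index in range(num_chunks)]


def execute(frame: List[Command], chunks: List[List[Command]]) -> List[Command]:
    """Flatten what the GPU front end would actually process."""
    executed: List[Command] = []
    for _opcode, index in frame:
        executed.extend(chunks[index])
    return executed


if __name__ == "__main__":
    chunks = [record_chunk(7, 3), record_chunk(9, 2)]
    frame = build_frame(len(chunks))
    patch_material(chunks[1], 11)
    print(execute(frame, chunks))
```

The per-frame work scales with the number of chunks rather than the number of draws inside them.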

FP_DENORM Control Bit :: On the console one can turn off both DX's and GL's forced flush-denormals-to-zero mode for 32-bit floating point in graphics. This enables easier ways to optimize shaders, because integer-limited shaders can use the floating-point pipes by operating on denormals.
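The denormal trick can be illustrated on the CPU. For a 32-bit float, any bit pattern below 2^23 is a denormal whose value is that integer times 2^-149, so adding two such floats adds their bit patterns exactly, as long as the sum stays below 2^23. The Python sketch below emulates this with `struct` (Python computes in double precision, but the sum is exactly representable, so packing back to 32 bits is exact); the `flush_to_zero` flag models the forced mode that breaks the trick. Function names are hypothetical.

```python
import struct

MAX_DENORMAL_BITS = (1 << 23) - 1  # largest 32-bit pattern that is still denormal
SMALLEST_NORMAL = 2.0 ** -126


def bits_to_f32(bits: int) -> float:
    """Reinterpret an unsigned 32-bit integer as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(value: float) -> int:
    """Round to single precision and return the raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def add_as_denormals(a: int, b: int, flush_to_zero: bool = False) -> int:
    """Add two small non-negative integers using float addition on their bits."""
    if a < 0 or b < 0 or a + b > MAX_DENORMAL_BITS:
        raise ValueError("inputs and sum must stay in the denormal range")
    fa, fb = bits_to_f32(a), bits_to_f32(b)
    if flush_to_zero:
        # Model of a forced flush-to-zero mode: denormal inputs become 0.0.
        fa = 0.0 if fa < SMALLEST_NORMAL else fa
        fb = 0.0 if fb < SMALLEST_NORMAL else fb
    return f32_to_bits(fa + fb)


if __name__ == "__main__":
    print(add_as_denormals(1000, 2345))  # 3345
    print(add_as_denormals(1000, 2345, flush_to_zero=True))  # 0
```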

128-bit to 256-bit Resource Descriptors :: With GCN, all that is needed to define a buffer's GPU state is to set 4 scalar registers to a resource descriptor; textures are similar (up to 8 scalar registers, plus another 4 for the sampler). The scalar ALU on GCN supports block fetch of up to 16 scalars with a single instruction, from either memory or a buffer. It looks to be trivially easy on GCN to do bind-less buffers or textures for shader loads/stores. Note this scalar unit also has its own data cache. Changing textures or surfaces from inside the pixel shader looks to be easily possible. Note shaders still index resources using an instruction immediate, but the descriptor referenced by this immediate can be changed. This could help remove the traditional draw-call-based material limit.
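The arithmetic behind that bind-less observation is small enough to write down. A buffer descriptor is 4 dwords, a texture descriptor up to 8, and a sampler 4, so one texture + sampler + buffer set is exactly the 16 dwords a single scalar block fetch can load. The Python sketch below treats descriptors as opaque dwords (no real field layout is implied) and lays them out in a hypothetical per-material table; the table layout and all names are assumptions.

```python
from typing import List

DWORD_BYTES = 4
BUFFER_DESC_DWORDS = 4  # 128-bit buffer descriptor
TEXTURE_DESC_DWORDS = 8  # up to 256-bit texture descriptor
SAMPLER_DESC_DWORDS = 4  # 128-bit sampler descriptor
MAX_SCALAR_FETCH_DWORDS = 16  # largest block a single scalar load fetches

MATERIAL_DWORDS = TEXTURE_DESC_DWORDS + SAMPLER_DESC_DWORDS + BUFFER_DESC_DWORDS


def pack_material(texture: List[int], sampler: List[int], buffer: List[int]) -> List[int]:
    """Concatenate opaque descriptor dwords into one table record."""
    for name, words, expected in (
        ("texture", texture, TEXTURE_DESC_DWORDS),
        ("sampler", sampler, SAMPLER_DESC_DWORDS),
        ("buffer", buffer, BUFFER_DESC_DWORDS),
    ):
        if len(words) != expected:
            raise ValueError(f"{name} descriptor must be {expected} dwords")
    return texture + sampler + buffer


def material_offset_bytes(material_id: int) -> int:
    """Byte offset of a material record; a shader would index the table by ID."""
    return material_id * MATERIAL_DWORDS * DWORD_BYTES


if __name__ == "__main__":
    print(MATERIAL_DWORDS, MATERIAL_DWORDS == MAX_SCALAR_FETCH_DWORDS)
    print(material_offset_bytes(3))  # 192
```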

S_SLEEP, S_SETPRIO, and GDS :: These provide all the tools necessary to do locks and lock-free retry loops on the GPU efficiently. DX11 specifically does not allow locks, for fear that some developer might TDR the system (trigger a GPU timeout and driver reset). With low-level access, S_SLEEP puts a wavefront to sleep without busy-spinning on the ALUs, and S_SETPRIO lowers its priority while it checks for the unlock between S_SLEEPs.
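Here is a minimal Python model of the lock-with-backoff pattern. A compare-and-swap on one word stands in for a GDS atomic (a Python lock makes it indivisible), and `time.sleep` stands in for S_SLEEP so waiters yield instead of spinning; S_SETPRIO has no Python equivalent and only appears as a comment. Class and method names are hypothetical, and this is not GCN shader code.

```python
import threading
import time


class GdsLock:
    """Mutex built from a compare-and-swap word, with sleep-based backoff."""

    def __init__(self) -> None:
        self._word = 0  # 0 = unlocked, 1 = locked; would live in GDS
        self._atomic = threading.Lock()  # makes compare-and-swap indivisible

    def _compare_and_swap(self, expected: int, new: int) -> int:
        """Return the old value; store `new` only if the old value was `expected`."""
        with self._atomic:
            old = self._word
            if old == expected:
                self._word = new
            return old

    def acquire(self, sleep_s: float = 1e-5) -> None:
        while self._compare_and_swap(0, 1) != 0:
            # On GCN: lower priority (S_SETPRIO), then S_SLEEP before retrying,
            # instead of busy-spinning on the ALUs.
            time.sleep(sleep_s)

    def release(self) -> None:
        self._compare_and_swap(1, 0)
```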

S_SENDMSG :: This enables a shader to force a CPU interrupt. In theory this can be used to signal a real-time OS that some GPU operation has completed, to start up some CPU-based tasks without needing the CPU to poll for completion. The other option would be an interrupt signaled from a push buffer, but that couldn't signal from some intermediate point during a shader's execution. On PS4 this might enable tighter GPU and CPU task dependencies within a frame (or maybe even within a shader), compared to the latency wall on non-real-time OSes like Windows, which usually forces CPU and GPU task dependencies to be a few frames apart.
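As a loose analogy (Python threads and `threading.Event`, not a real-time OS or the actual GCN message mechanism), the sketch below shows why a signal from the middle of a shader helps: the dependent CPU task wakes as soon as phase 1 is done, without polling and without waiting for the rest of the GPU work. All function names are hypothetical.

```python
import threading
from typing import List


def gpu_shader(signal: threading.Event, log: List[str]) -> None:
    log.append("gpu: phase 1 done")
    signal.set()  # stand-in for S_SENDMSG raising an interrupt mid-shader
    log.append("gpu: phase 2 running")


def cpu_task(signal: threading.Event, log: List[str]) -> None:
    signal.wait()  # sleeps until signaled; no polling loop
    log.append("cpu: dependent task started")


def run_frame() -> List[str]:
    signal = threading.Event()
    log: List[str] = []
    cpu = threading.Thread(target=cpu_task, args=(signal, log))
    gpu = threading.Thread(target=gpu_shader, args=(signal, log))
    cpu.start()
    gpu.start()
    gpu.join()
    cpu.join()
    return log


if __name__ == "__main__":
    print(run_frame())
```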

Full Cache Flush Control :: DX has only implicit, driver-controlled cache flushes, so the driver needs to be conservative: track all dependencies (high overhead), then assume a conflict and always flush caches. On a console, the developer can easily skip cache flushes when they are not needed, leading to more parallel jobs and higher performance (overlapping execution of things which on DX would be separated by a wait for the machine to go idle).
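A toy version of the "only flush when needed" decision might look like the Python sketch below. It tracks which resources were written since the last flush and inserts a flush only before a pass that reads one of them (read-after-write). Other hazards and the details of real cache hierarchies are ignored, and the pass and resource names are made up; a conservative driver, by contrast, would flush before every pass.

```python
from typing import FrozenSet, List, NamedTuple, Set


class RenderPass(NamedTuple):
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def plan_flushes(passes: List[RenderPass]) -> List[str]:
    """Insert "FLUSH" only before a pass that reads data written since the last flush."""
    dirty: Set[str] = set()
    plan: List[str] = []
    for render_pass in passes:
        if render_pass.reads & dirty:
            plan.append("FLUSH")
            dirty.clear()
        plan.append(render_pass.name)
        dirty |= render_pass.writes
    return plan


if __name__ == "__main__":
    frame = [
        RenderPass("shadow", frozenset(), frozenset({"shadow_map"})),
        RenderPass("ssao", frozenset({"depth"}), frozenset({"ao"})),
        RenderPass("lighting", frozenset({"shadow_map", "ao"}), frozenset({"hdr"})),
    ]
    print(plan_flushes(frame))
```

Here the shadow and SSAO passes can overlap with no flush between them; only lighting, which consumes both results, has to wait.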

GPU Assembly :: Maybe? I don't know if GCN has some hidden, very complex rules for code generation and compiler scheduling. From the ISA docs it looks trivial to manage (manual insertion of barriers for texture fetch, etc.). If Sony opens up GPU assembly, unlike on the PS3, developers might easily crank out 30% extra from hand-tuning shaders. The alternative is iterating on Cg, which is possible with real-time profiling tools. My experience on PC is that micro-optimization of shaders yields some massive wins. For those like myself who love assembly on any architecture, a fixed hardware spec is a dream.

...

I could continue, but by now you get the picture. Launch titles will likely be DX11 ports, so perhaps not much better than what could be done on PC. However, if Sony provides the real-time OS with libGCM v2 for GCN, then one or two years out, 1st party devs and Sony's internal teams like the ICE team will have had long enough to build up tech to really leverage the platform.

I'm excited for what this platform will provide for PS4-only 1st party titles and developers who still have the balls to do a non-portable game this next round.

I thought the developers did some really unexpected things going to the metal on RSX with the PS3. With these Shader Model 5 architected GPUs, and perhaps an APU or co-processor in the mix, there's a lot developers can make happen with these parts that you just don't see in PC games using the same exact parts. Couple this with developers like Naughty Dog and it's really hard not to get hyped up. Hence my current Golden Oozaru avatar.

I think that is what a lot of people are missing when they keep talking about the specs: even though the specs are slower than the PC counterparts, the hardware is set in stone, so developers can tweak it and do stuff that you just won't see on a PC! Great info, nice find Lefein!

Provided the specs we've been given so far are reasonably accurate, this guy takes a huge steaming dump on the Xbox 720/Durango. Which he kind of should, because the specs on it so far are trash.

He is $#@!.
He loves his Linux and OpenGL, basically anti-MS.

Everything, absolutely every $#@!ing thing he said, can be applied to any GCN-based GPU (i.e. the rumoured 720 console).

He also talks a load of $#@! about some things that he just doesn't even understand.
For a supposed GPU guru, the fact that he doesn't even inspect his low-level GPU assembly (it's piss easy to do) makes me wonder what the hell he is doing.

32MB of ESRAM is only really enough to do forward shading with MSAA using only 32-bits/pixel color with 2xMSAA at 1080p or 4xMSAA at 720p. Anything else in ESRAM would require tiling and resolves like on the Xbox 360 (which would likely be a DMA copy on 720), or attempting to use the slow DDR3 as a render target.

Bull$#@!, for example.

You can fit four full 32-bit buffers in the 32MB of ESRAM.

Only an idiot would try and do the MSAA until after the image is assembled, which always requires 2 passes in deferred rendering.
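Since both posts make a claim about what fits in 32MB, the arithmetic is easy to check. The Python sketch below assumes 32 bits (4 bytes) per sample for both color and depth, as in the quoted 32-bits/pixel case, and ignores alignment, compression metadata and padding; `surface_bytes` and `fits_in_esram` are hypothetical helpers.

```python
ESRAM_BYTES = 32 * 1024 * 1024  # 32MB of ESRAM


def surface_bytes(width: int, height: int, samples: int = 1, bytes_per_sample: int = 4) -> int:
    return width * height * samples * bytes_per_sample


def fits_in_esram(*surfaces: int) -> bool:
    return sum(surfaces) <= ESRAM_BYTES


if __name__ == "__main__":
    cases = {
        "1080p color + depth, 2xMSAA": [surface_bytes(1920, 1080, 2)] * 2,
        "720p color + depth, 4xMSAA": [surface_bytes(1280, 720, 4)] * 2,
        "1080p color + depth, 4xMSAA": [surface_bytes(1920, 1080, 4)] * 2,
        "four 1080p 32-bit targets": [surface_bytes(1920, 1080)] * 4,
    }
    for label, surfaces in cases.items():
        total_mib = sum(surfaces) / (1024 * 1024)
        print(f"{label}: {total_mib:.1f} MiB, fits={fits_in_esram(*surfaces)}")
```

Under these assumptions the two posts describe the same budget: four full-resolution 32-bit 1080p surfaces (about 31.6 MiB) cost exactly as much as 2xMSAA color plus depth at 1080p, while 4xMSAA at 1080p (about 63.3 MiB) does not fit.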

TXAA has been around since 1983.
Both are post-processing effects in which you act on 2D textures; at most it's HLSL in the pixel shader (clearly).
It doesn't mean he knows much about how to set up and drive a 3D engine.

Originally Posted by Vulgotha

Why don't you have your own blog, man? I'd love to read your take on this stuff on a regular basis.

I have not had the time in the past 4 years to do much, but that's not to say I haven't been creating the odd experimental pixel shader in my head.

Academics aren't always the best judge of real world implementations.
But as I say, his comments regarding a GCN-based GPU in a dedicated console are certainly not specific to any one particular console, considering we don't know what's in either.

Lmao at this article, it's garbage. No way the PS4 is better than a dedicated high-performance gaming PC.

Let me begin by saying there is no such thing as a 'dedicated gaming PC'. A PC is by definition a general-purpose computing device, and this in turn makes it inefficient at specialized tasks like these.

Secondly, it is very much possible for a truly dedicated gaming device (i.e. a game console) to be better than a PC at gaming. PCs are highly wasteful with resources and just take the lazy route of brute-forcing through problems. Consoles are like Bruce Lee (all about technique).

PlayStation Universe

Copyright 2006-2014 7578768 Canada Inc. All Rights Reserved.

Reproduction in whole or in part in any form or medium without express written
permission of Abstract Holdings International Ltd. prohibited. Use of this site is governed
by our Terms of Use and Privacy Policy.