Radeon R9 295X2 CrossFire at 4K - Quad Hawaii GPU Powerhouse

Test Setup

Setting up the pair of Radeon R9 295X2 cards in software was pretty painless. AMD sent over a Catalyst 14.4 beta driver to aid us. Once the driver was installed we enabled CrossFire and were ready to go. The driver install did take a damn long time, though, likely because it was installing the driver four separate times.


I was going to attempt to monitor temperatures and clock speeds on all four GPUs at the same time during testing, using four instances of GPU-Z, but running more than one instance of GPU-Z simultaneously apparently causes stuttering and stability issues, so I was limited to just one. I did plenty of spot checks, though, and found that even after extended periods of gaming the GPUs were still running at 1018 MHz and staying in the safe zone of under 80C.

Testing Setup

For this article we are only going to be comparing a single Radeon R9 295X2 to the pair of Radeon R9 295X2 cards in CrossFire: no other GPU configuration we have here really stands up to it, so there was no point in complicating things. As far as I know, NVIDIA has not launched the Titan Z card, and without access to a pair of those we don’t really have a configuration that can match these AMD options.

All of our testing was done at 4K (3840x2160), with other resolutions left out of the comparison. Why? Because if you are spending $3000 on graphics cards, you might as well be spending $700-2700 on a matching (awesome) 4K display. Seriously, if you are going to game at 2560x1440, then save a ton of money and just get a single card!

There are literally dozens of files created for each “run” of benchmarks. From them FCAT produces several graphs, and we generate several more with additional code of our own.

The graphs FCAT produces come from the default version of NVIDIA's scripts, but I have modified and added to them in a few ways to produce additional data for our readers. The first file shows a subset of the data from the RUN file described below, the average frame rate over time as defined by FRAPS, though we combine all of the GPUs we are comparing into a single graph. This basically emulates the data we have been showing you for the past several years.

The PCPER Observed FPS File

This graph takes a different subset of data points and plots them similarly to the FRAPS file, but this time we are looking at the “observed” average frame rates, shown as the blue line in the RUN file described below. This takes out the dropped and runt frames, giving you the performance metric that actually matters: how many frames are being shown to the gamer and actually advancing the animation.

As you’ll see in our full results on the coming pages, a big difference between the FRAPS FPS graph and the Observed FPS graph indicates cases where the gamer is likely not getting the full benefit of the hardware investment in their PC.

The PLOT File

The primary file generated from the extracted data is a plot of calculated frame times, including runts. The numbers here represent the amount of time each frame appears on the screen for the user. A “thinner” line across the time span represents consistent frame times and thus should produce the smoothest animation for the gamer. A “wider” line, or one with a lot of peaks and valleys, indicates much more variance and is likely caused by a lot of runts being displayed.

The RUN File

While the two graphs above show combined results for a set of cards being compared, the RUN file shows the results from a single card for one particular test. It is in this graph that you can see interesting data about runts, drops, average frame rate and the actual frame rate of your gaming experience.

For tests that show no runts or drops, the data is pretty clean: this is the familiar frames-per-second-over-time graph that has become the standard for performance evaluation on graphics cards.

A test that does have runts and drops will look much different. The black bar labeled FRAPS indicates the average frame rate over time that traditional testing would show if you counted the drops and runts in the equation, as the FRAPS FPS measurement does. Any area in red is a dropped frame: the wider the red area, the more colored bars from our overlay were missing in the captured video file, meaning the gamer never saw those frames in any form.

The wide yellow area represents runts, the thin bands of color in our captured video that we have determined do not add to the animation on the screen. The larger the yellow area, the more often those runts are appearing.

Finally, the blue line is the measured FPS over each second after removing the runts and drops. We are going to be calling this metric the “observed frame rate” as it measures the actual speed of the animation that the gamer experiences.
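The relationship between the black FRAPS bar, the red drops, the yellow runts and the blue observed line can be sketched in a few lines of Python. This is a minimal illustration, not FCAT's code: it assumes each captured frame has already been reduced to the number of scanlines its overlay color occupied in the video (0 meaning the frame never appeared), and both the `CapturedFrame` type and the runt cutoff `RUNT_SCANLINES` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical runt cutoff: frames occupying fewer scanlines than this are treated
# as runts. The value is an illustrative assumption, not taken from FCAT.
RUNT_SCANLINES = 21


@dataclass
class CapturedFrame:
    second: int     # which second of the benchmark run the frame belongs to
    scanlines: int  # height of the frame's overlay color band in the capture; 0 = dropped


def classify(frame: CapturedFrame) -> str:
    """Label a frame the way the RUN graph colors it: drop (red), runt (yellow) or shown."""
    if frame.scanlines == 0:
        return "drop"
    if frame.scanlines < RUNT_SCANLINES:
        return "runt"
    return "shown"


def per_second_fps(frames: list[CapturedFrame], duration_s: int) -> tuple[list[int], list[int]]:
    """Return (fraps_fps, observed_fps), one value per second of the run."""
    fraps = [0] * duration_s
    observed = [0] * duration_s
    for f in frames:
        # A FRAPS-style count includes every frame the game produced, runts and drops included.
        fraps[f.second] += 1
        # The observed frame rate only counts frames that meaningfully reached the screen.
        if classify(f) == "shown":
            observed[f.second] += 1
    return fraps, observed


frames = [
    CapturedFrame(0, 540), CapturedFrame(0, 5), CapturedFrame(0, 0),
    CapturedFrame(1, 1080), CapturedFrame(1, 1080),
]
print(per_second_fps(frames, 2))  # ([3, 2], [1, 2])
```

In the first second, FRAPS would report three frames while only one was actually visible long enough to count, which is exactly the gap between the black bar and the blue line.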

The PERcentile File

Scott introduced the idea of frame time percentiles months ago, but now that we have different data from direct capture rather than FRAPS, the results might be even more telling. In this case, FCAT shows percentiles not by frame time but by instantaneous FPS. This tells you the minimum frame rate that will appear on the screen for any given percentage of time during our benchmark run. The 50th percentile should be very close to the average frame rate of the benchmark, but as we creep closer to the 100th percentile we see how the frame rate is affected.

The closer this line is to perfectly flat, the better, as that would mean we are running at a constant frame rate the entire time. A steep decline on the right-hand side tells us that frame times are varying more and more frequently and might indicate stutter in the animation.
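One way to build a percentile-by-instantaneous-FPS curve is sketched below. It is an assumption-laden illustration rather than FCAT's script: it weights every frame equally (not by how long it was on screen) and uses the simple nearest-rank percentile, and `fps_percentiles` is a hypothetical helper name.

```python
import math


def fps_percentiles(frame_times_ms: list[float], percentiles: list[float]) -> dict[float, float]:
    """Map each percentile p to the instantaneous FPS that at least p% of frames meet or beat."""
    # Instantaneous FPS of a frame is the rate the game would run at if every frame took this long.
    inst_fps = sorted((1000.0 / t for t in frame_times_ms), reverse=True)
    n = len(inst_fps)
    curve = {}
    for p in percentiles:
        # Nearest-rank percentile over the frames sorted fastest-first.
        rank = max(1, math.ceil(p / 100.0 * n))
        curve[p] = inst_fps[rank - 1]
    return curve


# Three smooth 10 ms frames and one 20 ms hitch.
print(fps_percentiles([10.0, 10.0, 10.0, 20.0], [50, 75, 100]))
# {50: 100.0, 75: 100.0, 100: 50.0}
```

The single slow frame only shows up at the far right of the curve, which is why a flat line with a late, steep drop-off points to occasional hitches rather than a generally slow card.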

The PCPER Frame Time Variance File

Of all the data we are presenting, this is probably the one that needs the most discussion. In an attempt to create a new metric for gaming and graphics performance, I wanted to try to find a way to define stutter based on the data sets we had collected. As I mentioned earlier, we can define a single stutter as a variance level between t_game and t_display. This variance can be introduced in t_game, t_display, or on both levels. Since we can currently only reliably test the t_display rate, how can we create a definition of stutter that makes sense and that can be applied across multiple games and platforms?

We define a single frame variance as the difference between the current frame time and the previous frame time, that is, how consistent two consecutive frames presented to the gamer are. However, as I found in my testing, a plot of this frame variance is a nearly perfect match to the data presented in the minimum FPS (PER) file created by FCAT, so on its own it adds little. More to the point, stutter is only perceived when there is a break from the animation frame rate that preceded it.

Our current running theory for a stutter evaluation is this: find the current frame time variance by comparing the current frame time to the running average of the frame times of the previous 20 frames. Then, by sorting these variances and plotting them in percentile form, we can get an interesting look at potential stutter. Comparing each frame time to a running average rather than just to the previous frame should prevent false positives from legitimate performance peaks or valleys, such as when moving from a highly compute-intensive scene to a lighter one.

While we are still trying to figure out if this is the best way to visualize stutter in a game, we have seen enough evidence in our gameplay testing, and by comparing this graph to other data generated through our Frame Rating system, to be reasonably confident in our assertions. So much so, in fact, that I am going to call this data the PCPER ISU, or International Stutter Units, an acronym beer fans will appreciate.

When comparing these results, you want to see a line that is as close to the 0 ms mark as possible, indicating very little frame time variance compared to the running average of previous frames. There will be some inevitable incline as we reach the 90th percentile and above, but that is expected with any gameplay sequence that varies from scene to scene. What we do not want to see is a sharp upward line, which would indicate higher frame variance (ISU) and could be a sign that the game suffers from microstuttering and hitching.
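The running-average comparison described above can be sketched as follows. This is a minimal sketch under stated assumptions, not PCPER's actual script: the 20-frame window comes from the text, but taking the absolute difference (so unusually fast frames count as well as slow ones), skipping frames without a full 20-frame history, and reading the sorted values out by nearest-rank percentile are all my assumptions. The function names are hypothetical.

```python
import math
from collections import deque


def frame_time_variances(frame_times_ms: list[float], window: int = 20) -> list[float]:
    """For each frame, |frame time - mean of the previous `window` frame times|, in ms.

    The first `window` frames have no full history yet and are skipped.
    """
    history = deque(maxlen=window)  # keeps only the most recent `window` frame times
    variances = []
    for t in frame_times_ms:
        if len(history) == window:
            running_avg = sum(history) / window
            variances.append(abs(t - running_avg))
        history.append(t)
    return variances


def isu_curve(frame_times_ms: list[float], percentiles: list[float], window: int = 20) -> dict[float, float]:
    """Sort the variances ascending and read them out by nearest-rank percentile."""
    v = sorted(frame_time_variances(frame_times_ms, window))
    return {p: v[max(1, math.ceil(p / 100.0 * len(v))) - 1] for p in percentiles}


# 21 steady 16 ms frames followed by one 32 ms hitch.
times = [16.0] * 21 + [32.0]
print(frame_time_variances(times))  # [0.0, 16.0]
print(isu_curve(times, [50, 100]))  # {50: 0.0, 100: 16.0}
```

Because a gradual shift to a heavier scene also shifts the running average, only abrupt breaks from the recent frame times push the right-hand end of the curve upward.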

at 28nm indeed is not useful if you are speechless...what did you expect from 2x dual GPUs like these? I don't think that's a "normal" setup...
then wait for 20/16nm if you care about power savings in a quad-CrossFire rig... ;-)

Smaller components mean less power usage, because smaller transistors take less energy to switch. Process size is also an important part of the architecture of the chip itself: smaller components mean more of them can be placed on one chip, which means more compute units on each chip.

Multi-GPU setups typically use a technique called AFR (alternate frame rendering) to increase performance. In a two-way GPU system, GPU A renders the even frames and GPU B renders the odd ones. The ideal result is an ABABABABAB pattern of which GPU renders which frame. Scaling to three GPUs produces an ABCABCABC pattern, and four goes to ABCDABCDABCD.
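The round-robin assignment described above is simple enough to sketch directly. The Python below is illustrative only; `afr_schedule` and `per_gpu_frame_budget_ms` are hypothetical helper names, and the budget figure assumes ideal AFR with perfectly even frame pacing and no driver overhead.

```python
import string


def afr_schedule(num_frames: int, num_gpus: int) -> str:
    """Label which GPU renders each frame under alternate frame rendering (round-robin)."""
    gpu_names = string.ascii_uppercase  # GPU A, GPU B, GPU C, ...
    return "".join(gpu_names[frame % num_gpus] for frame in range(num_frames))


def per_gpu_frame_budget_ms(target_fps: float, num_gpus: int) -> float:
    """Time each GPU has to finish its frame if the group is to sustain target_fps.

    In ideal AFR a GPU only has to deliver every num_gpus-th frame, so it gets
    num_gpus display intervals per frame instead of one.
    """
    return num_gpus * 1000.0 / target_fps


print(afr_schedule(12, 4))             # ABCDABCDABCD
print(per_gpu_frame_budget_ms(60, 4))  # roughly 66.67
```

The second function shows the trade-off behind AFR scaling: with four GPUs at 60 FPS, each GPU has about four display intervals to finish its frame, which also means each individual frame can spend correspondingly longer in flight.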

There are a few problems with this technique and with GPU scaling in general. First, there needs to be enough CPU power to provide the frame rate increase. Nowadays games are typically GPU limited, but a CPU limitation could crop up in 3-way and 4-way configurations. Second, there is an API limit on how many frames can be processed at one time: DirectX supports a maximum of 6 frames being processed concurrently. Third, Windows 7 has a limit of 8 GPUs in a system (I suspect Windows 8.x has the same limitation but haven't personally checked). Most Linux distributions have a similar 8 GPU limit, but there are kernel patches that enable more. Fourth, systems that boot via BIOS will have issues with 8 or more cards due to legacy 32-bit memory allocations for GPUs; 64-bit EFI does not have this issue.

nVidia previously used a technique called split frame rendering (SFR), where the top half of a frame is rendered by one GPU and the bottom half by another. This solves some of the issues outlined above. CPU load doesn't necessarily have to increase linearly, but there is a bit of driver overhead to perform this load balancing. The DirectX limitation is also bypassed, so the real limitation becomes how many GPUs a single system can boot with. Since the number of frames being worked on is the same as in a single-GPU system, there is no microstuttering like you can encounter with AFR. SFR has several issues of its own, though. nVidia has hidden SFR support away in their drivers, so some developer tools are necessary just to enable it. This is for good reason, as it is buggy and in some cases doesn't work at all. Last I checked, SFR didn't scale as well as AFR in 2-way GPU scenarios. I have not seen any modern tests of SFR with 4-way GPUs, but really old benchmarks showed 4-way AFR and 4-way SFR scaling relatively similarly (ideally about 3x the performance of a single card). AFR and SFR can also be combined. AMD doesn't have a direct equivalent to SFR. I do recall some talk of a tile-based solution where each GPU would render a checkerboard pattern, but I believe nothing came of it.

I guess they must actually be using AFR for 4-GPU rendering. This seems like it would cause some artifacts or stuttering, attempting to render 4 temporally distinct frames simultaneously. Does this essentially induce 4 extra frames of latency? If so, is that enough to notice? Would the 6-frame DirectX limit actually cause a problem? I could imagine needing to start setup for another 4 frames while the previous 4 frames are still processing (8 active frames).

Tiling the rendering load could scale better to 4 or more GPUs, but the load balancing is not simple. Most images encountered in games cannot be divided simplistically, since the load would differ significantly between regions. The top tiles might be only sky, lower tiles might be low-res scenery texture/geometry, while one tile may get high-res character texture/geometry. It may be simpler to just use stripes (NVidia SLI?) rather than attempting to split into arbitrary tiles, since you need something that works for both 3 and 4 GPUs.
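A cost-balanced version of the stripe idea can be sketched with prefix sums: instead of giving each GPU an equal number of rows, place the stripe boundaries where the cumulative estimated cost crosses 1/N, 2/N, ... of the total. This is a hypothetical illustration, not how SLI's SFR mode actually works; it assumes a per-row cost estimate is available (for example, derived from the previous frame's timings), and `balanced_stripes` is an invented helper name.

```python
from bisect import bisect_left
from itertools import accumulate


def balanced_stripes(row_costs: list[float], num_gpus: int) -> list[tuple[int, int]]:
    """Split screen rows into num_gpus contiguous stripes of roughly equal estimated cost.

    Returns half-open (first_row, end_row) ranges. Assumes len(row_costs) >= num_gpus.
    """
    prefix = list(accumulate(row_costs))  # prefix[i] = total cost of rows 0..i
    total = prefix[-1]
    cuts = [0]
    for k in range(1, num_gpus):
        goal = total * k / num_gpus
        # The stripe ends just after the first row whose cumulative cost reaches the goal.
        cut = bisect_left(prefix, goal) + 1
        # Keep every stripe non-empty and leave at least one row per remaining GPU.
        cut = max(cut, cuts[-1] + 1)
        cut = min(cut, len(row_costs) - (num_gpus - k))
        cuts.append(cut)
    cuts.append(len(row_costs))
    return list(zip(cuts[:-1], cuts[1:]))


# Cheap "sky" rows on top, expensive "character" rows at the bottom.
costs = [1, 1, 1, 1, 2, 2]
print(balanced_stripes(costs, 2))  # [(0, 4), (4, 6)], i.e. 4 cost units each
```

An equal-height split of the same rows would give the two GPUs 3 and 5 cost units, so the top GPU would sit idle while the bottom one finishes. The same function works unchanged for 3 or 4 GPUs, which addresses the concern about needing one scheme for several GPU counts; the hard part in practice is getting accurate cost estimates.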

It is somewhat amazing that we are seeing good scaling in some of these games already. It would be interesting to know what is being done differently between those that scale and those that do not. Are some of them specifically optimized for up to 4 GPUs in the render engine?

Only really having access to 4 GB seems to be causing some performance limitations (see HardOCP's testing); in CrossFire each GPU keeps its own copy of the working data, so four GPUs with 4 GB each still effectively have 4 GB. It would be nice to see the GPUs able to share memory rather than having completely independent memory systems, but this will not be available for a while yet. Nvidia seems to be working on this with their NVLink technology; I don't know what AMD is doing. Sharing GPU memory requires a really high-bandwidth interconnect, but it is doable. If one used 4 of AMD's 32-bit HT links, you could get over 100 GB/s, and that is not the latest tech. For chips placed really close together, the speed could probably be increased. You would probably reduce the width of the memory bus on each chip and replace it with interconnect to neighboring chips, with 2 GB attached to each GPU. You may be able to put quad GPUs on a single card this way, but power consumption and heat output would be limiting.
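The "over 100 GB/s" figure holds up as a back-of-the-envelope number if we assume HyperTransport 3.1 links (3.2 GHz clock, double data rate, 32 bits wide) and count one direction of traffic. The helper below is a hypothetical illustration of that arithmetic, not a model of any real GPU interconnect.

```python
def link_bandwidth_gb_s(clock_ghz: float, width_bits: int, transfers_per_clock: int = 2) -> float:
    """One-direction bandwidth of a parallel link in GB/s (double data rate by default)."""
    return clock_ghz * transfers_per_clock * width_bits / 8


# Assumption: HyperTransport 3.1 links at their 3.2 GHz maximum clock, 32 bits wide.
per_link = link_bandwidth_gb_s(3.2, 32)
print(f"one link:   {per_link:.1f} GB/s per direction")      # 25.6
print(f"four links: {4 * per_link:.1f} GB/s per direction")  # 102.4
```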

Anyway, I wouldn't buy one of these. My AC bill is already high enough without adding a 1500 W space heater. Back when I lived in a cold climate, I used two 1500 W oil-filled radiant heaters to heat quite a bit of my house. If I were to use a 1500 W system in California, I would need to rig up some fans and ducts to blow the hot air out the window.

I decided to save that discussion for another time, as it would really have complicated things. We wanted to find the CrossFire performance factor without muddying it up with Mantle stuff that may not be perfected yet.

1261 W at the wall times the ~89% efficiency gives 1122 W of real consumption, which is about 93.5% of its capacity. That should not be the slightest problem for a PSU of this one's quality! (There are plenty of other PSUs that I would not even consider buying, but this one SHOULD handle this setup 24/7 at max load for years!)
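The arithmetic in this comment can be checked directly. The 1200 W rating is not stated in the comment; it is inferred from the 93.5% figure, and the 89% efficiency is the commenter's approximation (real PSU efficiency varies with load).

```python
wall_draw_w = 1261   # measured at the wall, from the comment above
efficiency = 0.89    # assumed PSU efficiency at this load (approximate)
psu_rating_w = 1200  # assumed rating; it is what the 93.5% figure implies

dc_load_w = wall_draw_w * efficiency      # power actually delivered to the components
load_fraction = dc_load_w / psu_rating_w  # share of the PSU's rated output in use

print(f"DC load: {dc_load_w:.0f} W, {load_fraction:.1%} of rating")
# DC load: 1122 W, 93.5% of rating
```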

The R9 295X2 has some pretty stringent power requirements. Another website I know of did this with a 1350 W PSU and still had issues. These two cards together, with their unique specifications, really need a 1500 W PSU.

Well, in theory a 1200 watt PSU would be enough, but not everyone has the same setup; some machines pull more power depending on the CPU used and even how many HDDs they have. The point being, you should get something a bit beefier than 1200 watts if you plan to run two of these cards. The circuit the computer is on in your house also comes into play, but that is another matter.

My point was whether there was an issue, or they just went with 2 PSUs based on the [H]'s experience... Don't get me wrong, Enermax is not bad, but I would not consider it as an option for myself :P... This particular Corsair, however, is a different cup of coffee...

In Europe, we do not have (huge) limitations on power in flats... The default here is 16A per 230V circuit with a 25A common fuse in front of it (them), so in theory a 3.5kW PSU (the actual legal limit on single-phase appliance power in a home) would be doable, and the lights and maybe a TV would still be on in the flat :D
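The figures in this comment work out as follows, treating the load as purely resistive (power = volts x amps), which is a simplification. The 3.5 kW appliance limit is the commenter's figure, not something verified here.

```python
volts = 230
circuit_breaker_a = 16    # per-circuit breaker, as described in the comment
main_fuse_a = 25          # common fuse in front of all circuits in the flat
appliance_limit_w = 3500  # single-phase appliance limit quoted in the comment

circuit_max_w = volts * circuit_breaker_a    # what one circuit can supply
flat_max_w = volts * main_fuse_a             # what the whole flat can draw
headroom_w = flat_max_w - appliance_limit_w  # left over for lights, a TV, etc.
print(circuit_max_w, flat_max_w, headroom_w)  # 3680 5750 2250
```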

I wanted to follow up on a previous comment someone posted asking about three-way with a 295X2 and a 290X. I suppose theoretically that should work, but would it actually work in real-world gaming applications?

I would love to see a quick setup with that configuration and how it scales.

Given the weak scaling of 2 x 295X2s in some games, it might make more sense to just set up the 295X2 with a 290X for extreme performance at a little less price and power consumption.

lol this card is killing nvidia xD.
They delayed the Titan Z trying to figure out a way to beat it.
I'm guessing everything was ready: R&D done, the cooler probably bought and even mounted on the PCB, packaging... that's a lot of money wasted on a last-minute delay to rework all that. The Titan Z's performance must have been ridiculously low compared to the R9 295X2 for them to accept losing so much money.

Probably yes, but then the Nvidia brand would take a deadly blow, going from the highest-performance GPU to the greediest margin scammers.
Nvidia has been making a lot of mistakes these last couple of years, and very bad ones at that, from G-Sync to this stupid new luxury Titan brand that failed in 3 consecutive instances, and AMD isn't making things easy for them, from Mantle, FreeSync and bundles to the R9 295X2... Just imagine: if AMD hadn't brought out this card, today we would have an Nvidia card 20-30% less performant at twice the price. I am really glad that AMD is sticking with the desktop GPU market, otherwise Nvidia would have sucked us dry.
They really should give it up and go back to GTX, because honestly I don't see a place for this card in the market; it would just hurt them.
Because if they go with watercooling like AMD, it would hurt their image for copying AMD; if they stay on air, they have to find some ingenious way to keep it stable while overclocked, which I doubt very much.
The most likely outcome is that they will put on decent cooling that won't be able to hold the frequency, then pay some sites to bench with cool cards before they warm up, and they will also have to drop the price, because AMD planned on the $3000 figure
with the hashtag #2betterthan1, whose same-price comparison is CrossFire R9 295X2, so the price needs to be lower than $2k, otherwise AMD would just drop the price to match the 1/2 ratio.

And of course "nuff said" is a sound counterargument. That's what cracks me up about people like you: if I were deluded, you would state why, unless of course you have nothing to say without sounding ridiculous... nuff said lol

That Corsair AX1500 would have come in handy. That's a lot of GPU power you've got there, Ryan, epic review. At least AMD has made great improvements in their drivers to support 4 GPUs, although they're still not perfect.

Now, I'm still gaming on a 1680x1050 monitor, and you're on 4K. You have the best job, Ryan!

Hi bro, I just have a question: is it possible to run three 295X2 cards in CrossFire? I heard something about the new CrossFireX XDMA standard making it possible to CrossFire more than 4 GPUs... Cheers, that's just something I've been wondering about.