AMD Radeon Vega Frontier Edition CrossFire Testing

Two Vegas...ha ha ha

When preorders for the Radeon Vega Frontier Edition went up last week, I decided to place orders in a few different locations to make sure we got a card in as early as possible. Well, as it turned out, we actually had cards show up very quickly…from two different locations.

There of course needs to be some discussion up front about this testing and our write-up. If you read my first review of the Vega Frontier Edition, you will clearly note my stance on the ideas that “this is not a gaming card” and that “the drivers aren’t ready.” Essentially, I said these potential excuses for performance were distractions, unwarranted given the current state of Vega development and the proximity of the consumer iteration, Radeon RX.

But for multi-GPU, it’s a different story. Both competitors in the GPU space will tell you that developing drivers for CrossFire and SLI is incredibly difficult. Much more than simply splitting the work across different processors, multi-GPU requires extra attention to specific games, game engines, and effects rendering that is not required in single-GPU environments. Add to that the fact that the market for CrossFire and SLI has been shrinking, from an already small state, and you can see why multi-GPU is going to get less attention from AMD here.

Even more, when CrossFire and SLI support gets a focus from the driver teams, it is often late in the process, nearly last in the list of technologies to address before launch.

With that in mind, we should all understand that the results we are going to show you might be indicative of CrossFire scaling when Radeon RX Vega launches, but they very well might not be. I would look at the data we are presenting today as the “current state” of CrossFire for Vega.

Installing and enabling CrossFire with our Radeon Vega Frontier Edition hardware was as simple as you would expect. The current driver from AMD’s website was used, and in both the Game Mode and the Professional Mode, the CrossFire option exists under the Global Settings.

We only had one hiccup in our testing in terms of stability, with Rise of the Tomb Raider – but the issue seemed related to our Frame Rating overlay application. While the game was running fine without the overlay in CrossFire mode, we require the overlay to measure performance accurately using our capture methodology. Because the capture methods of our performance analysis are even more important when evaluating multi-GPU performance (where anomalies are more common), I decided to leave out RoTR results rather than report potentially inaccurate scores.
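The capture-based methodology above works on per-frame display timestamps rather than an in-game FPS counter. A minimal sketch of the idea, in Python, is below; the function names, the runt threshold, and the sample data are illustrative assumptions, not PC Perspective's actual Frame Rating pipeline.

```python
# Sketch of runt-frame detection from captured frame timestamps.
# Thresholds and data are illustrative, not the actual Frame Rating tooling.

def frame_times_ms(timestamps_ms):
    """Per-frame display durations from successive scan-out timestamps."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]

def find_runts(times_ms, runt_fraction=0.25):
    """Flag frames shown for under runt_fraction of the median frame time.

    Runts inflate a raw FPS counter without contributing any smoothness,
    a classic artifact of alternate-frame-rendering multi-GPU setups.
    """
    ordered = sorted(times_ms)
    median = ordered[len(ordered) // 2]
    return [i for i, t in enumerate(times_ms) if t < median * runt_fraction]

# Example: one frame flashed for only 2 ms amid ~16 ms frames.
stamps = [0, 16, 32, 34, 50, 66]
times = frame_times_ms(stamps)   # [16, 16, 2, 16, 16]
print(find_runts(times))         # [2] -> the 2 ms runt
```

This is why multi-GPU anomalies matter: a runt counts as a "frame" in average FPS but adds nothing the player can see.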

What about any DX12/Vulkan Explicit Multi-Adapter testing? Are there any benchmarks that can test DX12's/Vulkan's Explicit Multi-Adapter support, with GPU load balancing managed fully under the control of the DX12/Vulkan graphics APIs?

My hope is that both DX12's and Vulkan's API-managed multi-GPU, driven by games' and graphics software's calls into the graphics APIs, will see better multi-GPU results, with everybody developing future games/graphics/other applications to target Vulkan's and DX12's Explicit Multi-Adapter features, where the developers/entire industry have full control of GPU load balancing.

Also, please keep us updated when AMD issues any new driver updates for Radeon Vega FE by doing a new round of benchmarks.

Thanks for the reviews, and please try to work in some more Blender rendering benchmarks that test total render times and not necessarily any FPS metrics.

Yes and no. It should be better than regular CrossFire at least; it depends on them designing the GPU with that idea in mind. Also, apparently latency is a bigger issue for GPUs (according to a random person on the internet): IF would have higher latency, so the 2 GPUs won't really act as 1.

The Infinity Fabric (IF) will be used across all of AMD's products, as it is on the Zeppelin die (used by AMD in its Ryzen/Threadripper/EPYC SKUs) for the 2 CCX units on the die to communicate over that coherent fabric. The Infinity Fabric also extends beyond the 2 CCX units on one Zeppelin die to the CCX units on other Zeppelin dies on the same MCM, as well as across the socket (on 2P EPYC systems) to the Zeppelin dies and their CCX units on the other EPYC CPU. Threadripper will support 2, or more, Zeppelin dies on an MCM, the same as EPYC, but Threadripper will probably only support 4 memory channels per MCM, while EPYC will support 8 memory channels (2 per Zeppelin die) per chip/MCM.

This Infinity Fabric (in a similar manner to Nvidia's NVLink) can be used to interface an EPYC chip to a Vega GPU (which also uses the Infinity Fabric coherence protocol). So coherent communication can happen between EPYC CPUs and Vega GPUs via the IF; ditto for any Vega-GPU-to-Vega-GPU coherent communication via the IF.

Don't forget that AMD is also a founding member of OpenCAPI. So that IBM-led foundation's members (AMD and others) will have their products able to interface with IBM's POWER9s and any third-party POWER9 licensees' CPUs (Google and others that license POWER9). OpenCAPI is derived from IBM's CAPI (Coherent Accelerator Processor Interface) IP, which is now open and called OpenCAPI, and AMD will offer support for OpenCAPI on its GPU accelerator/AI SKUs.

Coherency is more about cached data/code being moved around by cache controllers across CCX units, processors (CPUs, GPUs, DSPs, etc.), sockets, or even PCIe cards. So the Infinity Fabric, as well as OpenCAPI, runs its coherent protocol over the various processor fabrics via the respective processors' cache controllers, which speak Infinity Fabric/OpenCAPI/other coherent protocols.

AMD's Infinity Fabric encompasses both a control fabric and a separate data fabric, connecting the cache and memory controllers of the various processors (CPUs/GPUs/other processors) that have the Infinity Fabric IP included in their hardware. Via those cache controllers, the Infinity Fabric can manage cache/data coherency across any type of processor with the Infinity Fabric IP included.

So, say, an EPYC CPU can have some form of cache coherency traffic go directly from the EPYC CPU's cache to a Vega GPU's cache, managed over the Infinity Fabric IP that is on both Zen/EPYC CPUs and Vega GPUs, or any other processor IP that AMD decides on, such as FPGAs, DSPs, etc. So for all intents and purposes, AMD's Infinity Fabric can create an APU type of arrangement between any EPYC CPU and any directly attached discrete Vega-based GPU, with the Zen cores on the EPYC SKUs able to communicate cache controller to cache controller with the GPU/other processor and pass data/code in a more direct way, without any secondary trips to and from slower memory.

The SSD connected to the PCIe card (on older GCN and Vega GPUs) that you refer to is handled more by each respective processor's memory controller and whatever virtual-memory page-table/VM-swap IP is native to each GPU's and CPU's respective memory controllers. That SSD is the last level of VM swap space: on Vega, for example, it sits at the bottom tier, with main system memory the next level up the memory/paging storage hierarchy. The level above that, on Vega, is the HBM2 (the High Bandwidth Cache), which is effectively treated like a last-level cache by the Vega GPU's HBCC and the associated cache subsystems above it. The HBCC is a direct client of the L2 cache on Vega, so there are efficiencies in keeping things focused and in the L2 rather than swapped out to any of the lower, more latency-inducing cache/memory levels.
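The tiered arrangement described above (a small fast pool acting as a cache over larger, slower backing storage) can be sketched as a toy LRU page cache. This is purely illustrative under my own assumptions; it is not AMD's actual HBCC behavior, and all names are made up.

```python
from collections import OrderedDict

# Toy HBCC-style page cache: a small fast tier (standing in for HBM2/HBC)
# backed by a larger slow tier (system memory / SSD swap). Illustrative
# only; not AMD's real page-migration policy.

class TieredPageCache:
    def __init__(self, fast_pages):
        self.fast = OrderedDict()   # page -> present, in LRU order
        self.capacity = fast_pages
        self.evictions = 0

    def access(self, page):
        if page in self.fast:
            self.fast.move_to_end(page)      # hit in the fast tier
            return "hit"
        if len(self.fast) >= self.capacity:  # demote least-recently-used page
            self.fast.popitem(last=False)
            self.evictions += 1
        self.fast[page] = True               # fault the page in from slow tier
        return "miss"

cache = TieredPageCache(fast_pages=2)
print([cache.access(p) for p in [1, 2, 1, 3, 2]])
# ['miss', 'miss', 'hit', 'miss', 'miss']
```

The point of the comment stands: the fewer trips a working set makes to the lower tiers, the less latency it pays.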

The Infinity Fabric would be more about allowing any EPYC CPU coherency traffic to bypass the lower levels of memory and transfer cached data directly from any EPYC CPU cache to Vega's HBC, or directly to Vega's L2, or to any Vega cache level, wherever processing work (FP, INT, or other values) is being dispatched from the EPYC CPU to the Vega GPU that both CPU and GPU are working on. It could even be that there is no data movement at all, cache to cache, but just some coherency signaling from the Zen/EPYC CPU to the Vega GPU that invalidates some data held in one processor's cache, because that data is now out of date and needs to be flushed so the proper updated data can be fetched or transferred over. This also applies to any Vega-GPU-to-Vega-GPU cache/coherency traffic, or even Vega to a DSP, if the DSP speaks the Infinity Fabric protocol.
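The "signaling without data movement" idea in that comment is the core of directory-based cache coherence. A minimal toy sketch follows; the class and agent names are hypothetical and this is not the Infinity Fabric protocol itself, just the general invalidation pattern.

```python
# Toy directory-based invalidation: a write by one agent invalidates stale
# copies in other agents' caches instead of pushing new data everywhere.
# Illustrative only; not AMD's Infinity Fabric protocol.

class CoherenceDirectory:
    def __init__(self):
        self.sharers = {}   # address -> set of agents holding a cached copy

    def read(self, agent, addr):
        """Record that an agent now holds a cached copy of addr."""
        self.sharers.setdefault(addr, set()).add(agent)

    def write(self, agent, addr):
        """Invalidate every other sharer; only a signal travels, no data."""
        stale = self.sharers.get(addr, set()) - {agent}
        self.sharers[addr] = {agent}   # writer becomes the sole owner
        return stale                   # agents that must re-fetch on next read

d = CoherenceDirectory()
d.read("epyc_cpu", 0x1000)
d.read("vega_gpu", 0x1000)
print(d.write("vega_gpu", 0x1000))   # {'epyc_cpu'} must invalidate its copy
```

Only when an invalidated agent actually re-reads the line does data move, which is the efficiency the comment is describing.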

that's silly. obviously 2 x vega fe would walk up to a 1080 ti and slap it in the face. then proceed to slap every single member of its family in the face.

the difference in power would be too enormous.

this is all weird upon weird and I don't see anyone doing any meaningful investigations or comparisons. that opportunity will be gone once AMD does launch rx vega. pc hardware media be disappointing me.

At the moment Vega drivers are "not ready". The GPU is not fully utilized; FP16 is not used (yet), nor tiling, etc. You can try and check with the new CodeXL profiling and/or DX12 PIX. They use roughly ~65% of Vega's cores (FP32 data path). AMD should be at ~1080 Ti level when they hit 95%+, and much faster if FP16 can be used for some tasks. Let's just hope that their tools/driver teams are working hard, and that we devs can get our hands on Vega FE, so that we will have a nice product and games optimized for it (some are already on the way).

There are places where game devs can use it. AMD can also do these optimizations in drivers, or actually in the shader compiler. Both NV and AMD already do this for most high-profile games/apps on their other cards; it's normal. Not yet for Vega, and as for devs, they will also do it, but it will take time and learning. And not all devs do it: some just stop and ship the app when it can hit 90/60/30 fps on the selected target cards/systems. That's the real world. It would be awesome if reviewers were able to see perf counters and measure not only fps and frame time in ms, but also the utilization of various parts of the GPU. There is so much left on the table... that kind of review might push both GPU vendors and devs to write better games/apps.

The problem is that our current games are more complex than they were before (like ten years ago). Some graphical effects simply need to be done in FP32, according to developers. The last time AMD tried using FP16 they couldn't really do it without affecting the image quality of the entire scene, and they only enabled it in the Far Cry benchmark mode to get a better score and disabled it when you actually played the game.

Mixing FP32 and FP16 together will need more attention to optimization, or else there are no savings at all from going FP16. The only question is: with triple-A game releases always being rushed by publishers, will developers have the time for it?
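The precision trade-off behind these two comments is easy to demonstrate: IEEE half precision has only a 10-bit mantissa, so large values lose small increments and long accumulations stall. The sketch below uses Python's standard `struct` module (its `'e'` format rounds through half precision); the light-sample framing is my own illustrative assumption, not any specific game's shaders.

```python
import struct

# Demonstrate why naive FP16 use hurts image quality: half precision has
# a 10-bit mantissa, so small contributions vanish next to large values.

def to_fp16(x):
    """Round a Python float through IEEE half precision ('e' struct format)."""
    return struct.unpack('e', struct.pack('e', x))[0]

# 2049 is not representable in half precision; it rounds back to 2048.
print(to_fp16(2048.0 + 1.0) == 2048.0)   # True

# Summing 10,000 small light-sample-like values: the FP16 accumulator
# stalls once the increment falls below half a ulp, while full precision
# reaches the true sum of ~10.0.
acc16 = 0.0
for _ in range(10000):
    acc16 = to_fp16(acc16 + to_fp16(0.001))
print(acc16, 10000 * 0.001)   # FP16 result falls far short of ~10.0
```

This is the kind of error that shows up as banding or lost detail in a full scene, and why mixed FP32/FP16 shaders need per-effect care rather than a blanket switch.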

"...and you can see why multi-GPU is going to get less attention from AMD here."

This could NOT be further from the truth. AMD designed Mantle to scale GPUs natively; it is called Explicit Multi-Adapter. In fact, AMD supports up to 4 GPU cards in Mantle, Microsoft followed with their Mantle clone DX12, and of course Vulkan does as well.

CrossFire is ONLY necessary when using the obsolete DX11.2 API, as multiple cards are natively supported in DX12, Mantle, and Vulkan with Explicit Multi-Adapter.

In DX12 the scaling of Radeon AIB cards is virtually 1:1. Two cards will just about double the performance of one card in DX12-supported games. Two RX 480s running DX12 equal or beat a GTX 1080 for far less $$$$.

By Christmas 2017 90% of ALL new titles will support DX12.

Why is the OBSOLETE DX11 even a consideration? Are enthusiasts going to spend $1800, give or take, on a 2x multi-GPU system just so they can run legacy DX11 games?

Why not see how well it runs DX9 and DX10 games while you are at it.

Talk about irrelevant.

Unless of course you wanted to show VEGA in a poor light. nVidia does not bench well using DX12.

nVidia does not support Asynchronous Compute and Asynchronous Shader Pipelines except through software or driver emulation. Async Compute is AMD hardware IP.

So I challenge you to benchmark Vega using DX12, Mantle, and Vulkan, and DO NOT DISABLE ASYNC COMPUTE. Also, you might want to use the 3DMark DX12 benchmarks as well as Star Swarm.

Who in their right mind spends $1800 on two OpenCL workstation cards to play games? They are for work (yes, some of us still do it): rendering 10-bit video, etc. I will use them as such in my Mac Pro. If I want to play games I will use my Windows PC, as Windows is the best OS for gamers and secretaries atm (until more Vulkan API games hit the market).

Now, AMD have looked to the future. They realised long ago that the thermal threshold of a CPU core was around 5GHz, and that DX11 only using a single core was a huge limiting factor for gaming, holding performance back. So they gave us multi-core CPUs, developed the Mantle low-level API that spawned DX12 and Vulkan, and built GPUs that could perform asynchronous compute tasks to take advantage of those CPUs and APIs. The only thing holding back performance is developers' poor application and optimisation of the aforementioned APIs.

Some of you people seem to want to hold back innovation and play on single-core DX11 for the next 20 years. We need to reward the game developers that move us forward with Vulkan; it's a cross-platform API that should be the API for all future games.

No one is asking PCPer to reinvent the wheel, but why on earth would anyone review a CrossFire setup with games known not to use it well? Test AC: Unity, etc. So many games actually benefit from it. I appreciate the effort, and sorry for my tone, but this was useless. Why not sell one of the Vega FEs and use the cash to update your library with some actually relevant games?

Thank you Ryan, I knew that you did the triangle-binning test. I hoped you would dive a little bit deeper into the rasterizer behavior.

I have seen in another post that you asked some experts, who said that the improvement from the tile-based rasterizer is 10%.

I'm a little bit surprised. Nvidia had an IPC improvement of 35% between Kepler and Maxwell with the tile-based rasterizer.

If you think about it, you save performance twice with TBR. You don't burden the shaders with unimportant workload, and because of this you also get capacity back from the shaders: the shaders which did unimportant work before are now free to do important work.

Also, do you remember your article about Deus Ex and the 220 million triangles, where only 2 billion are viewed?
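The tile-based approach the comment describes starts by binning triangles to screen tiles so shading can proceed tile by tile, with good locality for rejecting hidden work. A minimal toy sketch of that binning step, under my own simplifying assumptions (bounding-box overlap only, no edge tests, made-up tile size):

```python
# Minimal sketch of binning triangles to screen tiles, the first step of
# tile-based rasterization. Illustrative only; real rasterizers also do
# precise edge tests and hidden-fragment rejection per tile.

TILE = 16  # tile size in pixels (an assumption for this sketch)

def bounding_tiles(tri, tile=TILE):
    """Tiles overlapped by a triangle's bounding box; tri = [(x, y), ...]."""
    xs = [x for x, _ in tri]
    ys = [y for _, y in tri]
    x0, x1 = int(min(xs)) // tile, int(max(xs)) // tile
    y0, y1 = int(min(ys)) // tile, int(max(ys)) // tile
    return {(tx, ty) for tx in range(x0, x1 + 1) for ty in range(y0, y1 + 1)}

def bin_triangles(triangles, tile=TILE):
    """Map each tile to the list of triangle indices that touch it."""
    bins = {}
    for i, tri in enumerate(triangles):
        for t in bounding_tiles(tri, tile):
            bins.setdefault(t, []).append(i)
    return bins

tris = [[(2, 2), (10, 2), (2, 10)],      # fits entirely in tile (0, 0)
        [(20, 20), (40, 20), (20, 40)]]  # spans a 2x2 block of tiles
print(bin_triangles(tris))
```

Once triangles are binned, a tile's whole working set fits on-chip, which is where the bandwidth and shading savings come from.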

"I'm a little bit surprised. Nvidia had an IPC improvement of 35% between Kepler and Maxwell with the tile-based rasterizer."

The majority of that performance improvement came from the rearrangement of the SM; Nvidia already explained this when they first came out with the 750 Ti. As for TBR in Maxwell, we didn't even know about it until David Kanter did his test last year.

TBR probably can improve performance, but not in the way that some people imagine. If TBR were so superior, why did ATi and Nvidia not use it before? Imagination Technologies, for example, has been using some form of TBR (when others were not) since they were still competing in the desktop market more than a decade ago, so why are they not the best GPU maker on desktop right now?

Nvidia only discovered the importance of TBR when they tried competing in the mobile market with Tegra. TBR is more common in mobile GPUs because of power and bandwidth constraints, but it does not dramatically increase GPU performance (in terms of FPS) like some people believe.

Nice. I have not seen any OpenGL games reviewed on Vega FE yet; given the good R15 scores it would be an interesting test to highlight whether the DX/Vulkan drivers are just not up to scratch yet. Doom has an OpenGL mode, does it not? Normally the Vulkan mode significantly outperforms the OpenGL mode (on both Nvidia and AMD cards), so it would be an interesting test to see if more work has gone into the OpenGL driver stack for the (pro-ish) card.

Go to the Phoronix website, as that's where the majority of OpenGL Linux games testing is done. And Michael Larabel very often tests OpenGL performance on Linux against OpenGL performance on Windows for games and other graphics software.

Michael has been remote-testing Linux on a Phoronix reader's Vega FE using the Phoronix Test Suite. There is some OpenCL testing, and maybe Michael will get more remote access for OpenGL/Vulkan testing.

Both Vulkan and DX12 offer API-managed Explicit Multi-Adapter for GPUs, and game developers/game engine developers can create their own libraries that are optimized for multi-GPU via the game/game engine.

CF/SLI are not so good at multi-GPU load balancing inside AMD's or Nvidia's respective drivers. So get the multi-GPU load balancing out of the drivers and into the APIs, and let the entire gaming/graphics software industry optimize for multi-GPU usage. Keep the drivers as simple, lightweight, and close to the GPU's metal as possible, and let game developers do multi-GPU via the game/game engine SDKs that can call on Vulkan's or DX12's EMA. That way the entire gaming and graphics software industry can pool their resources and get the multi-GPU scalability issues solved.
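The two classic load-balancing strategies an engine can implement on top of explicit multi-adapter are alternate-frame rendering (AFR) and split-frame rendering (SFR). A toy sketch of both schedules, with entirely hypothetical function names rather than any real Vulkan/DX12 binding:

```python
# Toy sketch of engine-side multi-GPU load balancing: AFR assigns whole
# frames to GPUs round-robin; SFR splits one frame's scanlines across
# GPUs. Illustrative only; not a real graphics-API binding.

def afr_schedule(frame_count, gpu_count):
    """Alternate-frame rendering: frame i goes to GPU i mod gpu_count."""
    return [i % gpu_count for i in range(frame_count)]

def sfr_split(height, gpu_count):
    """Split-frame rendering: divide scanlines as evenly as possible."""
    base, extra = divmod(height, gpu_count)
    spans, start = [], 0
    for g in range(gpu_count):
        end = start + base + (1 if g < extra else 0)
        spans.append((start, end))   # half-open scanline range for GPU g
        start = end
    return spans

print(afr_schedule(6, 2))    # [0, 1, 0, 1, 0, 1]
print(sfr_split(1080, 2))    # [(0, 540), (540, 1080)]
```

AFR tends to scale throughput better but risks the frame-pacing problems (runts, spikes) seen in this review, while SFR keeps latency low at the cost of harder load balancing, which is exactly why engine-level control is attractive.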

This was a really cool review. Kudos to PCPer for doing this; I appreciate it. I would have liked to see 4-way 480s or 4-way 580s when they came out, because they were cheap and it was possible, and that would have been cool too. This kind of testing isn't to suggest that people should actually go out and buy it; it's just cool to see if you're into computer hardware. This is why I follow PCPer.

I feel like you may not have updated Hitman to the latest version. On my copy, the settings menu has an extra option (enable multi-GPU - DX12), and mGPU only works if you have that option enabled. The only reasons you wouldn't see that option are 1. not having updated Hitman fully, or 2. mGPU being disabled in the driver somehow.

In case you haven't heard this response the other eleventy bazillion times someone has said that: this IS the same silicon as RX, running a gaming driver, and the, according to Nvidia, "prosumer not gaming" Xp this prosumer-not-gaming card is positioned against is nevertheless bloody good at gaming. FE's clocks look like being lower than RX's, and the drivers are clearly unoptimised, as this review shows: Vega CrossFire right now is completely AWOL.

Damn... this is truly too bad, to be honest. I was hoping for something a bit more "special" than all of this, mainly in the performance of these cards, but especially when you read an "excuse" before the actual review: some nonsense about how mGPU scaling is abysmal and fading to the wayside. Seriously, it couldn't be further from the truth.

APIs are being, and have been, created with mGPU in mind as of late and are supposed to get better... New motherboards still offer up to six x8/x16-capable PCIe slots.

I believe it to be laziness of late, and/or the coercing of the developers.

There's nothing truly better than being a PC gamer and being able to put more money into something to get a beneficial gain. Every FPS can be an advantage....

I'd love to see frame times where the GPU is not thermally or power limited (for either side). I found hitting a limit destroys frame times on my 1080 SLI. I'm curious how many spikes are caused by hitting limits vs. actually inherent to SLI or crossfire.

While the design is similar, you have to look at the intended use case. Vega FE and RX Vega are going to be 2 different graphics cards: Vega FE is a pro-level card you can play games on; RX Vega is going to be a top-performing gaming card. I'm not saying I have any info anyone outside of AMD doesn't, but I'm looking forward to seeing what RX Vega is actually capable of.

I am hoping to buy the Vega FE for my 2009 Mac Pro for editing with FCPX; we all know a Mac running FCPX destroys any Windows PC on render times. I'm currently running a Maxwell Titan X, as they have good OpenCL performance, and I have upgraded my cameras to record 10-bit 4:2:2, which plays perfectly smoothly in my 8 1/2-year-old Mac Pro at 4K even before it's optimised to ProRes. So I have been hoping to see some real OpenCL tests of this workstation card.

Everyone seems to be testing it for gaming. WTF, it's like buying a van and testing it against hot hatches, and worse still, testing it with the DX11 API, the worst API ever. Anyone coding for DX11 in 2017 needs shooting; DX11 uses a single core, and how many people still have only dual-core CPUs now?

I currently run 4 networked computers: a 12-core (24-thread) heavily modded 2009 Mac Pro; an 8-core (16-thread) full-open-loop water-cooled 5960X OC'd to 4.4GHz on X99; a 4-core (8-thread) 4790K OC'd to 4.6GHz; and a 4-core (8-thread) 2600K OC'd to 4.4GHz. They are all networked and connected to multiple 50" and 65" Panasonic 4K THX-certified smart TVs that cover 99.3% of the sRGB colour gamut at default. I also have an ASUS MG279Q 2K IPS 144Hz gaming monitor for times when I'm not working.

But the AMD Vega Frontier Edition used with Final Cut Pro will be a monster for us professional film makers and broadcasters. We know Apple plans to use Vega graphics in the iMac Pro, scheduled for release in December, running the latest High Sierra OS, so it looks like the drivers will be there to run it on my 2009 Mac Pro with High Sierra as well.

Enjoy your gaming. I don't really game that much, but please test Vega on the DX12 and Vulkan games it's meant to work with, not the poorly optimised DX11 API. And that new Tomb Raider title is a disaster; whoever pretended they coded that for DX12 should give the money back and never work again. There is a distinct difference between something working with DX12 and being optimised for it.