Impact of PCI-E Speed on Gaming Performance

Warning: Always look at the date when you read a hardware article. Some of the content in this article is most likely out of date, as it was written on November 13, 2013. Check out our more recent articles.

CPUs and chipsets only have a certain number of PCI-E lanes (or "paths" for data transfer), but with the number of features and options that manufacturers are cramming onto modern motherboards, there are times when there are simply not enough PCI-E lanes to run everything at full speed. We see this most often with the mainstream series of CPUs (currently Haswell) and chipsets (Z87, H87, etc.) where if you want to use multiple video cards (or install any other type of PCI-E card in the second x16 slot), the video card(s) will only run at x8 speeds rather than the full x16 speed you would otherwise get.

Normally, if you want to run a pair of video cards at full x16 speeds, you need an enthusiast motherboard and CPU (such as an Ivy Bridge-E CPU on an X79 motherboard). The problem with this is that NVIDIA video cards currently do not officially support PCI-E Gen3. You can run an .exe that NVIDIA has provided to gain Gen3 capabilities, but NVIDIA does not offer any guarantee of stability if you use that utility.

Many hardware sites (such as TechPowerUp and AnandTech) have shown in the past that most video cards do not show any performance decrease when running in x8 mode and cannot utilize the larger bandwidth provided by the latest Gen3 specification. However, video cards are getting faster and faster, so we felt it was worth revisiting to find out whether the fastest video cards available still gain no performance advantage from running at PCI-E x16 Gen3 versus PCI-E x8 Gen2. In addition, multiple GPU setups have not been thoroughly tested, and with the growing popularity of 4k displays, we felt it was important to see if the PCI-E revision/speed would affect a dual GPU setup at the much more demanding 4k resolution.
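To put "x16 Gen3 versus x8 Gen2" into numbers, here is a minimal sketch of the theoretical bandwidth of each link configuration we tested. It uses the published per-lane signaling rates (5 GT/s for Gen2, 8 GT/s for Gen3) and line encodings (8b/10b and 128b/130b). The function and table names are our own for illustration, and the results are one-direction figures before any packet or protocol overhead, so real-world throughput will be lower.

```python
# Per-lane signaling rate (GT/s) and line-encoding efficiency for each
# PCI-E generation. Gen1 and Gen2 use 8b/10b encoding, Gen3 uses 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt, efficiency = PCIE_GENERATIONS[gen]
    # GT/s * efficiency gives usable Gbit/s per lane; divide by 8 for bytes.
    return rate_gt * efficiency * lanes / 8


if __name__ == "__main__":
    for gen, lanes in [(2, 8), (2, 16), (3, 8), (3, 16)]:
        print(f"Gen{gen} x{lanes}: {link_bandwidth_gbs(gen, lanes):.2f} GB/s")
```

By this estimate, Gen3 x8 is nearly equal to Gen2 x16, and the four configurations in our tests span roughly a 4x range in theoretical bandwidth.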

We will be using three different benchmarks at two different resolutions: 1080p and 4k. At 1080p we will only be using a single GTX Titan, but at 4k we will be using a pair of GTX Titans in SLI. The settings for the benchmarks were chosen to give us an average 40-50 FPS.

To test both PCI-E 2.0 and PCI-E 3.0, we simply changed the PCI-E revision setting in the BIOS to either Gen2 or Gen3.

Finally, to switch between x16 and x8 modes, a piece of insulating material (actually the sticky part of a Post-it note) was used to cover half of the contacts on each card, which forces the card to run in x8 mode. Note that we did not have to do this on the Z87 motherboard when we used two GTX Titans in SLI, as that CPU/chipset can only run multiple video cards at dual x8 speeds, not dual x16.
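If you want to confirm which revision and lane count a card actually negotiated, the link state can be read back from the operating system. The sketch below is not the method used for this article. It assumes a Linux system, where the kernel exposes `current_link_speed` and `current_link_width` for each PCI device under `/sys/bus/pci/devices`. The helper names are hypothetical. Keep in mind that many video cards lower their link speed at idle to save power, so a reading taken under load is more meaningful.

```python
from pathlib import Path
from typing import Tuple

# Signaling rate in GT/s mapped to the PCI-E generation that uses it.
RATE_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3}


def parse_link_speed(text: str) -> int:
    """Map a sysfs string such as '8.0 GT/s PCIe' to a generation number."""
    rate = float(text.split()[0])
    return RATE_TO_GEN[rate]


def read_link(device_dir: Path) -> Tuple[int, int]:
    """Return (generation, lane count) for one device directory in sysfs."""
    speed = (device_dir / "current_link_speed").read_text().strip()
    width = (device_dir / "current_link_width").read_text().strip()
    return parse_link_speed(speed), int(width)


if __name__ == "__main__":
    root = Path("/sys/bus/pci/devices")  # assumption: Linux sysfs layout
    devices = sorted(root.iterdir()) if root.is_dir() else []
    for dev in devices:
        try:
            # PCI class codes starting with 0x03 are display controllers.
            if not (dev / "class").read_text().startswith("0x03"):
                continue
            gen, width = read_link(dev)
        except (OSError, KeyError, ValueError):
            continue  # attribute missing, unreadable, or an unknown rate
        print(f"{dev.name}: Gen{gen} x{width}")
```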

Starting with Unigine Heaven 4.0, let's first take a closer look at the Z87 test system. On that system, PCI-E 3.0 was very slightly faster than PCI-E 2.0, but oddly we saw higher scores in x8 mode than in x16 mode. The biggest variance was still only 1 FPS, however, which we would call within our margin of error.

On the X79 system, there is really nothing to discuss. The biggest variance was only 0.4 FPS, which is well within our margin of error.

Our second benchmark - Hitman: Absolution - has some very interesting results. This time, the Z87 test system showed pretty much no performance difference across any of our PCI-E combinations. On the other hand, the X79 system gives us very mixed results. In x16 mode, PCI-E 2.0 is faster than PCI-E 3.0, but only by 0.6 FPS. But in x8 mode, PCI-E 3.0 is 1.2 FPS faster than PCI-E 2.0. 1.2 FPS is still not very much, but it is noteworthy.

For our final benchmark - Metro: Last Light - our results are again unremarkable. On the Z87 test system, the results were all essentially identical. The X79 system was slightly faster when running PCI-E 3.0 in both x8 and x16 mode, but only by 0.3-0.7 FPS.

Running our benchmarks at 4k resolutions with two video cards should stress the PCI-E bus more than a single card at lower resolutions, but at least for Unigine Heaven there is no notable difference in performance based on either the PCI-E revision or the number of PCI-E lanes.

Unlike Unigine Heaven 4.0, Hitman: Absolution does give us some interesting data to go over, at least for the X79 test system. On that system, at x16 speeds PCI-E 2.0 is actually 1.2 FPS faster than PCI-E 3.0. At x8 speeds, however, PCI-E 3.0 is faster than PCI-E 2.0 by 1.5 FPS, which is the biggest variance we saw in any of our tests.

Our final benchmark is actually the first one that gave us the results we originally expected. The Z87 system again shows no performance variance, but for the X79 system PCI-E 3.0 is either faster or the same as PCI-E 2.0 at both x8 and x16 speeds. At the same time, x16 outperforms x8 by as much as 1.5 FPS.

Our testing has pretty clearly shown that for gaming, using either PCI-E 2.0 or PCI-E 3.0 will give you nearly identical performance. Oddly, in some benchmarks PCI-E 2.0 was actually faster than PCI-E 3.0. At the same time, x16 was not consistently faster than x8. Again, x8 was actually faster than x16 in many cases. So unless you care about getting up to 1.5 FPS better performance, you might actually want to manually set your video cards to operate at x8 speeds - although we really would not recommend doing so.

This isn't to say that PCI-E 3.0 is not faster than PCI-E 2.0, or that x16 is the same as x8, but rather that current video cards and games are simply not able to utilize the additional bandwidth they provide. In fact, we recently showed that the performance of a Xeon Phi card is greatly reduced if you run it at x8 speeds in the blog post "Performance of Xeon Phi on PCIe x8".

While we recommend using the latest PCI-E revision whenever possible, if your motherboard or video card only supports PCI-E 2.0, our results show that this really is not a problem. At the same time, if you want to install a sound card into your Z87 system but doing so would limit your video card to x8 speeds, that is also not a very big problem. At most you may see a ~1.5 FPS drop in performance, but that change is so small that it is very unlikely to ever be noticeable.
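To put that ~1.5 FPS figure in perspective, recall that our benchmark settings were chosen to average 40-50 FPS. The short sketch below converts the worst-case gap into a percentage of those two baselines. The function name is our own, and the baselines are the ends of our target range rather than measured results from a specific run.

```python
def relative_change_pct(baseline_fps: float, delta_fps: float) -> float:
    """Express an FPS difference as a percentage of the baseline frame rate."""
    return 100.0 * delta_fps / baseline_fps


if __name__ == "__main__":
    worst_case_delta = 1.5  # largest gap seen in any of our tests, in FPS
    for baseline in (40.0, 50.0):
        pct = relative_change_pct(baseline, worst_case_delta)
        print(f"{worst_case_delta} FPS at {baseline:.0f} FPS average = {pct:.2f}%")
```

At these frame rates the largest gap we measured works out to between 3% and 3.75%, and most of the differences we saw were far smaller.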

There needs to be more of these articles. I appreciate the information Matt. Personally, using this ASUS Sabertooth 2.0 Mobo....I love having options. I'm going with PCI 2.0 whenever given the option. Who knows how long it will be before games can actually and fully utilize all the extra bandwidth these new cards are bringing to the table.

Posted on 2013-11-13 23:54:27

Malcolm Galloway

Touche'.... (Clappin' hands)

Posted on 2013-11-18 21:42:09

Sybreed

My motherboard is using PCI 2.0 and I recently bought an NVIDIA 780GTX TI... everyone I knew told me I was practically buying this for nothing and that I would not benefit from my new graphic card at all. I'm glad to know it isn't entirely true and that I won't have to build an entirely new computer from scratch! It's especially crazy considering I got my computer merely 2 years ago.

Posted on 2013-12-03 22:30:04

Sofia Lucifairy

Very interesting post!

Posted on 2014-03-28 20:08:32

lowrizzle

Just curious, but have you considered that the PCH does not contain a discrete PCIe 3.0, 2.0, 1.1 and 1.0 stack? Why would a firmware engineer include all three when each downlinks as backwards compatible?

In other words, how did you validate that you were actually on a 3.0 versus a 2.0 link? By setting the BIOS option? I think your nearly invariable results are actually the result of an incomplete testing methodology.

Posted on 2014-04-28 20:09:53

Robin Kleven

My question is, when do the bottlenecks appear? How much longer will 8x keep up?

Posted on 2014-06-23 21:06:22

gadgety

This is a great piece. I'd like to see you revisit it annually, because, as you say GPUs are getting faster and faster. There are dual cards as well, R9 295x2 from AMD, and similar solutions from Nvidia.

Posted on 2014-08-16 07:51:58

AquaVixen

I do wish you would have tested to see what effect x4 has on things. Some motherboards that run both GPUs at dual-8x will drop 1 or more cards down to 4x if you add other PCI-E devices. That would also be interesting data.

Posted on 2014-08-22 15:16:54

Papizach Akbar

agree with the statement gpu would scale onto cpu, I run 2 gtx580 in asus Rampage4 formula for final render, save me seconds and even minutes when I replaced i7 3820 to i7 4930K. Im planning to add titan black and beefier psu for animation works. Any1 here knows link that shows how it goes in 3d apps rather than gaming..

Posted on 2014-09-02 13:36:12

Edward Casey

I am very pleased to have read this test data. I am thinking about getting an Asrock Z97 Extreme6 motherboard, paired up with a Samsung XP941 M.2 x4 card, but putting it in that slot drops the x16 slot for the video card to x8. I do think that combination will give me very fast boot times using Windows 8.1. I am relieved that there will be no throttling of the performance of the graphics card for all intents and purposes running in x8.

Posted on 2014-09-10 21:55:26

Marcio

Thank you so much for this information. I have a Z98 motherboard and a 2600k. Although the G1 Sniper supports PCI 3.0, the 2600k CPU doesn't, and I'm looking at maybe going for a GTX 970 SLI configuration. This information is very useful. Thank you once again.

Posted on 2014-09-24 20:45:36

Marcio

Sorry, Z68 motherboard.

Posted on 2014-09-24 20:50:14

jim greene

So it is almost a year after this article was written. PCIe 3.0 is more common on mobos. H97 and Z97 chipsets are out. (True, not much change from 87 series.) New, more powerful video cards are out (ex. R9 290 and GTX 900 series). Does a higher end gaming machine push the limits yet of what the mobo buses (expansion slots) can handle?

I guess you put the gtx 970 in one of the PCIe 3.0 x16. Where does the 7200 SATA III HDD and the SSD attach?

There is lots of talk about how the PCIe widths available to individual GPUs are cut, but not too much talk about when other devices are added to the mobo. In my above example, there are three devices to attach, only one is a GPU.

How will you put these three devices on the mobo? What will be the resultant theoretical throughput for each (x16, x8, x4, x1)? Do any realize an actual bottleneck?

If the above does not push limits, what if I choose 2 top GPUs and a HDD and a SSD? (That would be getting a Serenity now and adding a like GPU later.)

Posted on 2014-10-30 02:31:06

robt

Interesting. In a similar vein, I thought that having a PCI-E 2.0 motherboard for my 2.0 x16 card would show a big improvement over running the 2.0 x16 card in a PCI-E 1.1 motherboard. I was surprised to see no difference and wondered if the settings were bad or I had power supply problems. Thanks for the insight. I am assuming the relative difference in 2.0 to 1.1 is the same as 3.0 to 2.0.

Posted on 2014-11-05 21:30:20

kanuj

Very interesting article, hats off to Puget Systems, now I am a great fan :)

It is certainly an interesting hypothesis to test, but I'm not quite sure how they expected some last generation game-engine powered games running on systems with their total video memory (dedicated + physical VRAM) to be in excess of 9GB (on the non-SLI titan configuration) to provide any type of bandwidth issue for these systems; whether in Full HD or 4K.

I would have thought that better test conditions would come from using the Unreal Engine 4 SDK rendering at extreme settings and using high-end video cards just above the Steam average (something like a GTX 660), but certainly not using cards like Titans that could store an entire game section in VRAM, as the test really needs to have a workload that generates plenty of contention for VRAM on a second by second basis, so that the PCIe bus is thrashed at each speed and could illuminate results inline with the conditions the average steam user will face because their setups will face high contention for available VRAM when gaming, with these games, and the coming games, as once again, the games will be designed around the strengths of the games consoles, and this generation the consoles' strengths are HSA and high memory bandwidth.