A Holiday Project

A couple of years ago, I performed an experiment around the GeForce GTX 750 Ti graphics card to see if we could upgrade basic OEM, off-the-shelf computers to become competent gaming PCs. The key to this potential upgrade was that the GTX 750 Ti offered a great amount of GPU horsepower (at the time) without the need for an external power connector. Lower power requirements on the GPU meant that even the most basic of OEM power supplies should be able to do the job.

That story was a success, both in terms of the result in gaming performance and the positive feedback it received. Today, I am attempting to do that same thing but with a new class of GPU and a new class of PC games.

The goal for today’s experiment remains pretty much the same: can a low-cost, low-power GeForce GTX 1050 Ti graphics card that also does not require any external power connector offer enough gaming horsepower to upgrade current shipping OEM PCs to "gaming PC" status?

Our target PCs for today come from Dell and ASUS. I went into my local Best Buy just before the Thanksgiving holiday and looked for two machines that varied in price and relative performance.

The specifications of these two machines are relatively modern for OEM computers. The Dell Inspiron 3650 uses a modest dual-core Core i3-6100 processor with a fixed clock speed of 3.7 GHz. It has a 1TB standard hard drive and a 240 watt power supply. The ASUS M32CD-B09 PC has a quad-core HyperThreaded processor with a 4.0 GHz maximum Turbo clock, a 1TB hybrid hard drive and a 350 watt power supply. Both of the CPUs share the same Intel brand of integrated graphics, the HD Graphics 520. You’ll see in our testing that not only is this integrated GPU unqualified for modern PC gaming, but it also performs quite differently based on the CPU it is paired with.

Introduction

In August at the company’s annual developer forum, Intel officially took the lid off its 7th generation of Core processor series, codenamed Kaby Lake. The build up to this release has been an interesting one as we saw the retirement of the “tick tock” cadence of processor releases and instead are moving into a market where Intel can spend more development time on a single architecture design to refine and tweak it as the engineers see fit. With that knowledge in tow, I believed, as I think many still do today, that Kaby Lake would be something along the lines of a simple rebrand of current shipping product. After all, since we know of no major architectural changes from Skylake other than improvements in the video and media side of the GPU, what is left for us to look forward to?

As it turns out, the advantages of the 7th Generation Core processor family and Kaby Lake are more substantial than I expected. I was able to get a hold of two different notebooks from the HP Spectre lineup, as near to identical as I could manage, with the primary difference being the move from the 6th Generation Skylake design to the 7th Generation Kaby Lake. After running both machines through a gamut of tests ranging from productivity to content creation and of course battery life, I can say with authority that Intel’s 7th Gen product deserves more accolades than it is getting.

Architectural Refresher

Before we get into the systems and to our results, I think it’s worth taking some time to quickly go over some of what we know about Kaby Lake from the processor perspective. Most of this content was published back in August just after the Intel Developer Forum, so if you are sure you are caught up, you can jump right along to a pictorial look at the two notebooks being tested today.

At its core, the microarchitecture of Kaby Lake is identical to that of Skylake. Instructions per clock (IPC) remain the same with the exception of dedicated hardware changes in the media engine, so you should not expect any performance differences with Kaby Lake except with improved clock speeds.

Also worth noting is that Intel is still building Kaby Lake on 14nm process technology, the same used on Skylake. The term “same” will be debated as well as Intel claims that improvements made in the process technology over the last 24 months have allowed them to expand clock speeds and improve on efficiency.

Dubbing this new revision of the process as “14nm+”, Intel tells me that they have improved the fin profile for the 3D transistors as well as channel strain while more tightly integrating the design process with manufacturing. The result is a 12% increase in process performance; that is a sizeable gain in a fairly tight time frame even for Intel.

That process improvement directly results in higher clock speeds for Kaby Lake when compared to Skylake when running at the same target TDPs. In general, we are looking at 300-400 MHz higher peak clock speeds in Turbo Boost situations when compared to similar TDP products in the 6th generation. Sustained clocks will very likely remain voltage / thermally limited but the ability spike up to higher clocks for even short bursts can improve performance and responsiveness of Kaby Lake when compared to Skylake.

Along with higher fixed clock speeds for Kaby Lake processors, tweaks to Speed Shift will allow these processors to get to peak clock speeds more quickly than previous designs. I extensively tested Speed Shift when the feature was first enabled in Windows 10 and found that the improvement in user experience was striking. Though the move from Skylake to Kaby Lake won’t be as big of a change, Intel was able to improve the behavior.

The graphics architecture and EU (execution unit) layout remains the same from Skylake, but Intel was able to integrate a new video decode unit to improve power efficiency. That new engine can work in parallel with the EUs to improve performance throughput as well, but obviously at the expensive of some power efficiency.

Specific additions to the codec lineup include decode support for 10-bit HEVC and 8/10-bit VP9 as well as encode support for 10-bit HEVC and 9-bit VP9. The video engine adds HDR support with tone mapping though it does require EU utilization. Wide Color Gamut (Rec. 2020) is prepped and ready to go according to Intel for when that standard starts rolling out to displays.

Performance levels for these new HEVC encode/decode blocks is set to allow for 4K 120mbps real-time on both the Y-series (4.5 watt) and U-series (15 watt) processors.

It’s obvious that the changes to Kaby Lake from Skylake are subtle and even I found myself overlooking the benefits that it might offer. While the capabilities it has will be tested on the desktop side at a later date in 2017, for thin and light notebooks, convertibles and even some tablets, the 7th Generation Core processors do in fact take advantage of the process improvements and higher clock speeds to offer an improved user experience.

What's new and what's not

While spending time learning about upcoming products and technologies at the Intel Developer Forum earlier this month, I sat down with the company to learn about the release of Kaby Lake, now known as the 7th Generation Core processor family. We have been seeing and reporting on the details of Kaby Lake for quite some time here on PC Perspective – it became a more important topic when we realized that this would be the product that officially killed off the ‘tick-tock’ design philosophy that Intel had implemented years ago and that was responsible for much of the innovation in the CPU space over the last decade.

Today Intel released new information about the 7th Gen CPU family and Kaby Lake. Let’s dive into this topic with a simple and straight forward mindset in how it compares to Skylake.

What is the same

Actually, quite a lot. At its core, the microarchitecture of Kaby Lake is identical to that of Skylake. Instructions per clock (IPC) remain the same with the exception of dedicated hardware changes in the media engine, so you should not expect any performance differences with Kaby Lake except with improved clock speeds we’ll discuss in a bit.

Because of this lack of change many people will look down on the Kaby Lake release as Intel’s attempt to repackage an existing product to make sure it meets a financial market required annual product cadence. It is a valid but arguable criticism, but Intel is making changes in other areas that should make KBL an improvement in the thin and light ecosystem.

Also worth noting is that Intel is still building Kaby Lake on 14nm process technology, the same used on Skylake. The term “same” will be debated as well as Intel claims that improvements made in the process technology over the last 24 months have allowed them to expand clock speeds and improve on efficiency

What is changed

Dubbing this new revision of the process as “14nm+”, Intel tells me that they have improved the fin profile for the 3D transistors as well as channel strain while more tightly integrating the design process with manufacturing. The result is a 12% increase in process performance; that is a sizeable gain in a fairly tight time frame even for Intel.

That process improvement directly results in higher clock speeds for Kaby Lake when compared to Skylake when running at the same target TDPs. In general, we are looking at 300-400 MHz higher peak clock speeds in Turbo Boost situations when compared to similar TDP products in the 6th generation. Sustained clocks will very likely remain voltage / thermally limited but the ability spike up to higher clocks for even short bursts can improve performance and responsiveness of Kaby Lake when compared to Skylake.

In these two examples, Intel compares the 15 watt Core i7-6500U (a common part in currently shipping notebooks) and the upcoming 15 watt Core i7-7500U, both with dual-core HyperThreaded configurations. In SYSmark 2014 a 12% score improvement is measured while WebXPRT shows a 19% advantage. Double digit performance increases are pretty astounding for a new generational jump that does not include a new microarchitecture or a new process technology (more or less) though we should temper expectations for other applications and workload profiles like content creation.

Clean Sheet and New Focus

It is no secret that AMD has been struggling for some time. The company has had success through the years, but it seems that the last decade has been somewhat bleak in terms of competitive advantages. The company has certainly made an impact in throughout the decades with their 486 products, K6, the original Athlon, and the industry changing Athlon 64. Since that time we have had a couple of bright spots with the Phenom II being far more competitive than expected, and the introduction of very solid graphics performance in their APUs.

Sadly for AMD their investment in the “Bulldozer” architecture was misplaced for where the industry was heading. While we certainly see far more software support for multi-threaded CPUs, IPC is still extremely important for most workloads. The original Bulldozer was somewhat rushed to market and was not fully optimized, while the “Piledriver” based Vishera products fixed many of these issues we have not seen the non-APU products updated to the latest Steamroller and Excavator architectures. The non-APU desktop market has been served for the past four years with 32nm PD-SOI based parts that utilize a rebranded chipset base that has not changed since 2010.

Four years ago AMD decided to change course entirely with their desktop and server CPUs. Instead of evolving the “Bulldozer” style architecture featuring CMT (Core Multi-Threading) they were going to do a clean sheet design that focused on efficiency, IPC, and scalability. While Bulldozer certainly could scale the thread count fairly effectively, the overall performance targets and clockspeeds needed to compete with Intel were just not feasible considering the challenges of process technology. AMD brought back Jim Keller to lead this effort, an industry veteran with a huge amount of experience across multiple architectures. Zen was born.

Hot Chips 28

This year’s Hot Chips is the first deep dive that we have received about the features of the Zen architecture. Mike Clark is taking us through all of the changes and advances that we can expect with the upcoming Zen products.

Zen is a clean sheet design that borrows very little from previous architectures. This is not to say that concepts that worked well in previous architectures were not revisited and optimized, but the overall floorplan has changed dramatically from what we have seen in the past. AMD did not stand still with their Bulldozer products, and the latest Excavator core does improve upon the power consumption and performance of the original. This evolution was simply not enough considering market pressures and Intel’s steady improvement of their core architecture year upon year. Zen was designed to significantly improve IPC and AMD claims that this product has a whopping 40% increase in IPC (instructions per clock) from the latest Excavator core.

AMD also has focused on scaling the Zen architecture from low power envelopes up to server level TDPs. The company looks to have pushed down the top end power envelope of Zen from the 125+ watts of Bulldozer/Vishera into the more acceptable 95 to 100 watt range. This also has allowed them to scale Zen down to the 15 to 25 watt TDP levels without sacrificing performance or overall efficiency. Most architectures have sweet spots where they tend to perform best. Vishera for example could scale nicely from 95 to 220 watts, but the design did not translate well into sub-65 watt envelopes. Excavator based “Carrizo” products on the other hand could scale from 15 watts to 65 watts without real problems, but became terribly inefficient above 65 watts with increased clockspeeds. Zen looks to address these differences by being able to scale from sub-25 watt TDPs up to 95 or 100. In theory this should allow AMD to simplify their product stack by offering a common architecture across multiple platforms.

Gunning for Broadwell-E

As I walked away from the St. Regis in downtown San Francisco tonight, I found myself wandering through the streets towards my hotel with something unique in tow. It was a smile. I was smiling, thinking about what AMD had just demonstrated and showed at its latest Zen processor reveal. The importance of this product launch can literally not be overstated for a company struggling to find a foothold to hang on to in a market that it once had a definitive lead. It’s been many years since I left a conference call, or a meeting, or a press conference feeling genuinely hopefully and enthusiastic about what AMD has shown me. Tonight I had that.

AMD’s CEO Lisa Su, and CTO Mark Papermaster, took stage down the street from the Intel Developer Forum to roll out a handful of new architectural details about the Zen architecture while also showing the first performance results comparing it to competing parts from Intel. The crowd in attendance, a mix of media and analysts, were impressed. The feeling was palpable in the room.

It’s late as I write this, and while there are some interesting architecture details to discuss, I think it is in everyone’s best interest that we touch on them lightly for now, and instead refocus on the deep-dive once the Hot Chips information comes out early next week. What you really want to know is clear: can Zen make Intel work again? Can Zen make that $1700 price tag on the Broadwell-E 6950X seem even more ludicrous? Yes.

The Zen Architecture

Much of what was discussed from the Zen architecture is a re-release of what has been out in recent months. This is a completely new, from the ground up, microarchitecture and not a revamp of the aging Bulldozer design. It integrated SMT (simultaneous multi-threading), a first for an AMD CPU, to better take efficient advantage of a longer pipeline. Intel has had HyperThreading for a long time now and AMD is finally joining the fold. A high bandwidth and low latency caching system is used to “feed the beast” as Papermaster put it and utilizing 14nm process technology (starting at Global Foundries) gives efficiency, and scaling a significant bump while enabling AMD to scale from notebooks to desktops to servers with the same architecture.

By far the most impressive claim from AMD thus far was that of a 40% increase in IPC over previous AMD designs. That’s a HUGE claim and is key to the success or failure of Zen. AMD proved to me today that the claims are real and that we will see the immediate impact of that architecture bump from day one.

Press was told of a handful of high level changes to the new architecture as well. Branch prediction gets a complete overhaul. This marks the first AMD processor to have a micro-op cache. Wider execution width with broader instruction schedulers are integrated, all of which adds up to much higher instruction level parallelism to improve single threaded performance.

Performance improvements aside, throughput and efficiency go up with Zen as well. AMD has integrated an 8MB L3 cache and improved prefetching for up 5x the cache bandwidth available per core on the CPU. SMT makes sure the pipeline stays full to prevent “bubbles” that introduce latency and lower efficiency while region-specific power gating means that we’ll see Zen in notebooks as well as enterprise servers in 2017. It truly is an impressive design from AMD.

Summit Ridge, the enthusiast platform that will be the first product available with Zen, is based on the AM4 platform and processors will go up to 8-cores and 16-threads. DDR4 memory support is included, PCI Express 3.0 and what AMD calls “next-gen” IO – I would expect a quick leap forward for AMD to catch up on things like NVMe and Thunderbolt.

The Real Deal – Zen Performance

As part of today’s reveal, AMD is showing the first true comparison between Zen and Intel processors. Sure, AMD showed a Zen-powered system running the upcoming Deus Ex running at 4K with a system powered by the Fury X, but the really impressive results where shown when comparing Zen to a Broadwell-E platform.

Using Blender to measure the performance of a rendering workload (a Zen CPU mockup of course), AMD ran an 8-core / 16-thread Zen processor at 3.0 GHz against an 8-core / 16-thread Broadwell-E processor at 3.0 GHz (likely a fixed clocked Core i7-6900K). The point of the demonstration was to showcase the IPC improvements of Zen and it worked: the render completed on the Zen platform a second or two faster than it did on the Intel Broadwell-E system.

Not much to look at, but Zen on the left, Broadwell-E on the right...

Of course there are lots of caveats: we didn’t setup the systems, I don’t know for sure that GPUs weren’t involved, we don’t know the final clocks of the Zen processors releasing in early 2017, etc. But I took two things away from the demonstration that are very important.

The IPC of Zen is on-par or better than Broadwell.

Zen will scale higher than 3.0 GHz in 8-core configurations.

AMD obviously didn’t state what specific SKUs were going to launch with the Zen architecture, what clock speeds they would run at, or even what TDPs they were targeting. Instead we were left with a vague but understandable remark of “comparable TDPs to Broadwell-E”.

Pricing? Overclocking? We’ll just have to wait a bit longer for that kind of information.

Closing Thoughts

There is clearly a lot more for AMD to share about Zen but the announcement and showcase made this week with the early prototype products have solidified for me the capability and promise of this new microarchitecture. We have asked for, and needed, as an industry, a competitor to Intel in the enthusiast CPU space – something we haven’t legitimately had since the Athlon X2 days. Zen is what we have been pining over, what gamers and consumers have needed.

AMD’s processor stars might finally be aligning for a product that combines performance, efficiency and scalability at the right time. I’m ready for it –are you?

Bristol Ridge Takes on Mobile: E2 Through FX

It is no secret that AMD has faced an uphill battle since the release of the original Core 2 processors from Intel. While stayed mostly competitive through the Phenom II years, they hit some major performance issues when moving to the Bulldozer architecture. While on paper the idea of Chip Multi-Threading sounded fantastic, AMD was never able to get the per thread performance up to expectations. While their CPUs performed well in heavily multi-threaded applications, they just were never seen in as positive of a light as the competing Intel products.

The other part of the performance equation that has hammered AMD is the lack of a new process node that would allow it to more adequately compete with Intel. When AMD was at 32 nm PD-SOI, Intel had introduced its 22nm TriGate/FinFET. AMD then transitioned to a 28nm HKMG planar process that was more size optimized than 32nm, but did not drastically improve upon power and transistor switching performance.

So AMD had a double whammy on their hands with an underperforming architecture and limitted to no access to advanced process nodes that would actually improve their power and speed situation. They could not force their foundry partners to spend billions on a crash course in FinFET technology to bring that to market faster, so they had to iterate and innovate on their designs.

Bristol Ridge is the fruit of that particular labor. It is also the end point to the architecture that was introduced with Bulldozer way back in 2011.

Broadwell-E Platform

It has been nearly two years since the release of the Haswell-E platform, which began with the launch of the Core i7-5960X processor. Back then, the introduction of an 8-core consumer processor was the primary selling point; along with the new X99 chipset and DDR4 memory support. At the time, I heralded the processor as “easily the fastest consumer processor we have ever had in our hands” and “nearly impossible to beat.” So what has changed over the course of 24 months?

Today Intel is launching Broadwell-E, the follow up to Haswell-E, and things look very much the same as they did before. There are definitely a couple of changes worth noting and discussing, including the move to a 10-core processor option as well as Turbo Boost Max Technology 3.0, which is significantly more interesting than its marketing name implies. Intel is sticking with the X99 platform (good for users that might want to upgrade), though the cost of these new processors is more than slightly disappointing based on trends elsewhere in the market.

This review of the new Core i7-6950X 10-core Broadwell-E processor is going to be quick, and to the point: what changes, what is the performance, how does it overclock, and what will it cost you?

10nm Sooner Than Expected?

It seems only yesterday that we had the first major GPU released on 16nm FF+ and now we are talking about ARM about to receive their first 10nm FF test chips! Well, in fact it was yesterday that NVIDIA formally released performance figures on the latest GeForce GTX 1080 which is based on TSMC’s 16nm FF+ process technology. Currently TSMC is going full bore on their latest process node and producing the fastest current graphics chip around. It has taken the foundry industry as a whole a lot longer to develop FinFET technology than expected, but now that they have that piece of the puzzle seemingly mastered they are moving to a new process node at an accelerated rate.

TSMC’s 10nm FF is not well understood by press and analysts yet, but we gather that it is more of a marketing term than a true drop to 10 nm features. Intel has yet to get past 14nm and does not expect 10 nm production until well into next year. TSMC is promising their version in the second half of 2016. We cannot assume that TSMC’s version will match what Intel will be doing in terms of geometries and electrical characteristics, but we do know that it is a step past TSMC’s 16nm FF products. Lithography will likely get a boost with triple patterning exposure. My guess is that the back end will also move away from the “20nm metal” stages that we see with 16nm. All in all, it should be an improved product from what we see with 16nm, but time will tell if it can match the performance and density of competing lines that bear the 10nm name from Intel, Samsung, and GLOBALFOUNDRIES.

ARM has a history of porting their architectures to new process nodes, but they are being a bit more aggressive here than we have seen in the past. It used to be that ARM would announce a new core or technology, and it would take up to two years to be introduced into the market. Now we are seeing technology announcements and actual products hitting the scenes about nine months later. With the mobile market continuing to grow we expect to see products quicker to market still.

The company designed a simplified test chip to tape out and send to TSMC for test production on the aforementioned 10nm FF process. The chip was taped out in December, 2015. The design was shipped to TSMC for mask production and wafer starts. ARM is expecting the finished wafers to arrive this month.

Lower Power, Same Performance

AMD is in a strange position in that there is a lot of excitement about their upcoming Zen architecture, but we are still many months away from that introduction. AMD obviously needs to keep the dollars flowing in, and part of that means that we get refreshes now and then of current products. The “Kaveri” products that have been powering the latest APUs from AMD have received one of those refreshes. AMD has done some redesigning of the chip and tweaked the process technology used to manufacture them. The resulting product is the “Godavari” refresh that offers slightly higher clockspeeds as well as better overall power efficiency as compared to the previous “Kaveri” products.

One of the first refreshes was the A8-7670K that hit the ground in November of 2015. This is a slightly cut down part that features 6 GPU compute units vs. the 8 that a fully enabled Godavari chip has. This continues to be a FM2+ based chip with a 95 watt TDP. The clockspeed of this part goes from 3.6 GHz to 3.9 GHz. The GPU portion runs at the same 757 MHz that the original A10-7850K ran at. It is interesting to note that it is still a 95 watt TDP part with essentially the same clockspeeds as the 7850K, but with two fewer GPU compute units.

The other product being covered here is a bit more interesting. The A10-7860K looks to be a larger improvement from the previous 7850K in terms of power and performance. It shares the same CPU clockspeed range as the 7850K (3.6 GHz to 3.9 GHz), but improves upon the GPU clockspeed by hitting around 800 MHz. At first this seems underwhelming until we realize that AMD has lowered the TDP from 95 watts down to 65 watts. Less power consumed and less heat produced for the same performance from the CPU side and improved performance from the GPU seems like a nice advance.

AMD continues to utilize GLOBALFOUNDRIES 28 nm Bulk/HKMG process for their latest APUs and will continue to do so until Zen is released late this year. This is not the same 28 nm process that we were introduced to over four years ago. Over that time improvements have been made to improve yields and bins, as well as optimize power and clockspeed. GF also can adjust the process on a per batch basis to improve certain aspects of a design (higher speed, more leakage, lower power, etc.). They cannot produce miracles though. Do not expect 22 nm FinFET performance or density with these latest AMD products. Those kinds of improvements will show up with Samsung/GF’s 14nm LPP and TSMC’s 16nm FF+ lines. While AMD will be introducing GPUs on 14nm LPP this summer, the Zen launch in late 2016 will be the first AMD CPU to utilize that advanced process.

Clockspeed Jump and More!

On March 1st AMD announced the availability of two new processors as well as more information on the A10 7860 APU.

The two new units are the A10-7890K and the Athlon X4 880K. These are both Kaveri based parts, but of course the Athlon has the GPU portion disabled. Product refreshes for the past several years have followed a far different schedule than the days of yore. Remember back in time when the Phenom II series and the competing Core 2 series would have clockspeed updates that were expected yearly, if not every half year with a slightly faster top end performer to garner top dollar from consumers?

Things have changed, for better or worse. We have so far seen two clockspeed bumps for the Kaveri /Godavari based APU. Kaveri was first introduced over two years ago with the A10-7850K and the lower end derivatives. The 7850K has a clockspeed that ranges from 3.7 GHz to the max 4 GHz with boost. The GPU portion is clocked at 720 MHz. This is a 95 watt TDP part that is one of the introductory units from GLOBALFOUNDRIES 28 nm HKMG process.

Today the new top end A10-7890K is clocked at 4.1 GHz to 4.3 GHz max. The GPU receives a significant boost in performance with a clockspeed of 866 MHz. The combination of CPU and GPU clockspeed increases push the total performance of the part exceeding 1 TFLOPs. It features the same dual module/quad core Godavari design as well as the 8 GCN Units. The interesting part here is that the APU does not exceed the 95 watt TDP that it shares with the older and slower 7850K. It is also a boost in performance from last year’s refresh of the A10-7870K which is clocked 200 MHz slower on the CPU portion but retains the 866 MHz speed of the GPU. This APU is fully unlocked so a user can easily overclock both the CPU and GPU cores.

The Athlon X4 880K is still based on the Godavari family rather than the Carizzo update that the X4 845 uses. This part is clocked from 4.0 to 4.2 GHz. It again retains the 95 watt TDP rating of the previous Athlon X4 CPUs. Previously the X4 860K was the highest clocked unit at 3.7 GHz to 4.0, but the 880K raises that to 4 to 4.2 GHz. A 300 MHz gain in base clock is pretty significant as well as stretching that ceiling to 4.2 GHz. The Godavari modules retain their full amount of L2 cache so the 880K has 4 MB available to it. These parts are very popular with budget enthusiasts and gaming builds as they are extremely inexpensive and perform at an acceptable level with free overclocking thrown in.

AMD Keeps Q1 Interesting

CES 2016 was not a watershed moment for AMD. They showed off their line of current video cards and, perhaps more importantly, showed off working Polaris silicon, which will be their workhorse for 2016 in the graphics department. They did not show off Zen, a next generation APU, or any AM4 motherboards. The CPU and APU world was not presented in a way that was revolutionary. What they did show off, however, hinted at the things to come to help keep AMD relevant in the desktop space.

It was odd to see an announcement about the stock cooler that AMD was introducing, but when we learned more about it, the more important it was for AMD’s reputation moving forward. The Wraith cooler is a new unit to help control the noise and temperatures of the latest AMD CPUs and select APUs. This is a fairly beefy unit with a large, slow moving fan that produces very little noise. This is a big change from the variable speed fans on previous coolers that could get rather noisy and leave temperatures that were higher in range than are comfortable. There has been some derision aimed at AMD for providing “just a cooler” for their top end products, but it is a push that is making them more user and enthusiast friendly without breaking the bank.

Socket AM3+ is not dead yet. Though we have been commenting on the health of the platform for some time, AMD and its partners work to improve and iterate upon these products to include technologies such as USB 3.1 and M.2 support. While these chipsets are limited to PCI-E 2.0 speeds, the four lanes available to most M.2 controllers allows these boards to provide enough bandwidth to fully utilize the latest NVMe based M.2 drives available. We likely will not see a faster refresh on AM3+, but we will see new SKUs utilizing the Wraith cooler as well as a price break for the processors that exist in this socket.

Are Computers Still Getting Faster?

It looks like CES is starting to wind down, which makes sense because it ended three days ago. Now that we're mostly caught up, I found a new video from The 8-Bit Guy. He doesn't really explain any old technologies in this one. Instead, he poses an open question about computer speed. He was able to have a functional computing experience on a ten-year-old Apple laptop, which made him wonder if the rate of computer advancement is slowing down.

I believe that he (and his guest hosts) made great points, but also missed a few important ones.

One of his main arguments is that software seems to have slowed down relative to hardware. I don't believe that is true, but I believe it's looking in the right area. PCs these days are more than capable of doing just about anything in terms of 2D user interface that we would want to, and do so with a lot of overhead for inefficient platforms and sub-optimal programming (relative to the 80's and 90's at the very least). The areas that require extra horsepower are usually doing large batches of many related tasks. GPUs are key in this area, and they are keeping up as fast as they can, despite some stagnation with fabrication processes and a difficulty (at least before HBM takes hold) in keeping up with memory bandwidth.

For the last five years to ten years or so, CPUs have been evolving toward efficiency as GPUs are being adopted for the tasks that need to scale up. I'm guessing that AMD, when they designed the Bulldozer architecture, hoped that GPUs would have been adopted much more aggressively, but even as graphics devices, they now have a huge effect on Web, UI, and media applications.

These are also tasks that can scale well between devices by lowering resolution (and so forth). The primary thing that a main CPU thread needs to do is figure out the system's state and keep the graphics card fed before the frame-train leaves the station. In my experience, that doesn't scale well (although you can sometimes reduce the amount of tracked objects for games and so forth). Moreover, it is easier to add GPU performance, compared to single-threaded CPU, because increasing frequency and single-threaded IPC should be more complicated than planning out more, duplicated blocks of shaders. These factors combine to give lower-end hardware a similar experience in the most noticeable areas.

So, up to this point, we discussed:

Software is often scaling in ways that are GPU (and RAM) limited.

CPUs are scaling down in power more than up in performance.

GPU-limited tasks can often be approximated with smaller workloads.

Software gets heavier, but it doesn't need to be "all the way up" (ex: resolution).

Some latencies are hard to notice anyway.

Back to the Original Question

This is where “Are computers still getting faster?” can be open to interpretation.

Tasks are diverging from one class of processor into two, and both have separate industries, each with their own, multiple goals. As stated, CPUs are mostly progressing in power efficiency, which extends (an assumed to be) sufficient amount of performance downward to multiple types of devices. GPUs are definitely getting faster, but they can't do everything. At the same time, RAM is plentiful but its contribution to performance can be approximated with paging unused chunks to the hard disk or, more recently on Windows, compressing them in-place. Newer computers with extra RAM won't help as long as any single task only uses a manageable amount of it -- unless it's seen from a viewpoint that cares about multi-tasking.

In short, computers are still progressing, but the paths are now forked and winding.

May the Radeon be with You

In celebration of the release of The Force Awakens as well as the new Star Wars Battlefront game from DICE and EA, AMD sent over some hardware for us to use in a system build, targeted at getting users up and running in Battlefront with impressive quality and performance, but still on a reasonable budget. Pairing up an AMD processor, MSI motherboard, Sapphire GPU with a low cost chassis, SSD and more, the combined system includes a FreeSync monitor for around $1,200.

Holiday breaks are MADE for Star Wars Battlefront

Though the holiday is already here and you'd be hard pressed to build this system in time for it, I have a feeling that quite a few of our readers and viewers will find themselves with some cash and gift certificates in hand, just ITCHING for a place to invest in a new gaming PC.

The video above includes a list of components, the build process (in brief) and shows us getting our gaming on with Star Wars Battlefront. Interested in building a system similar the one above on your own? Here's the hardware breakdown.

For under $1,000, plus another $250 or so for the AOC FreeSync capable 1080p monitor, you can have a complete gaming rig for your winter break. Let's detail some of the specific components.

AMD sent over the FX-8370 processor for our build, a 4-module / 8-core CPU that runs at 4.0 GHz, more than capable of handling any gaming work load you can toss at it. And if you need to do some transcoding, video work or, heaven forbid, school or productivity work, the FX-8370 has you covered there too.

For the motherboard AMD sent over the MSI 990FXA Gaming board, one of the newer AMD platforms that includes support for USB 3.1 so you'll have a good length of usability for future expansion. The Cooler Master Hyper 212 EVO cooler was our selection to keep the FX-8370 running smoothly and 8GB of AMD Radeon DDR3-2133 memory is enough for the system to keep applications and the Windows 10 operating system happy.

Skylake Architecture Comes Through

When Intel finally revealed the details surrounding it's latest Skylake architecture design back in August at IDF, we learned for the first time about a new technology called Intel Speed Shift. A feature that moves some of the control of CPU clock speed and ramp up away from the operating system and into hardware gives more control to the processor itself, making it less dependent on Windows (and presumably in the future, other operating systems). This allows the clock speed of a Skylake processor to get higher, faster, allowing for better user responsiveness.

It's pretty clear that Intel is targeting this feature addition for tablets and 2-in-1s where the finger/pen to screen interaction is highly reliant on immediate performance to enable improved user experiences. It has long been known that one of the biggest performance deltas between iOS from Apple and Android from Google centers on the ability for the machine to FEEL faster when doing direct interaction, regardless of how fast the background rendering of an application or web browser actually is. Intel has been on a quest to fix this problem for Android for some time, where it has the ability to influence software development, and now they are bringing that emphasis to Windows 10.

With the most recent Windows 10 update, to build v10586, Intel Speed Shift has finally been enabled for Skylake users. And since you cannot disable the feature once it's installed, this is the one and only time we'll be able to measure performance in our test systems. So let's see if Intel's claims of improved user experiences stand up to our scrutiny.

That is a lotta SKUs!

The slow, gradual release of information about Intel's Skylake-based product portfolio continues forward. We have already tested and benchmarked the desktop variant flagship Core i7-6700K processor and also have a better understanding of the microarchitectural changes the new design brings forth. But today Intel's 6th Generation Core processors get a major reveal, with all the mobile and desktop CPU variants from 4.5 watts up to 91 watts, getting detailed specifications. Not only that, but it also marks the first day that vendors can announce and begin selling Skylake-based notebooks and systems!

All indications are that vendors like Dell, Lenovo and ASUS are still some weeks away from having any product available, but expect to see your feeds and favorite tech sites flooded with new product announcements. And of course with a new Apple event coming up soon...there should be Skylake in the new MacBooks this month.

Since I have already talked about the architecture and the performance changes from Haswell/Broadwell to Skylake in our 6700K story, today's release is just a bucket of specifications and information surround 46 different 6th Generation Skylake processors.

Intel's 6th Generation Core Processors

At Intel's Developer Forum in August, the media learned quite a bit about the new 6th Generation Core processor family including Intel's stance on how Skylake changes the mobile landscape.

Skylake is being broken up into 4 different line of Intel processors: S-series for desktop DIY users, H-series for mobile gaming machines, U-series for your everyday Ultrabooks and all-in-ones, Y-series for tablets and 2-in-1 detachables. (Side note: Intel does not reference an "Ultrabook" anymore. Huh.)

As you would expect, Intel has some impressive gains to claim with the new 6th Generation processor. However, it is important to put them in context. All of the claims above, including 2.5x performance, 30x graphics improvement and 3x longer battery life, are comparing Skylake-based products to CPUs from 5 years ago. Specifically, Intel is comparing the new Core i5-6200U (a 15 watt part) against the Core i5-520UM (an 18 watt part) from mid-2010.

A third primary processor

As the Hot Chips conference begins in Cupertino this week, Qualcomm is set to divulge another set of information about the upcoming Snapdragon 820 processor. Earlier this month the company revealed details about the Adreno 5xx GPU architecture, showcasing improved performance and power efficiency while also adding a new Spectra 14-bit image processor. Today we shift to what Qualcomm calls the “third pillar in the triumvirate of programmable processors” that make up the Snapdragon SoC. The Hexagon DSP (digital signal processor), introduced initially by Qualcomm in 2004, has gone through a massive architecture shift and even programmability shift over the last 10 years.

Qualcomm believes that building a balanced SoC for mobile applications is all about heterogeneous computing with no one processor carrying the entire load. The majority of the work that any modern Snapdragon processor must handle goes through the primary CPU cores, the GPU or the DSP. We learned about upgrades to the Adreno 5xx series for the Snapdragon 820 and we are promised information about Kryo CPU architecture soon as well. But the Hexagon 600-series of DSPs actually deals with some of the most important functionality for smartphones and tablets: audio, voice, imaging and video.

Interestingly, Qualcomm opened up the DSP to programmability just four years ago, giving developers the ability to write custom code and software to take advantages of the specific performance capabilities that the DSP offers. Custom photography, videography and sound applications could benefit greatly in terms of performance and power efficiency if utilizing the QC DSP rather than the primary system CPU or GPU. As of this writing, Qualcomm claims there are “hundreds” of developers actively writing code targeting its family of Hexagon processors.

The Hexagon DSP in Snapdragon 820 consists of three primary partitions. The main compute DSP works in conjunction with the GPU and CPU cores and will do much of the heavy lifting for encompassed workloads. The modem DSP aids the cellular modem in communication throughput. The new guy here is the lower power DSP in the Low Power Island (LPI) that shifts how always-on sensors can communicate with the operating system.

Core and Interconnect

The Skylake architecture is Intel’s first to get a full release on the desktop in more than two years. While that might not seem like a long time in the grand scheme of technology, for our readers and viewers that is a noticeable change and shift from recent history that Intel has created with the tick-tock model of releases. Yes, Broadwell was released last year and was solid product, but Intel focused almost exclusively on the mobile platforms (notebooks and tablets) with it. Skylake will be much more ubiquitous and much more quickly than even Haswell.

Skylake represents Intel’s most scalable architecture to date. I don’t mean only frequency scaling, though that is an important part of this design, but rather in terms of market segment scaling. Thanks to brilliant engineering and design from Intel’s Israeli group Intel will be launching Skylake designs ranging from 4.5 watt TDP Core M solutions all the way up to the 91 watt desktop processors that we have already reviewed in the Core i7-6700K. That’s a range that we really haven’t seen before and in the past Intel has depended on the Atom architecture to make up ground on the lowest power platforms. While I don’t know for sure if Atom is finally trending towards the dodo once Skylake’s reign is fully implemented, it does make me wonder how much life is left there.

Scalability also refers to the package size – something that ensures that the designs the engineers created can actually be built and run in the platform segments they are targeting. Starting with the desktop designs for LGA platforms (DIY market) that fits on a 1400 mm2 design on the 91 watt TDP implementation Intel is scaling all the way down to 330 mm2 in a BGA1515 package for the 4.5 watt TDP designs. Only with a total product size like that can you hope to get Skylake in a form factor like the Compute Stick – which is exactly what Intel is doing. And note that the smaller packages require the inclusion of the platform IO chip as well, something that H- and S-series CPUs can depend on the motherboard to integrate.

Finally, scalability will also include performance scaling. Clearly the 4.5 watt part will not offer the user the same performance with the same goals as the 91 watt Core i7-6700K. The screen resolution, attached accessories and target applications allow Intel to be selective about how much power they require for each series of Skylake CPUs.

Core Microarchitecture

The fundamental design theory in Skylake is very similar to what exists today in Broadwell and Haswell with a handful of significant and hundreds of minor change that make Skylake a large step ahead of previous designs.

This slide from Julius Mandelblat, Intel Senior Principle Engineer, shows a higher level overview of the entirety of the consumer integration of Skylake. You can see that Intel’s goals included a bigger and wider core design, higher frequency, improved right architecture and fabric design and more options for eDRAM integration. Readers of PC Perspective will already know that Skylake supports both DDR3L and DDR4 memory technologies but the inclusion of the camera ISP is new information for us.

Benchmark Overview

I knew that the move to DirectX 12 was going to be a big shift for the industry. Since the introduction of the AMD Mantle API along with the Hawaii GPU architecture we have been inundated with game developers and hardware vendors talking about the potential benefits of lower level APIs, which give more direct access to GPU hardware and enable more flexible threading for CPUs to game developers and game engines. The results, we were told, would mean that your current hardware would be able to take you further and future games and applications would be able to fundamentally change how they are built to enhance gaming experiences tremendously.

I knew that the reader interest in DX12 was outstripping my expectations when I did a live blog of the official DX12 unveil by Microsoft at GDC. In a format that consisted simply of my text commentary and photos of the slides that were being shown (no video at all), we had more than 25,000 live readers that stayed engaged the whole time. Comments and questions flew into the event – more than me or my staff could possible handle in real time. It turned out that gamers were indeed very much interested in what DirectX 12 might offer them with the release of Windows 10.

Today we are taking a look at the first real world gaming benchmark that utilized DX12. Back in March I was able to do some early testing with an API-specific test that evaluates the overhead implications of DX12, DX11 and even AMD Mantle from Futuremark and 3DMark. This first look at DX12 was interesting and painted an amazing picture about the potential benefits of the new API from Microsoft, but it wasn’t built on a real game engine. In our Ashes of the Singularity benchmark testing today, we finally get an early look at what a real implementation of DX12 looks like.

And as you might expect, not only are the results interesting, but there is a significant amount of created controversy about what those results actually tell us. AMD has one story, NVIDIA another and Stardock and the Nitrous engine developers, yet another. It’s all incredibly intriguing.

It comes after 8, but before 10

As the week of Intel’s Developer Forum (IDF) begins, you can expect to see a lot of information about Intel’s 6th Generation Core architecture, codenamed Skylake, finally revealed. When I posted my review of the Core i7-6700K, the first product based on that architecture to be released in any capacity, I was surprised that Intel was willing to ship product without the normal amount of background information for media and developers. Rather than give us the details and then ship product, which has happened for essentially every consumer product release I have been a part of, Intel did the reverse: ship a consumer friendly CPU and then promise to tell us how it all works later in the month at IDF.

Today I came across a document posted on Intel’s website that dives into very specific detail on the new Gen9 graphics and compute architecture of Skylake. Details on the Core architecture changes are not present, and instead we are given details on how the traditional GPU portion of the SoC has changed. To be clear: I haven’t had any formal briefing from Intel on this topic or anything surrounding the architecture of Skylake or the new Gen9 graphics system but I wanted to share the details we found available. I am sure we’ll learn more this week as IDF progresses so I will update this story where necessary.

What Intel calls Processor Graphics is what we used to call simply integrated graphics for the longest time. The purpose and role of processor graphics has changed drastically over the years and it is now not only responsible for 3D graphics rendering but compute, media and display capabilities of the Intel Skylake SoC (when discrete add-in graphics is not used). The architecture document used to source this story focuses on Gen9 graphics, the compute architecture utilized in the latest Skylake CPUs. The Intel HD Graphics 530 on the Core i7-6700K / Core i5-6600K is the first product released and announced using Gen9 graphics and is also the first to adopt Intel’s new 3-digit naming scheme.

This die shot of the Core i7-6700K shows the increased size and prominence of the Gen9 graphics in the overall SoC design. Containing four traditional x86 CPU cores and 1 “slice” implementation of Gen9 graphics (with three visible sub-slices we’ll describe below), this is not likely to be the highest performing iteration of the latest Intel HD Graphics technology.

Like the Intel processors before it, the Skylake design utilizes a ring bus architecture to connect the different components of the SoC. This bi-directional interconnect has a 32-byte wide data bus and connects to multiple “agents” on the CPU. Each individual CPU core is considered its own agent while the Gen9 compute architecture is considered one complete agent. The system agent bundles the DRAM memory, the display controller, PCI Express and other I/O interface that communicate with the rest of the PC. Any off-chip memory requests and transactions occur through this bus while on-chip data transfers tend to be handled differently.

The Intel Skylake architecture has been on our radar for quite a long time as Intel's next big step in CPU design. Through leaks and some official information discussed by Intel over the past few months, we know at least a handful of details: DDR4 memory support, 14nm process technology, modest IPC gains and impressive GPU improvements. But the details have remained a mystery on how the "tock" of Skylake on the 14nm process technology will differ from Broadwell and Haswell.

Interestingly, due to some shifts in how Intel is releasing Skylake, we are going to be doing a review today with very little information on the Skylake architecture and design (at least officially). While we are very used to the company releasing new information at the Intel Developer Forum along with the launch of a new product, Intel has instead decided to time the release of the first Skylake products with Gamescom in Cologne, Germany. Parts will go on sale today (August 5th) and we are reviewing a new Intel processor without the background knowledge and details that will be needed to really explain any of the changes or differences in performance that we see. It's an odd move honestly, but it has some great repercussions for the enthusiasts that read PC Perspective: Skylake will launch first as an enthusiast-class product for gamers and DIY builders.

For many of you this won't change anything. If you are curious about the performance of the new Core i7-6700K, power consumption, clock for clock IPC improvements and anything else that is measurable, then you'll get exactly what you want from today's article. If you are a gear-head that is looking for more granular details on how the inner-workings of Skylake function, you'll have to wait a couple of weeks longer - Intel plans to release that information on August 18th during IDF.

So what does the addition of DDR4 memory, full range base clock manipulation and a 4.0 GHz base clock on a brand new 14nm architecture mean for users of current Intel or AMD platforms? Also, is it FINALLY time for users of the Core i7-2600K or older systems to push that upgrade button? (Let's hope so!)