Tag Archives: performance

Unless you opt for one of the very low-power CPUs on the market today, chances are the processor inside your desktop PC has a fan attached to keep it cool. The higher the chip’s performance and the more you overclock, the more cooling is required. And more cooling inevitably means more noise unless you go with a water cooling solution.

Specialist cooling company Noctua has teamed up with RotoSub to come up with a low-noise solution that allows you to stick with air cooling, but removes the noise. They’ve done this by adding active noise cancellation to one of their CPU coolers for the very first time, in a project that’s been ongoing for over a year.

The cooler is still in prototype form, but was on display at Computex 2013. Judging by the design, it could be called a cooling cube, but it is based on Noctua’s NH-D14 cooler, a twin-tower heatsink consisting of two heatsink blocks with a fan mounted between them.

The noise cancellation is achieved through a combination of mic and speakers. The mic listens to the sound created by the cooler, then the speakers output the same sound with a phase difference. In so doing, destructive interference is created and the level of noise is cut significantly.
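To see why the phase difference matters, here is a minimal sketch of destructive interference using a single pure tone. It is not how the Noctua/RotoSub prototype works internally; the 120 Hz tone, the sample rate and the function names are assumptions chosen purely for illustration.

```python
import math

def tone(freq_hz, sample_rate, n_samples, phase=0.0):
    """Return n_samples of a unit-amplitude sine tone as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate + phase)
            for i in range(n_samples)]

def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def residual_rms(phase_error_rad, freq_hz=120.0, sample_rate=8000, n_samples=8000):
    """Level left over after mixing a tone with its phase-shifted copy.

    The "speaker" plays the same tone shifted by half a cycle (pi radians)
    plus an optional error, standing in for an imperfect cancellation system.
    """
    noise = tone(freq_hz, sample_rate, n_samples)
    anti_noise = tone(freq_hz, sample_rate, n_samples, phase=math.pi + phase_error_rad)
    mixed = [a + b for a, b in zip(noise, anti_noise)]
    return rms(mixed)

if __name__ == "__main__":
    print("fan tone alone:    ", rms(tone(120.0, 8000, 8000)))
    print("perfect anti-phase:", residual_rms(0.0))
    print("0.1 rad off:       ", residual_rms(0.1))
```

With an exact half-cycle shift the two signals sum to essentially zero, while even a small phase error leaves an audible residue, which is why the mic has to keep listening.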

According to those able to listen to the prototype CPU cooler, there was actually no noise heard when the system was active. If you watch the demonstration video above you can clearly hear the difference the noise cancellation makes.

Such a system is sure to be popular and no doubt expensive. Based on the design, it should be possible to develop versions that work with all motherboards and CPUs, meaning Noctua could certainly have a hit on its hands here.

3DMark for Android: Performance Preview

As I mentioned in our coverage of GL/DXBenchmark 2.7, with the arrival of Windows RT/8 we’d finally see our first truly cross-platform benchmarks. Kishonti was first out of the gate, although Futuremark was first to announce its cross-platform benchmark, simply called 3DMark.

Currently available for x86 Windows 8 machines, Futuremark has Android, iOS and Windows RT versions of 3DMark nearing release. Today the embargo lifts on the Android version of 3DMark, with iOS and Windows RT to follow shortly.

Similar to the situation with GL/DXBenchmark, 3DMark not only spans OSes but APIs as well. The Windows RT/8 versions use DirectX, while the Android and iOS versions use OpenGL ES 2.0. Of the three major tests in the new 3DMark, only Ice Storm is truly cross-platform. Ice Storm uses OpenGL ES 2.0 on Android/iOS and Direct3D feature level 9_1 on Windows RT/8.

The Android UI is very functional and retains a very 3DMark feel. There’s an integrated results browser, a history of results and some light device information as well.

There are two options for running Ice Storm: the default and extreme presets.

3DMark – Ice Storm Settings

| Setting | Default | Extreme |
|---|---|---|
| Rendering Resolution | 1280×720 | 1920×1080 |
| Texture Resolution | Normal | High |
| Post-processing Quality | Normal | High |

Both benchmarks are rendered to an offscreen buffer at 720p/1080p and then scaled up to the native resolution of the device being tested. This is very similar to the approach we’ve seen game developers take to avoid rendering at native resolution on some of the ultra high resolution tablets. The beauty of 3DMark’s approach here is that all results are comparable, regardless of a device’s native resolution. The downside is we don’t get a good idea of how some of the ultra high resolution tablets would behave with these workloads running at their native (> 1080p) resolutions.
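As a rough illustration of what offscreen rendering saves, the sketch below compares the pixel count of each preset with that of a high resolution panel. The 2560×1600 native resolution is a hypothetical example value, and the function names are ours, not anything from 3DMark.

```python
# Offscreen render targets for the two Ice Storm presets (from the settings table).
PRESETS = {"default": (1280, 720), "extreme": (1920, 1080)}

def pixel_count(width, height):
    """Number of pixels in a width x height buffer."""
    return width * height

def upscale_factors(preset, native):
    """Horizontal and vertical stretch needed to fill the native panel."""
    render_w, render_h = PRESETS[preset]
    native_w, native_h = native
    return native_w / render_w, native_h / render_h

def native_to_render_ratio(preset, native):
    """How many native pixels each rendered pixel has to cover on average."""
    return pixel_count(*native) / pixel_count(*PRESETS[preset])

if __name__ == "__main__":
    native = (2560, 1600)  # hypothetical ultra high resolution tablet panel
    for name in PRESETS:
        print(name, pixel_count(*PRESETS[name]),
              upscale_factors(name, native),
              round(native_to_render_ratio(name, native), 2))
```

On such a panel the default preset shades less than a quarter of the pixels a native-resolution run would, which is exactly why results stay comparable across devices but say little about native-resolution behavior.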

Ice Storm is divided into two graphics tests and a physics test. The first graphics test is geometry heavy while the second test is more pixel shader intensive. The physics test, as you might guess, is CPU bound and multithreaded.

Before we get to the results, I should note that a number of devices wouldn’t complete the tests. The Intel based Motorola RAZR i wouldn’t run, and the AT&T HTC One X (MSM8960) crashed before the final score was calculated, so both of those devices were excluded. Thankfully we got the Galaxy S 3 to complete, giving us a good representative from the MSM8960/Adreno 225 camp. Thermal throttling is also a concern when running 3DMark: you have to pay close attention to the thermal conditions of the device you’re testing, something we’re having to do increasingly in our reviews these days.

Graphics Test 1

Ice Storm Graphics test 1 stresses the hardware’s ability to process lots of vertices while keeping the pixel load relatively light. Hardware on this level may have dedicated capacity for separate vertex and pixel processing. Stressing both capacities individually reveals the hardware’s limitations in both aspects.

In an average frame, 530,000 vertices are processed leading to 180,000 triangles rasterized either to the shadow map or to the screen. At the same time, 4.7 million pixels are processed per frame.

Although the first graphics test is heavy on geometry, it features roughly 1/4 the number of vertices from GL/DXBenchmark 2.7’s T-Rex HD test. In terms of vertex/triangle count, even Egypt HD is more stressful than 3DMark’s first graphics test. That’s not necessarily a bad thing however, as most Android titles are nowhere near as stressful as what T-Rex and Egypt HD simulate.

Among Android smartphones, Qualcomm rules the roost here. The Adreno 320 based Nexus 4 and HTC One both do very well, approaching 60 fps in the first graphics test. The Mali 400MP4, used in the Galaxy Note 2 and without a lot of vertex processing power, brings up the rear – being outperformed by even NVIDIA’s Tegra 3. ARM’s Mali-T604 isn’t enough to pull ahead in this test either; the Nexus 10 remains squarely behind the top two Adreno 320 based devices.

Graphics Test 2

Graphics test 2 stresses the hardware’s ability to process lots of pixels. It tests the ability to read textures, do per pixel computations and write to render targets.

On average, 12.6 million pixels are processed per frame. The additional pixel processing compared to Graphics test 1 comes from including particles and post processing effects such as bloom, streaks and motion blur.

In each frame, an average 75,000 vertices are processed. This number is considerably lower than in Graphics test 1 because shadows are not drawn and the processed geometry has a lower number of polygons.
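To put the per-frame figures for the two graphics tests side by side, here is a small back-of-the-envelope calculation. It assumes the default 1280×720 preset, and note that "pixels processed" may include shadow map and post-processing passes, so the per-output-pixel number is only a rough indicator rather than a strict overdraw figure.

```python
# Per-frame averages quoted for the two Ice Storm graphics tests.
TESTS = {
    "graphics_test_1": {"vertices": 530_000, "pixels": 4_700_000},
    "graphics_test_2": {"vertices": 75_000, "pixels": 12_600_000},
}

# Assumption: default preset, rendered offscreen at 1280x720.
RENDER_PIXELS = 1280 * 720

def pixels_per_output_pixel(test_name):
    """Pixels processed per frame divided by the size of the render target."""
    return TESTS[test_name]["pixels"] / RENDER_PIXELS

def pixels_per_vertex(test_name):
    """Rough balance between pixel work and vertex work in a frame."""
    return TESTS[test_name]["pixels"] / TESTS[test_name]["vertices"]

if __name__ == "__main__":
    for name in TESTS:
        print(name,
              round(pixels_per_output_pixel(name), 2),
              round(pixels_per_vertex(name), 1))
```

The first test works out to roughly 9 pixels per vertex and the second to 168, which matches the description of one test being geometry heavy and the other pixel heavy.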

As you’d expect, shifting to a more pixel shader heavy workload shows the Galaxy Note 2 doing a lot better – effectively tying the Tegra 3 based HTC One X+ and outperforming the Nexus 7. The Mali-T604 continues to, at best, tie for third place here. Qualcomm’s Adreno 320 just seems to deliver better performance in 3DMark for Android.

Physics Test

The purpose of the Physics test is to benchmark the hardware’s ability to do gameplay physics simulations on CPU. The GPU load is kept as low as possible to ensure that only the CPU’s capabilities are stressed.

The test has four simulated worlds. Each world has two soft bodies and two rigid bodies colliding with each other. One thread per available logical CPU core is used to run simulations. All physics are computed on the CPU with soft body vertex data updated to the GPU each frame. The background is drawn as a static image for the least possible GPU load.

The Physics test uses the Bullet Open Source Physics Library.
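The threading layout described above (four independent worlds, one worker per logical core) can be sketched as follows. This does not use Bullet and is not Futuremark's code; the bouncing-body integrator is a toy stand-in, and the function names are hypothetical. Also keep in mind that Python threads share an interpreter lock, so this shows the structure of the workload rather than real multi-core scaling.

```python
import os
from concurrent.futures import ThreadPoolExecutor

GRAVITY = -9.81  # m/s^2

def simulate_world(world_id, steps=1000, dt=1.0 / 60.0):
    """Toy stand-in for one simulated world: a body dropped onto a floor."""
    height, velocity = 10.0 + world_id, 0.0
    for _ in range(steps):
        velocity += GRAVITY * dt
        height += velocity * dt
        if height < 0.0:
            # Reflect off the floor and lose half the speed on impact.
            height = -height
            velocity = -velocity * 0.5
    return world_id, height

def run_worlds(n_worlds=4):
    """Run every world on a pool sized to the number of logical CPU cores."""
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_world, range(n_worlds)))

if __name__ == "__main__":
    for world_id, final_height in run_worlds():
        print(f"world {world_id}: final height {final_height:.3f} m")
```

Because the worlds never interact, they can be handed to separate cores without any locking, which is why quad-core devices have an edge in this test.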

The physics results give us an indication of just how heavily threaded this benchmark is. The quad-core devices are able to outperform the dual-core Cortex A15 based Nexus 10, despite the latter having far better single threaded performance. The Droid DNA/Optimus G vs. Nexus 4 results continue to be a bit odd, perhaps due to the newer drivers included in the Nexus 4’s use of Android 4.2 vs. 4.1.2 for the other APQ8064 platforms.

A recent trip got us access to an early sample of Intel’s upcoming Core i7-4770K. We compare its performance to Ivy Bridge- and Sandy Bridge-based processors, so you have some idea what to expect when Intel officially introduces its Haswell architecture.

We recently got our hands on a Core i7-4770K, based on Intel’s Haswell micro-architecture. It’s not final silicon, but compared to earlier steppings (and earlier drivers), we’re comfortable enough about the way this chip performs to preview it against the Ivy and Sandy Bridge designs.

Presentations at last year’s Developer Forum in San Francisco taught us as much as there is to know about the Haswell architecture itself. But as we get closer to the official launch, more details become known about how Haswell will materialize into actual products. Fortunately for us, some of the first CPUs based on Intel’s newest design will be aimed at enthusiasts.

Fourth-Generation Intel Core Desktop Line-Up

| Model | Cores / Threads | TDP (W) | Base Clock | Max. Turbo (1 Core) | Max. Turbo (2 Cores) | Max. Turbo (3 Cores) | Max. Turbo (4 Cores) | L3 | GPU | Max. GPU Clock | TSX |
|---|---|---|---|---|---|---|---|---|---|---|---|
| i7-4770K | 4 / 8 | 84 | 3.5 GHz | 3.9 GHz | 3.9 GHz | 3.8 GHz | 3.7 GHz | 8 MB | GT2 | 1.25 GHz | No |
| i7-4770 | 4 / 8 | 84 | 3.4 GHz | 3.9 GHz | 3.9 GHz | 3.8 GHz | 3.7 GHz | 8 MB | GT2 | 1.2 GHz | Yes |
| i5-4670K | 4 / 4 | 84 | 3.4 GHz | 3.8 GHz | 3.8 GHz | 3.7 GHz | 3.6 GHz | 6 MB | GT2 | 1.2 GHz | No |
| i5-4670 | 4 / 4 | 84 | 3.4 GHz | 3.8 GHz | 3.8 GHz | 3.7 GHz | 3.6 GHz | 6 MB | GT2 | 1.2 GHz | Yes |
| i5-4570 | 4 / 4 | 84 | 3.2 GHz | 3.6 GHz | 3.6 GHz | 3.5 GHz | 3.4 GHz | 6 MB | GT2 | 1.15 GHz | Yes |
| i5-4430 | 4 / 4 | 84 | 3 GHz | 3.2 GHz | 3.2 GHz | 3.1 GHz | 3 GHz | 6 MB | GT2 | 1.1 GHz | No |
| i7-4770S | 4 / 4 | 65 | 3.1 GHz | 3.9 GHz | 3.8 GHz | 3.6 GHz | 3.5 GHz | 8 MB | GT2 | 1.2 GHz | Yes |
| i5-4570S | 4 / 4 | 65 | 2.9 GHz | 3.6 GHz | 3.5 GHz | 3.3 GHz | 3.2 GHz | 6 MB | GT2 | 1.15 GHz | Yes |
| i5-4670S | 4 / 4 | 65 | 3.1 GHz | 3.8 GHz | 3.7 GHz | 3.5 GHz | 3.4 GHz | 6 MB | GT2 | 1.2 GHz | Yes |
| i5-4430S | 4 / 4 | 65 | 2.7 GHz | 3.2 GHz | 3.1 GHz | 2.9 GHz | 2.8 GHz | 6 MB | GT2 | 1.1 GHz | No |
| i7-4770T | 4 / 4 | 45 | 2.5 GHz | 3.7 GHz | 3.6 GHz | 3.4 GHz | 3.1 GHz | 8 MB | GT2 | 1.2 GHz | Yes |
| i5-4670T | 4 / 4 | 45 | 2.3 GHz | 3.3 GHz | 3.2 GHz | 3 GHz | 2.9 GHz | 6 MB | GT2 | 1.2 GHz | Yes |
| i7-4765T | 4 / 4 | 35 | 2 GHz | 3 GHz | 2.9 GHz | 2.7 GHz | 2.6 GHz | 8 MB | GT2 | 1.2 GHz | Yes |
| i5-4570T | 2 / 4 | 35 | 2.9 GHz | 3.6 GHz | 3.3 GHz | – | – | 4 MB | GT2 | 1.15 GHz | Yes |

According to Intel’s current plans, you’ll find dual- and quad-core LGA 1150 models with the GT2 graphics configuration sporting 20 execution units. There will also be dual- and quad-core socketed rPGA-based models for the mobile space, featuring the same graphics setup. Everything in the table above is LGA 1150, though. All of those models share support for two channels of DDR3-1600 at 1.5 V and 800 MHz minimum core frequencies. They also share a 16-lane PCI Express 3.0 controller, AVX2 support, and AES-NI support. Interestingly, four of the listed models do not support Intel’s new Transactional Synchronization Extensions (TSX). We’re not sure why Intel would want to differentiate its products with a feature intended to handle locking more efficiently, but that appears to be what it’s doing.
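If you want to check which of these instruction set extensions a given chip actually exposes, one option on Linux is to read the `flags` line of `/proc/cpuinfo`. The sketch below assumes the flag names Linux commonly reports (`avx2`, `aes`, `fma`, and `hle`/`rtm` for the two halves of TSX); the sample text is made up for illustration and the helper names are ours.

```python
# Feature name -> Linux cpuinfo flags that must all be present.
FEATURE_FLAGS = {
    "AVX2": ["avx2"],
    "AES-NI": ["aes"],
    "FMA3": ["fma"],
    "TSX": ["hle", "rtm"],
}

def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def supported_features(flags):
    """Map each feature name to True/False based on the flag set."""
    return {name: all(flag in flags for flag in needed)
            for name, needed in FEATURE_FLAGS.items()}

if __name__ == "__main__":
    # Hypothetical excerpt; on a real Linux system, read /proc/cpuinfo instead.
    sample = "model name\t: Example CPU\nflags\t\t: fpu sse2 aes avx avx2 fma\n"
    print(supported_features(parse_flags(sample)))
```

On a real machine you would pass in `open("/proc/cpuinfo").read()`; a part without TSX would simply report `False` for that entry while still showing AVX2 and AES-NI.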

The much-anticipated GT3 graphics engine, with 40 EUs, is limited to BGA-based applications, meaning it won’t be upgradeable. Intel will have quad-core with GT3, quad-core with GT2, and dual-core with GT2 versions in ball grid array packaging. GT3 will also make an appearance in a BGA-based multi-chip package that includes a Lynx Point chipset. That’ll be a dual-core part, though.

In addition to the processors Intel plans to launch in a few months, we’ll also be introduced to the 8-series Platform Controller Hubs, currently code-named Lynx Point. The most feature-complete version of Lynx Point will incorporate six SATA 6Gb/s ports, 14 total USB ports (six of which are USB 3.0), eight lanes of second-gen PCIe, and VGA output.

Eight-series chipsets are going to be physically smaller than their predecessors (23×22 millimeters on the desktop, rather than 27×27) with lower pin-counts. This is largely attributable to more capabilities integrated on the CPU itself. Previously, eight Flexible Display Interface lanes connected the processor and PCH. Although the processor die hosted an embedded DisplayPort controller, the VGA, LVDS, digital display interfaces, and audio were all down on the chipset. Now, the three digital ports are up in the processor, along with the audio and embedded DisplayPort. LVDS is gone altogether, as are six of the FDI lanes.

Although Dhrystone isn’t necessarily applicable to real-world performance, a lack of software already optimized for AVX2 means we need to go to SiSoftware’s diagnostic for an idea of how Haswell’s support for the instruction set might affect general integer performance in properly optimized software.

The Whetstone module employs SSE3, so Haswell’s improvements over Ivy Bridge are far more incremental.

Sandra’s Multimedia benchmark generates a 640×480 image of the Mandelbrot Set fractal using 255 iterations for each pixel, representing vectorised code that runs as close to perfectly parallel as possible.
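For readers unfamiliar with the workload, here is a minimal scalar sketch of a Mandelbrot render with a 255-iteration cap. Sandra's version is vectorised and far faster; the viewing window and function names below are our own assumptions, chosen only to show that every pixel can be computed independently of its neighbors.

```python
def mandelbrot_iterations(c_re, c_im, max_iter=255):
    """Count iterations of z = z*z + c before |z| exceeds 2, capped at max_iter."""
    z_re = z_im = 0.0
    for i in range(max_iter):
        if z_re * z_re + z_im * z_im > 4.0:
            return i
        z_re, z_im = z_re * z_re - z_im * z_im + c_re, 2.0 * z_re * z_im + c_im
    return max_iter

def render(width=640, height=480, max_iter=255):
    """Return a height x width grid of iteration counts.

    The window (-2..1 on the real axis, -1..1 on the imaginary axis)
    is an assumption for illustration.
    """
    re_min, re_max = -2.0, 1.0
    im_min, im_max = -1.0, 1.0
    image = []
    for y in range(height):
        c_im = im_min + (im_max - im_min) * y / (height - 1)
        row = []
        for x in range(width):
            c_re = re_min + (re_max - re_min) * x / (width - 1)
            row.append(mandelbrot_iterations(c_re, c_im, max_iter))
        image.append(row)
    return image

if __name__ == "__main__":
    small = render(width=64, height=48)  # reduced size to keep the demo quick
    print(len(small), len(small[0]), max(max(row) for row in small))
```

Since no pixel depends on any other, the inner loop maps naturally onto wide SIMD registers, which is where AVX2 and FMA3 come in.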

The integer test employs the AVX2 instruction set on Intel’s Haswell-based Core i7-4770K, while the Ivy and Sandy Bridge-based processors are limited to AVX support. As you see in the red bar, the task is finished much faster on Haswell. The speed-up is close to 2x, but not quite there.

Floating-point performance also enjoys a significant speed-up from Intel’s first implementation of FMA3 (AMD’s Bulldozer design supports FMA4, while Piledriver supports both the three- and four-operand versions). The Ivy and Sandy Bridge-based processors utilize AVX-optimized code paths, falling quite a bit behind at the same clock rate.

Why do doubles seem to speed up so much more than floats on Haswell? The code path for FMA3 is actually latency-bound. If we turn off FMA3 support altogether in Sandra’s options and use AVX instead, the scaling proves similar.

All three of these chips feature AES-NI support, and we know from past reviews that because Sandra’s AES test runs entirely in hardware, our platforms are processing instructions as fast as they’re sent from memory. The Core i7-4770K’s slight disadvantage in our AES256 test is indicative of slightly less throughput, something I’m comfortable chalking up to the early status of our test system.

Meanwhile, SHA2-256 performance is all about each core’s compute performance. So, the IPC improvements that go into Haswell help propel it ahead of Ivy Bridge, which is in turn faster than Sandy Bridge.

The memory bandwidth module confirms our findings in the Cryptography benchmark. All three platforms are running 1,600 MT/s data rates; the Haswell-based machine just looks like it needs a little tuning.
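As a reference point for these results, the theoretical peak of a dual-channel DDR3-1600 setup can be worked out from the data rate. The sketch assumes the standard 64-bit width per DDR3 channel; measured numbers always land below this ceiling.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s=1600, channels=2, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

if __name__ == "__main__":
    print(peak_bandwidth_gb_s())            # dual-channel DDR3-1600
    print(peak_bandwidth_gb_s(channels=1))  # single channel, for comparison
```

That works out to 25.6 GB/s for all three platforms, so any gap between them in this test comes down to the memory controller and platform tuning rather than the DIMMs.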

We already know that Intel optimized Haswell’s memory hierarchy for performance, based on information discussed at last year’s IDF. As expected, Sandra’s cache bandwidth test shows an almost-doubling of performance from the 32 KB L1 data cache.

Gains from the L2 cache are actually a lot lower than we’d expect though; we thought that number would be close to 2x as well, given 64 bytes/cycle throughput (theoretically, the L2 should be capable of more than 900 GB/s). The L3 cache actually drops back a bit, which could be related to its separate clock domain.
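One way to arrive at a figure in that range is to multiply bytes per cycle by clock rate and core count. The calculation below is our assumption about how such a number could be derived: it treats the L2 bandwidth as an aggregate across all four cores and uses clock rates from the line-up table.

```python
def cache_bandwidth_gb_s(bytes_per_cycle, clock_ghz, cores):
    """Aggregate theoretical cache bandwidth in GB/s.

    bytes/cycle * (1e9 cycles/s per GHz) * cores, divided by 1e9 bytes per GB.
    """
    return bytes_per_cycle * clock_ghz * cores

if __name__ == "__main__":
    # Assumption: 64 bytes/cycle per core, four cores active.
    print(cache_bandwidth_gb_s(64, 3.5, 4))  # at the 3.5 GHz base clock
    print(cache_bandwidth_gb_s(64, 3.7, 4))  # at the 3.7 GHz four-core turbo
```

At the base clock this gives just under 900 GB/s, and at the four-core turbo clock roughly 947 GB/s, which is consistent with the "more than 900 GB/s" ceiling if the chip holds its turbo frequency.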

It still isn’t clear whether something’s up with our engineering sample CPU, or if there’s still work to be done on the testing side. Either way, this is a pre-production chip, so we aren’t jumping to any conclusions.